Exploration server for computer science research in Lorraine

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech

Internal identifier: 00E103 (Main/Merge); previous: 00E102; next: 00E104

Authors: Y. Gong; Jean-Paul Haton [France]

RBID : CRIN:gong91b

English descriptors

interpolation; neural networks; speaker identification

Abstract

The correlations between vectors in a sequence of analysis frames are assumed to be specific to phonetic units in the acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing a frame sequence into two parts and constructing a function that interpolates one part using information from the other. Depending on the quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator that is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech in a phoneme-spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield the best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives a close global recognition rate and performs significantly better for plosive sounds.
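
The decision rule described in the abstract (one interpolator per phoneme, with the label chosen by minimum interpolation error) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the split of the frame sequence (each interior frame predicted from its two neighbours) is just one possible decomposition, the function names `interpolation_error` and `identify_phoneme` are hypothetical, and the two hand-written interpolators are stand-ins for the trained multi-layer feedforward networks used in the paper.

```python
import numpy as np

def interpolation_error(interpolator, frames):
    """Mean squared error of predicting each interior frame from its two neighbours.

    `frames` has shape (T, D): T analysis frames of D coefficients each.
    Assumption: this neighbour-based split is chosen for illustration only.
    """
    left, middle, right = frames[:-2], frames[1:-1], frames[2:]
    predicted = interpolator(left, right)
    return float(np.mean((predicted - middle) ** 2))

def identify_phoneme(interpolators, frames):
    """Return the label whose interpolator gives the smallest error, plus all errors."""
    errors = {label: interpolation_error(f, frames)
              for label, f in interpolators.items()}
    best = min(errors, key=errors.get)
    return best, errors

# Hypothetical stand-ins for trained per-phoneme networks (NOT from the paper).
interpolators = {
    # Exact for straight-line trajectories x_t = a + b*t.
    "A": lambda left, right: 0.5 * (left + right),
    # Exact for quadratic trajectories x_t = t**2.
    "B": lambda left, right: 0.5 * (left + right) - 1.0,
}

t = np.arange(8, dtype=float)[:, None]    # frame index, shape (8, 1)
segment = np.repeat(t ** 2, 16, axis=1)   # 16 coefficients per frame, as in the abstract
label, errors = identify_phoneme(interpolators, segment)
print(label, errors)
```

In a real system each entry of `interpolators` would be a network trained on frames of one phoneme, so that it interpolates that phoneme well and others poorly; the selection step itself would stay the same.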

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="654">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">CRIN:gong91b</idno>
<date when="1991" year="1991">1991</date>
<idno type="wicri:Area/Crin/Corpus">000C90</idno>
<idno type="wicri:Area/Crin/Curation">000C90</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">000C90</idno>
<idno type="wicri:Area/Crin/Checkpoint">003902</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Checkpoint">003902</idno>
<idno type="wicri:Area/Main/Merge">00E103</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
<author>
<name sortKey="Gong, Y" sort="Gong, Y" uniqKey="Gong Y" first="Y." last="Gong">Y. Gong</name>
</author>
<author>
<name sortKey="Haton, J P" sort="Haton, J P" uniqKey="Haton J" first="J.-P." last="Haton">Jean-Paul Haton</name>
<affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>interpolation</term>
<term>neural networks</term>
<term>speaker identification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en" wicri:score="4552">The correlations between vectors in a sequence of analysis frames are assumed to be specific to phonetic units in the acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing a frame sequence into two parts and constructing a function that interpolates one part using information from the other. Depending on the quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator that is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech in a phoneme-spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield the best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives a close global recognition rate and performs significantly better for plosive sounds.</div>
</front>
</TEI>
</record>
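
For readers who want to pull the main fields out of a record like the one above, here is a minimal Python sketch using only the standard library. It rests on some assumptions: the record is abridged, the `summarize` helper is a hypothetical name, and the `xmlns:wicri` declaration with its `urn:example:wicri` URI is a placeholder added so that the `wicri:` prefix parses. The record as displayed declares no such namespace, and a strict XML parser would likely reject it without one.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record. The xmlns:wicri declaration and its URI are
# placeholders added for this sketch; they are not part of the original record.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="654">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">CRIN:gong91b</idno>
<date when="1991" year="1991">1991</date>
</publicationStmt>
<sourceDesc><biblStruct><analytic>
<author><name first="Y." last="Gong">Y. Gong</name></author>
<author><name first="J.-P." last="Haton">Jean-Paul Haton</name></author>
</analytic></biblStruct></sourceDesc>
</fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>interpolation</term>
<term>neural networks</term>
<term>speaker identification</term>
</keywords>
</textClass></profileDesc>
</teiHeader>
</TEI>
</record>"""

def summarize(record_xml):
    """Extract identifier, title, year, authors and keywords from a record string."""
    root = ET.fromstring(record_xml)
    return {
        "rbid": root.findtext(".//idno[@type='RBID']"),
        "title": root.findtext(".//titleStmt/title"),
        "year": root.find(".//publicationStmt/date").get("when"),
        "authors": [name.text for name in root.iter("name")],
        "keywords": [term.text for term in root.findall(".//keywords/term")],
    }

summary = summarize(RECORD)
print(summary)
```

The element names carry no namespace in this record, so plain tag names work in the path expressions; only the prefixed attributes depend on the namespace declaration.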

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 00E103 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 00E103 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     CRIN:gong91b
   |texte=   Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022