Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech
Internal identifier: 000C90 (Crin/Curation); previous: 000C89; next: 000C91
Authors: Y. Gong; J.-P. Haton
Source:
English descriptors
Abstract
The correlations between vectors in a sequence of analysis frames are assumed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing the frame sequence into two parts and constructing a function that interpolates one part using information from the other part. Depending on the quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator trained to give the minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. In a phoneme-spotting test on continuous speech using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave comparable results; vector-pair interpolator models yield the best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives a close global recognition rate and significantly outperforms it on plosive sounds.
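The scheme described in the abstract can be sketched in code. This is a minimal illustrative reconstruction, not the authors' implementation: the network sizes, learning rate, and synthetic two-class data are assumptions, and the frame-pair decomposition shown (predicting a centre frame from its two neighbours) is only one plausible instance of the three model families the paper develops.

```python
# Illustrative sketch: one non-linear interpolator (small MLP) per class,
# recognition by minimum interpolation error. Synthetic 4-dimensional frames
# stand in for the paper's 16 LPCC-derived cepstrum coefficients.
import numpy as np

rng = np.random.default_rng(0)

class Interpolator:
    """One-hidden-layer feedforward network predicting the centre frame
    of a 3-frame window from its two neighbours."""

    def __init__(self, dim, hidden=16, lr=0.05):
        self.W1 = rng.normal(0.0, 0.3, (hidden, 2 * dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.3, (dim, hidden))
        self.b2 = np.zeros(dim)
        self.lr = lr

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def train_step(self, x, target):
        # One SGD step on the squared interpolation error.
        y, h = self.forward(x)
        err = y - target
        grad_h = (self.W2.T @ err) * (1.0 - h * h)
        self.W2 -= self.lr * np.outer(err, h)
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(grad_h, x)
        self.b1 -= self.lr * grad_h

    def error(self, x, target):
        y, _ = self.forward(x)
        return float(np.sum((y - target) ** 2))

def windows(frames):
    # Decompose the sequence: outer frames are the known part,
    # the centre frame is the part to be interpolated.
    for t in range(1, len(frames) - 1):
        yield np.concatenate([frames[t - 1], frames[t + 1]]), frames[t]

def make_seq(freqs):
    # Toy stand-in for one phoneme's frame sequence: sinusoids whose
    # frequencies (the inter-frame correlation structure) define the class.
    t = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
    return np.sin(t * freqs) + 0.05 * rng.normal(size=(20, len(freqs)))

CLASSES = {"a": np.array([1.0, 2.0, 3.0, 4.0]),
           "b": np.array([6.0, 7.0, 8.0, 9.0])}

# Train one interpolator per "phoneme" to minimise its interpolation error.
models = {}
for name, freqs in CLASSES.items():
    m = Interpolator(dim=4)
    for _ in range(300):
        for x, target in windows(make_seq(freqs)):
            m.train_step(x, target)
    models[name] = m

def classify(frames):
    # Recognition: the model with the lowest total interpolation error wins.
    scores = {name: sum(m.error(x, tgt) for x, tgt in windows(frames))
              for name, m in models.items()}
    return min(scores, key=scores.get)

print(classify(make_seq(CLASSES["a"])), classify(make_seq(CLASSES["b"])))
```

Because each interpolator is trained only on its own class, a sequence with a different correlation structure is interpolated poorly, which is what makes the error usable as a recognition score.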
Links toward previous steps (curation, corpus...)
- to stream Crin, to step Corpus: To go to this record in the Curation step: 000C90
Links to Exploration step
CRIN:gong91b
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" wicri:score="654">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
</titleStmt>
<publicationStmt><idno type="RBID">CRIN:gong91b</idno>
<date when="1991" year="1991">1991</date>
<idno type="wicri:Area/Crin/Corpus">000C90</idno>
<idno type="wicri:Area/Crin/Curation">000C90</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">000C90</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
<author><name sortKey="Gong, Y" sort="Gong, Y" uniqKey="Gong Y" first="Y." last="Gong">Y. Gong</name>
</author>
<author><name sortKey="Haton, J P" sort="Haton, J P" uniqKey="Haton J" first="J.-P." last="Haton">J.-P. Haton</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>interpolation</term>
<term>neural networks</term>
<term>speaker identification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en" wicri:score="4552">The correlations between vectors in a sequence of analysis frames are supposed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on the decomposition of frame sequence into two parts and on the construction of a function that interpolates one part using information from the second part. According to quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator which is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech under phoneme spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives close global recognition rate and significantly outperforms for plosive sounds.</div>
</front>
</TEI>
<BibTex type="inproceedings"><ref>gong91b</ref>
<crinnumber>91-R-242</crinnumber>
<category>3</category>
<equipe>RFIA</equipe>
<author><e>Gong, Y.</e>
<e>Haton, J.-P.</e>
</author>
<title>Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
<booktitle>{Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto (Canada)}</booktitle>
<year>1991</year>
<volume>1</volume>
<pages>121-124</pages>
<month>may</month>
<keywords><e>neural networks</e>
<e>speaker identification</e>
<e>interpolation</e>
</keywords>
<abstract>The correlations between vectors in a sequence of analysis frames are supposed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on the decomposition of frame sequence into two parts and on the construction of a function that interpolates one part using information from the second part. According to quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator which is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech under phoneme spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives close global recognition rate and significantly outperforms for plosive sounds.</abstract>
</BibTex>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Crin/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C90 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Crin/Curation/biblio.hfd -nk 000C90 | SxmlIndent | more
To put a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Crin |étape= Curation |type= RBID |clé= CRIN:gong91b |texte= Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech }}
This area was generated with Dilib version V0.6.33.