Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech
Internal identifier: 000C90 (Crin/Curation); previous: 000C89; next: 000C91
Authors: Y. Gong; J.-P. Haton
Source:
English descriptors
Abstract
The correlations between vectors in a sequence of analysis frames are assumed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing the frame sequence into two parts and constructing a function that interpolates one part using information from the other part. Depending on the quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator trained to give the minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. In a phoneme-spotting test on continuous speech using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave comparable results; vector-pair interpolator models yield the best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives a close global recognition rate and significantly outperforms it on plosive sounds.
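The scheme described in the abstract can be sketched in code. This is a minimal illustrative reconstruction, not the authors' implementation: the network sizes, learning rate, and synthetic two-class data are assumptions, and the frame-pair decomposition shown (predicting a centre frame from its two neighbours) is only one plausible instance of the three model families the paper develops.

```python
# Illustrative sketch: one non-linear interpolator (small MLP) per class,
# recognition by minimum interpolation error. Synthetic 4-dimensional frames
# stand in for the paper's 16 LPCC-derived cepstrum coefficients.
import numpy as np

rng = np.random.default_rng(0)

class Interpolator:
    """One-hidden-layer feedforward network predicting the centre frame
    of a 3-frame window from its two neighbours."""

    def __init__(self, dim, hidden=16, lr=0.05):
        self.W1 = rng.normal(0.0, 0.3, (hidden, 2 * dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.3, (dim, hidden))
        self.b2 = np.zeros(dim)
        self.lr = lr

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def train_step(self, x, target):
        # One SGD step on the squared interpolation error.
        y, h = self.forward(x)
        err = y - target
        grad_h = (self.W2.T @ err) * (1.0 - h * h)
        self.W2 -= self.lr * np.outer(err, h)
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(grad_h, x)
        self.b1 -= self.lr * grad_h

    def error(self, x, target):
        y, _ = self.forward(x)
        return float(np.sum((y - target) ** 2))

def windows(frames):
    # Decompose the sequence: outer frames are the known part,
    # the centre frame is the part to be interpolated.
    for t in range(1, len(frames) - 1):
        yield np.concatenate([frames[t - 1], frames[t + 1]]), frames[t]

def make_seq(freqs):
    # Toy stand-in for one phoneme's frame sequence: sinusoids whose
    # frequencies (the inter-frame correlation structure) define the class.
    t = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
    return np.sin(t * freqs) + 0.05 * rng.normal(size=(20, len(freqs)))

CLASSES = {"a": np.array([1.0, 2.0, 3.0, 4.0]),
           "b": np.array([6.0, 7.0, 8.0, 9.0])}

# Train one interpolator per "phoneme" to minimise its interpolation error.
models = {}
for name, freqs in CLASSES.items():
    m = Interpolator(dim=4)
    for _ in range(300):
        for x, target in windows(make_seq(freqs)):
            m.train_step(x, target)
    models[name] = m

def classify(frames):
    # Recognition: the model with the lowest total interpolation error wins.
    scores = {name: sum(m.error(x, tgt) for x, tgt in windows(frames))
              for name, m in models.items()}
    return min(scores, key=scores.get)

print(classify(make_seq(CLASSES["a"])), classify(make_seq(CLASSES["b"])))
```

Because each interpolator is trained only on its own class, a sequence with a different correlation structure is interpolated poorly, which is what makes the error usable as a recognition score.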
Links toward previous steps (curation, corpus...)
- to stream Crin, to step Corpus: To go to this record in the Curation step: 000C90
Links to Exploration step
CRIN:gong91b
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" wicri:score="654">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
</titleStmt>
<publicationStmt><idno type="RBID">CRIN:gong91b</idno>
<date when="1991" year="1991">1991</date>
<idno type="wicri:Area/Crin/Corpus">000C90</idno>
<idno type="wicri:Area/Crin/Curation">000C90</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">000C90</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
<author><name sortKey="Gong, Y" sort="Gong, Y" uniqKey="Gong Y" first="Y." last="Gong">Y. Gong</name>
</author>
<author><name sortKey="Haton, J P" sort="Haton, J P" uniqKey="Haton J" first="J.-P." last="Haton">J.-P. Haton</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>interpolation</term>
<term>neural networks</term>
<term>speaker identification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en" wicri:score="4552">The correlations between vectors in a sequence of analysis frames are supposed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on the decomposition of frame sequence into two parts and on the construction of a function that interpolates one part using information from the second part. According to quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator which is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech under phoneme spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives close global recognition rate and significantly outperforms for plosive sounds.</div>
</front>
</TEI>
<BibTex type="inproceedings"><ref>gong91b</ref>
<crinnumber>91-R-242</crinnumber>
<category>3</category>
<equipe>RFIA</equipe>
<author><e>Gong, Y.</e>
<e>Haton, J.-P.</e>
</author>
<title>Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech</title>
<booktitle>{Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto (Canada)}</booktitle>
<year>1991</year>
<volume>1</volume>
<pages>121-124</pages>
<month>may</month>
<keywords><e>neural networks</e>
<e>speaker identification</e>
<e>interpolation</e>
</keywords>
<abstract>The correlations between vectors in a sequence of analysis frames are supposed to be specific to phonetic units in acoustic-phonetic decoding of speech. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on the decomposition of frame sequence into two parts and on the construction of a function that interpolates one part using information from the second part. According to quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonemic symbol is associated with a non-linear vector interpolator which is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. For continuous speech under phoneme spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave compatible results. Vector-pair interpolator models yield best recognition rate. Compared to a VQ-coded reference comparison technique, this model gives close global recognition rate and significantly outperforms for plosive sounds.</abstract>
</BibTex>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Crin/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C90 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Crin/Curation/biblio.hfd -nk 000C90 | SxmlIndent | more
To put a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Crin |étape= Curation |type= RBID |clé= CRIN:gong91b |texte= Non-Linear Vector Interpolation by Neural Network for Phoneme Identification in Continuous Speech }}
This area was generated with Dilib version V0.6.33.