Exploration server for computer science research in Lorraine

On-Line Frame-Synchronous Noise Compensation

Internal identifier: 007A79 (Main/Merge); previous: 007A78; next: 007A80

Authors: Vincent Barreaud; Irina Illina; Dominique Fohr

RBID : CRIN:barreaud03b

English descriptors: compensation; hmm; speech; stochastic matching

Abstract

In real-life speech recognition applications, mismatch between training and testing data is known to degrade performance. This mismatch is mostly due to various unknown noise sources that corrupt incoming features. Moreover, the mismatch function cannot be considered stationary. There are two possible approaches to enhancing speech robustly when stochastic models are used for recognition. First, adaptation techniques, such as Parallel Model Combination (PMC), modify the parameters of the HMMs so that the transformed stochastic models better characterize the distorted features. Second, the corrupted features can be compensated with a transformation estimated from the noise characteristics. This second approach includes techniques such as Cepstral Mean Subtraction (CMS), Spectral Subtraction (SS) and Stochastic Matching. The method developed here belongs to this category. Frame-synchronous algorithms are commonly used to cope with non-stationary noise sources and are naturally appealing. The most popular frame-synchronous technique is CMS: the mean of the incoming sequence of cepstra is computed and subtracted from the next observation. We believe that this method can be enhanced by taking into account statistics of the HMMs used during recognition. For each time frame, a transformation is applied to the incoming noisy feature in order to compensate for the action of the environment. This transformed feature is then integrated into the Viterbi process: a forward probability is computed for every state of the models. The largest forward probability gives the most probable emitting state given the known set of previous observations. The distance from this state to the transformed feature is then used to re-estimate the transformation to be applied to the next noisy feature.
Thus, this on-line algorithm performs compensation in parallel with recognition and does not make any hypothesis about the nature of the noise or require training of any specific models, contrary to CMS, SS or adaptation techniques. Simple transformations such as bias or linear functions give good results. More complex solutions, such as model-specific transforms, could be studied. Our noise compensation algorithm is evaluated on the VODIS database, recorded in a moving car. For each task, our technique significantly outperforms the classical methods. For instance, the algorithm gave an error rate improvement of 9.48% over PMC, 12.25% over SS and 27.74% over CMS on the phonetic number recognition task.
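The per-frame loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a bias-only transformation, diagonal-Gaussian state emissions, a Viterbi-style maximum score standing in for the forward probability, and a hypothetical smoothing factor `alpha` for the bias update. The "distance" to the best state is taken here to be the residual to that state's mean. All names (`compensate_online`, `log_gauss`, `alpha`) are illustrative.

```python
import math

def log_gauss(y, mean, var):
    """Log-density of vector y under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(y, mean, var))

def compensate_online(frames, means, variances, log_trans, alpha=0.1):
    """Frame-synchronous bias compensation guided by the best HMM state.

    alpha is an assumed smoothing factor; it is not taken from the source.
    """
    n_states, dim = len(means), len(means[0])
    bias = [0.0] * dim
    delta = [-math.log(n_states)] * n_states  # uniform initial log-prior
    compensated, states = [], []
    for t, x in enumerate(frames):
        # 1. Apply the current transformation (here: subtract a bias).
        y = [xi - bi for xi, bi in zip(x, bias)]
        # 2. Viterbi-style score update for every state.
        if t > 0:
            delta = [max(delta[i] + log_trans[i][j] for i in range(n_states))
                     for j in range(n_states)]
        delta = [delta[j] + log_gauss(y, means[j], variances[j])
                 for j in range(n_states)]
        # 3. Most probable emitting state given the observations so far.
        best = max(range(n_states), key=lambda j: delta[j])
        # 4. Re-estimate the bias from the residual to that state's mean;
        #    the new bias is applied to the next frame.
        bias = [bi + alpha * (yi - mi)
                for bi, yi, mi in zip(bias, y, means[best])]
        compensated.append(y)
        states.append(best)
    return compensated, states, bias

# Toy data: two 1-D states; every frame is shifted by a constant offset of 1.0.
means = [[-2.0], [2.0]]
variances = [[1.0], [1.0]]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
noisy = [[-2.0 + 1.0]] * 50 + [[2.0 + 1.0]] * 50
compensated, states, bias = compensate_online(noisy, means, variances, log_trans)
print(round(bias[0], 3))
```

On this toy input the estimated bias approaches the constant offset added to the frames. A linear transformation, which the abstract also mentions, would need a different update rule and is not covered by this sketch.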

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="293">On-Line Frame-Synchronous Noise Compensation</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">CRIN:barreaud03b</idno>
<date when="2003" year="2003">2003</date>
<idno type="wicri:Area/Crin/Corpus">003782</idno>
<idno type="wicri:Area/Crin/Curation">003782</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">003782</idno>
<idno type="wicri:Area/Crin/Checkpoint">000C32</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Checkpoint">000C32</idno>
<idno type="wicri:Area/Main/Merge">007A79</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">On-Line Frame-Synchronous Noise Compensation</title>
<author>
<name sortKey="Barreaud, Vincent" sort="Barreaud, Vincent" uniqKey="Barreaud V" first="Vincent" last="Barreaud">Vincent Barreaud</name>
</author>
<author>
<name sortKey="Illina, Irina" sort="Illina, Irina" uniqKey="Illina I" first="Irina" last="Illina">Irina Illina</name>
</author>
<author>
<name sortKey="Fohr, Dominique" sort="Fohr, Dominique" uniqKey="Fohr D" first="Dominique" last="Fohr">Dominique Fohr</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>compensation</term>
<term>hmm</term>
<term>speech</term>
<term>stochastic matching</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en" wicri:score="8108">In real-life speech recognition applications, mismatch between training and testing data is known to degrade performance. This mismatch is mostly due to various unknown noise sources that corrupt incoming features. Moreover, the mismatch function cannot be considered stationary. There are two possible approaches to enhancing speech robustly when stochastic models are used for recognition. First, adaptation techniques, such as Parallel Model Combination (PMC), modify the parameters of the HMMs so that the transformed stochastic models better characterize the distorted features. Second, the corrupted features can be compensated with a transformation estimated from the noise characteristics. This second approach includes techniques such as Cepstral Mean Subtraction (CMS), Spectral Subtraction (SS) and Stochastic Matching. The method developed here belongs to this category. Frame-synchronous algorithms are commonly used to cope with non-stationary noise sources and are naturally appealing. The most popular frame-synchronous technique is CMS: the mean of the incoming sequence of cepstra is computed and subtracted from the next observation. We believe that this method can be enhanced by taking into account statistics of the HMMs used during recognition. For each time frame, a transformation is applied to the incoming noisy feature in order to compensate for the action of the environment. This transformed feature is then integrated into the Viterbi process: a forward probability is computed for every state of the models. The largest forward probability gives the most probable emitting state given the known set of previous observations. The distance from this state to the transformed feature is then used to re-estimate the transformation to be applied to the next noisy feature.
Thus, this on-line algorithm performs compensation in parallel with recognition and does not make any hypothesis about the nature of the noise or require training of any specific models, contrary to CMS, SS or adaptation techniques. Simple transformations such as bias or linear functions give good results. More complex solutions, such as model-specific transforms, could be studied. Our noise compensation algorithm is evaluated on the VODIS database, recorded in a moving car. For each task, our technique significantly outperforms the classical methods. For instance, the algorithm gave an error rate improvement of 9.48% over PMC, 12.25% over SS and 27.74% over CMS on the phonetic number recognition task.</div>
</front>
</TEI>
</record>
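As an illustration of reading such a record programmatically, here is a minimal Python sketch using only the standard library. The record as shown uses the `wicri:` prefix without declaring it, so the sketch wraps the text in a hypothetical element that binds the prefix to a placeholder namespace URI. That URI, the `parse_record` helper, and the trimmed sample string are assumptions for illustration and are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the record does not declare the wicri prefix itself.
PLACEHOLDER_NS = "urn:example:wicri"

def parse_record(record_xml):
    """Extract a few bibliographic fields from a <record> fragment."""
    wrapped = '<wrapper xmlns:wicri="%s">%s</wrapper>' % (PLACEHOLDER_NS, record_xml)
    root = ET.fromstring(wrapped)
    rbid = root.find(".//idno[@type='RBID']")
    date = root.find(".//publicationStmt/date")
    return {
        "title": root.findtext(".//titleStmt/title"),
        "rbid": rbid.text if rbid is not None else None,
        "year": date.get("when") if date is not None else None,
        "authors": [n.text for n in root.findall(".//author/name")],
        "keywords": [t.text for t in root.findall(".//keywords/term")],
    }

# Trimmed copy of the record above, kept short for the example.
sample = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="en" wicri:score="293">On-Line Frame-Synchronous Noise Compensation</title></titleStmt>
<publicationStmt><idno type="RBID">CRIN:barreaud03b</idno><date when="2003" year="2003">2003</date></publicationStmt>
<sourceDesc><biblStruct><analytic>
<author><name sortKey="Barreaud, Vincent">Vincent Barreaud</name></author>
<author><name sortKey="Illina, Irina">Irina Illina</name></author>
<author><name sortKey="Fohr, Dominique">Dominique Fohr</name></author>
</analytic></biblStruct></sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en">
<term>compensation</term><term>hmm</term><term>speech</term><term>stochastic matching</term>
</keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

info = parse_record(sample)
print(info["title"], info["authors"])
```

Wrapping the fragment avoids editing the record text itself. If the real data declares the `wicri` namespace elsewhere, the wrapper is unnecessary.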

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 007A79 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 007A79 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     CRIN:barreaud03b
   |texte=   On-Line Frame-Synchronous Noise Compensation
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022