Codebook Design for Speech Guided Car Infotainment Systems
Internal identifier: 001151 (Istex/Curation); previous: 001150; next: 001152
Authors: Martin Raab [Germany]; Rainer Gruhn [Germany]; Elmar Noeth [Germany]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2008.
English descriptors
- Teeft :
- Additional gaussians, Additional language, Additional languages, Algorithm, Baseline, City names, Codebook, Codebook design, Codebooks, Database, English codebook, Experimental setup, Foreign names, Future work, Gaussians, German codebook, Gruhn, Hiwire, Hiwire data, Hiwire database, Human input, Infotainment, Infotainment scenario, Infotainment systems, Initial codebooks, Main language, Main language codebook, Main language performance, Maximum accuracy, Multilingual, Multilingual input, Multilingual recognition, Multilingual speech recognition, Multiple languages, Music titles, Mwcs, Native english codebook, Native speech, Nearest neighbor connections, Nonnative speech, Other words, Quantization, Raab, Results show, Same time, Sound patterns, Speech recognition, Speech recognizers, Such collections, Such systems, Training samples, Vector quantization, Word accuracies.
Abstract
In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
DOI: 10.1007/978-3-540-69369-7_6
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: to go to this record in the Curation step: 001240
Links to Exploration step
ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="1"><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
<affiliation><mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
<wicri:noCountry code="no comma">E-mail: mraab@harmanbecker.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="1"><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
<idno type="url">https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001240</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001240</idno>
<idno type="wicri:Area/Istex/Curation">001151</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="1"><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
<affiliation><mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="1"><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="1"><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Additional gaussians</term>
<term>Additional language</term>
<term>Additional languages</term>
<term>Algorithm</term>
<term>Baseline</term>
<term>City names</term>
<term>Codebook</term>
<term>Codebook design</term>
<term>Codebooks</term>
<term>Database</term>
<term>English codebook</term>
<term>Experimental setup</term>
<term>Foreign names</term>
<term>Future work</term>
<term>Gaussians</term>
<term>German codebook</term>
<term>Gruhn</term>
<term>Hiwire</term>
<term>Hiwire data</term>
<term>Hiwire database</term>
<term>Human input</term>
<term>Infotainment</term>
<term>Infotainment scenario</term>
<term>Infotainment systems</term>
<term>Initial codebooks</term>
<term>Main language</term>
<term>Main language codebook</term>
<term>Main language performance</term>
<term>Maximum accuracy</term>
<term>Multilingual</term>
<term>Multilingual input</term>
<term>Multilingual recognition</term>
<term>Multilingual speech recognition</term>
<term>Multiple languages</term>
<term>Music titles</term>
<term>Mwcs</term>
<term>Native english codebook</term>
<term>Native speech</term>
<term>Nearest neighbor connections</term>
<term>Nonnative speech</term>
<term>Other words</term>
<term>Quantization</term>
<term>Raab</term>
<term>Results show</term>
<term>Same time</term>
<term>Sound patterns</term>
<term>Speech recognition</term>
<term>Speech recognizers</term>
<term>Such collections</term>
<term>Such systems</term>
<term>Training samples</term>
<term>Vector quantization</term>
<term>Word accuracies</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001151 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 001151 | SxmlIndent | more
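The record fragment shown above uses the prefixes wicri: and mods: without declaring them, so a strict XML parser may reject it as printed. The following is a minimal Python sketch, not part of Dilib, of one way to read the title and author names from such a fragment. The wrapper element, the placeholder namespace URIs, and the helper names parse_record and title_and_authors are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URIs, used only so the parser accepts
# the wicri: and mods: prefixes; they are not the real Wicri or MODS URIs.
WRAPPER = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:mods="urn:example:mods">{}</wrapper>'
)


def parse_record(record_xml):
    """Parse a <record> fragment whose prefixes are not declared."""
    root = ET.fromstring(WRAPPER.format(record_xml))
    return root.find("record")


def title_and_authors(record):
    """Return the title and author names from the titleStmt element."""
    title = record.findtext(".//titleStmt/title")
    authors = [name.text for name in record.findall(".//titleStmt/author/name")]
    return title, authors


# Trimmed copy of the record above (affiliations and most attributes omitted).
SAMPLE = (
    '<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader>'
    '<fileDesc><titleStmt><title xml:lang="en">Codebook Design for Speech '
    'Guided Car Infotainment Systems</title>'
    '<author><name sortKey="Raab, Martin">Martin Raab</name></author>'
    '<author><name sortKey="Gruhn, Rainer">Rainer Gruhn</name></author>'
    '<author><name sortKey="Noeth, Elmar">Elmar Noeth</name></author>'
    '</titleStmt></fileDesc></teiHeader></TEI></record>'
)

title, authors = title_and_authors(parse_record(SAMPLE))
print(title)
print(authors)
```

In practice the output of HfdSelect could be captured to a string and passed to parse_record in place of SAMPLE; whether the full Dilib output parses cleanly this way has not been verified here.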
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Istex |étape= Curation |type= RBID |clé= ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC |texte= Codebook Design for Speech Guided Car Infotainment Systems }}
This area was generated with Dilib version V0.6.33.