Exploration server on music in Saarland

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Codebook Design for Speech Guided Car Infotainment Systems

Internal identifier: 001151 (Istex/Curation); previous: 001150; next: 001152

Authors: Martin Raab [Germany]; Rainer Gruhn [Germany]; Elmar Noeth [Germany]

RBID : ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC

Abstract

In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving main-language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
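
The abstract refers to vector quantization against a codebook. As a rough illustration only, the following minimal sketch shows generic nearest-neighbor vector quantization: each feature vector is mapped to the index of its closest codebook entry. It is not the authors' codebook-design method. The Euclidean distance, the function names `nearest_codeword` and `quantize`, and the 2-D toy data are all assumptions made for this example.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to `vector`.

    Assumption: plain Euclidean distance; a real recognizer may use a
    different measure.
    """
    best_index = 0
    best_distance = math.inf
    for index, codeword in enumerate(codebook):
        distance = math.dist(vector, codeword)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def quantize(vectors, codebook):
    """Map every feature vector to the index of its nearest codeword."""
    return [nearest_codeword(vector, codebook) for vector in vectors]


# Hypothetical toy data: three codewords and three feature vectors in 2-D.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.4)]

indices = quantize(features, codebook)
print(indices)  # [0, 1, 2]
```

In this toy setting, a codebook fitted to one language's sound patterns would simply be a different list of codewords; how to choose those entries for multilingual input is the question the paper addresses.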

DOI: 10.1007/978-3-540-69369-7_6

Links to Exploration step

ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="1">
<mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
<wicri:noCountry code="no comma">E-mail: mraab@harmanbecker.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="1">
<mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
<idno type="url">https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001240</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001240</idno>
<idno type="wicri:Area/Istex/Curation">001151</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="1">
<mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="1">
<mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="1">
<mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="Teeft" xml:lang="en">
<term>Additional gaussians</term>
<term>Additional language</term>
<term>Additional languages</term>
<term>Algorithm</term>
<term>Baseline</term>
<term>City names</term>
<term>Codebook</term>
<term>Codebook design</term>
<term>Codebooks</term>
<term>Database</term>
<term>English codebook</term>
<term>Experimental setup</term>
<term>Foreign names</term>
<term>Future work</term>
<term>Gaussians</term>
<term>German codebook</term>
<term>Gruhn</term>
<term>Hiwire</term>
<term>Hiwire data</term>
<term>Hiwire database</term>
<term>Human input</term>
<term>Infotainment</term>
<term>Infotainment scenario</term>
<term>Infotainment systems</term>
<term>Initial codebooks</term>
<term>Main language</term>
<term>Main language codebook</term>
<term>Main language performance</term>
<term>Maximum accuracy</term>
<term>Multilingual</term>
<term>Multilingual input</term>
<term>Multilingual recognition</term>
<term>Multilingual speech recognition</term>
<term>Multiple languages</term>
<term>Music titles</term>
<term>Mwcs</term>
<term>Native english codebook</term>
<term>Native speech</term>
<term>Nearest neighbor connections</term>
<term>Nonnative speech</term>
<term>Other words</term>
<term>Quantization</term>
<term>Raab</term>
<term>Results show</term>
<term>Same time</term>
<term>Sound patterns</term>
<term>Speech recognition</term>
<term>Speech recognizers</term>
<term>Such collections</term>
<term>Such systems</term>
<term>Training samples</term>
<term>Vector quantization</term>
<term>Word accuracies</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001151 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 001151 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC
   |texte=   Codebook Design for Speech Guided Car Infotainment Systems
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024