Codebook Design for Speech Guided Car Infotainment Systems

Internal identifier: 001240 (Istex/Corpus); previous: 001239; next: 001241

Authors: Martin Raab; Rainer Gruhn; Elmar Noeth

Source:

Lecture Notes in Computer Science [0302-9743]; 2008.

RBID : ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC

English descriptors

Teeft:
- Additional gaussians, Additional language, Additional languages, Algorithm, Baseline, City names, Codebook, Codebook design, Codebooks, Database, English codebook, Experimental setup, Foreign names, Future work, Gaussians, German codebook, Gruhn, Hiwire, Hiwire data, Hiwire database, Human input, Infotainment, Infotainment scenario, Infotainment systems, Initial codebooks, Main language, Main language codebook, Main language performance, Maximum accuracy, Multilingual, Multilingual input, Multilingual recognition, Multilingual speech recognition, Multiple languages, Music titles, Mwcs, Native english codebook, Native speech, Nearest neighbor connections, Nonnative speech, Other words, Quantization, Raab, Results show, Same time, Sound patterns, Speech recognition, Speech recognizers, Such collections, Such systems, Training samples, Vector quantization, Word accuracies.

Abstract

In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of preserving main-language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.

Url:

https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf

DOI: 10.1007/978-3-540-69369-7_6

Links to Exploration step

ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
<idno type="url">https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001240</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001240</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: mraab@harmanbecker.com</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation><mods:affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation><mods:affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Additional gaussians</term>
<term>Additional language</term>
<term>Additional languages</term>
<term>Algorithm</term>
<term>Baseline</term>
<term>City names</term>
<term>Codebook</term>
<term>Codebook design</term>
<term>Codebooks</term>
<term>Database</term>
<term>English codebook</term>
<term>Experimental setup</term>
<term>Foreign names</term>
<term>Future work</term>
<term>Gaussians</term>
<term>German codebook</term>
<term>Gruhn</term>
<term>Hiwire</term>
<term>Hiwire data</term>
<term>Hiwire database</term>
<term>Human input</term>
<term>Infotainment</term>
<term>Infotainment scenario</term>
<term>Infotainment systems</term>
<term>Initial codebooks</term>
<term>Main language</term>
<term>Main language codebook</term>
<term>Main language performance</term>
<term>Maximum accuracy</term>
<term>Multilingual</term>
<term>Multilingual input</term>
<term>Multilingual recognition</term>
<term>Multilingual speech recognition</term>
<term>Multiple languages</term>
<term>Music titles</term>
<term>Mwcs</term>
<term>Native english codebook</term>
<term>Native speech</term>
<term>Nearest neighbor connections</term>
<term>Nonnative speech</term>
<term>Other words</term>
<term>Quantization</term>
<term>Raab</term>
<term>Results show</term>
<term>Same time</term>
<term>Sound patterns</term>
<term>Speech recognition</term>
<term>Speech recognizers</term>
<term>Such collections</term>
<term>Such systems</term>
<term>Training samples</term>
<term>Vector quantization</term>
<term>Word accuracies</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</div>
</front>
</TEI>
<istex><corpusName>springer</corpusName>
<keywords><teeft><json:string>codebook</json:string>
<json:string>multilingual</json:string>
<json:string>gaussians</json:string>
<json:string>raab</json:string>
<json:string>mwcs</json:string>
<json:string>german codebook</json:string>
<json:string>infotainment</json:string>
<json:string>hiwire</json:string>
<json:string>main language</json:string>
<json:string>quantization</json:string>
<json:string>codebooks</json:string>
<json:string>gruhn</json:string>
<json:string>additional languages</json:string>
<json:string>database</json:string>
<json:string>speech recognition</json:string>
<json:string>codebook design</json:string>
<json:string>city names</json:string>
<json:string>multiple languages</json:string>
<json:string>infotainment systems</json:string>
<json:string>algorithm</json:string>
<json:string>main language codebook</json:string>
<json:string>baseline</json:string>
<json:string>additional gaussians</json:string>
<json:string>nonnative speech</json:string>
<json:string>vector quantization</json:string>
<json:string>word accuracies</json:string>
<json:string>english codebook</json:string>
<json:string>native english codebook</json:string>
<json:string>music titles</json:string>
<json:string>initial codebooks</json:string>
<json:string>same time</json:string>
<json:string>other words</json:string>
<json:string>main language performance</json:string>
<json:string>experimental setup</json:string>
<json:string>future work</json:string>
<json:string>additional language</json:string>
<json:string>results show</json:string>
<json:string>multilingual speech recognition</json:string>
<json:string>sound patterns</json:string>
<json:string>nearest neighbor connections</json:string>
<json:string>native speech</json:string>
<json:string>multilingual recognition</json:string>
<json:string>hiwire data</json:string>
<json:string>multilingual input</json:string>
<json:string>hiwire database</json:string>
<json:string>human input</json:string>
<json:string>maximum accuracy</json:string>
<json:string>training samples</json:string>
<json:string>such collections</json:string>
<json:string>foreign names</json:string>
<json:string>speech recognizers</json:string>
<json:string>such systems</json:string>
<json:string>infotainment scenario</json:string>
</teeft>
</keywords>
<author><json:item><name>Martin Raab</name>
<affiliations><json:string>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</json:string>
<json:string>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</json:string>
<json:string>E-mail: mraab@harmanbecker.com</json:string>
</affiliations>
</json:item>
<json:item><name>Rainer Gruhn</name>
<affiliations><json:string>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</json:string>
<json:string>Dept. of Information Technology, University of Ulm, Ulm, Germany</json:string>
</affiliations>
</json:item>
<json:item><name>Elmar Noeth</name>
<affiliations><json:string>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</json:string>
</affiliations>
</json:item>
</author>
<language><json:string>eng</json:string>
</language>
<originalGenre><json:string>OriginalPaper</json:string>
</originalGenre>
<abstract>Abstract: In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</abstract>
<qualityIndicators><score>6.052</score>
<pdfWordCount>2728</pdfWordCount>
<pdfCharCount>16483</pdfCharCount>
<pdfVersion>1.6</pdfVersion>
<pdfPageCount>8</pdfPageCount>
<pdfPageSize>430 x 660 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<abstractWordCount>152</abstractWordCount>
<abstractCharCount>1041</abstractCharCount>
<keywordCount>0</keywordCount>
</qualityIndicators>
<title>Codebook Design for Speech Guided Car Infotainment Systems</title>
<chapterId><json:string>6</json:string>
<json:string>Chap6</json:string>
</chapterId>
<genre><json:string>conference</json:string>
</genre>
<serie><title>Lecture Notes in Computer Science</title>
<language><json:string>unknown</json:string>
</language>
<copyrightDate>2008</copyrightDate>
<issn><json:string>0302-9743</json:string>
</issn>
<eissn><json:string>1611-3349</json:string>
</eissn>
</serie>
<host><title>Perception in Multimodal Dialogue Systems</title>
<language><json:string>unknown</json:string>
</language>
<copyrightDate>2008</copyrightDate>
<doi><json:string>10.1007/978-3-540-69369-7</json:string>
</doi>
<issn><json:string>0302-9743</json:string>
</issn>
<eissn><json:string>1611-3349</json:string>
</eissn>
<eisbn><json:string>978-3-540-69369-7</json:string>
</eisbn>
<bookId><json:string>978-3-540-69369-7</json:string>
</bookId>
<isbn><json:string>978-3-540-69368-0</json:string>
</isbn>
<volume>5078</volume>
<pages><first>44</first>
<last>51</last>
</pages>
<genre><json:string>book-series</json:string>
</genre>
<editor><json:item><name>Elisabeth André</name>
</json:item>
<json:item><name>Laila Dybkjær</name>
</json:item>
<json:item><name>Wolfgang Minker</name>
</json:item>
<json:item><name>Heiko Neumann</name>
</json:item>
<json:item><name>Roberto Pieraccini</name>
</json:item>
<json:item><name>Michael Weber</name>
</json:item>
</editor>
<subject><json:item><value>Computer Science</value>
</json:item>
<json:item><value>Computer Science</value>
</json:item>
<json:item><value>Artificial Intelligence (incl. Robotics)</value>
</json:item>
<json:item><value>Language Translation and Linguistics</value>
</json:item>
<json:item><value>User Interfaces and Human Computer Interaction</value>
</json:item>
<json:item><value>Computers and Society</value>
</json:item>
<json:item><value>Image Processing and Computer Vision</value>
</json:item>
</subject>
</host>
<categories><inist><json:string>sciences humaines et sociales</json:string>
</inist>
</categories>
<publicationDate>2008</publicationDate>
<copyrightDate>2008</copyrightDate>
<doi><json:string>10.1007/978-3-540-69369-7_6</json:string>
</doi>
<id>B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</id>
<score>1</score>
<fulltext><json:item><extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</uri>
</json:item>
<json:item><extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/tei"><teiHeader><fileDesc><titleStmt><title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
</titleStmt>
<publicationStmt><authority>ISTEX</authority>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<availability><p>Springer-Verlag Berlin Heidelberg, 2008</p>
</availability>
<date>2008</date>
</publicationStmt>
<sourceDesc><biblStruct type="inbook"><analytic><title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author xml:id="author-0000"><persName><forename type="first">Martin</forename>
<surname>Raab</surname>
</persName>
<email>mraab@harmanbecker.com</email>
<affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</affiliation>
<affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</affiliation>
</author>
<author xml:id="author-0001"><persName><forename type="first">Rainer</forename>
<surname>Gruhn</surname>
</persName>
<affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</affiliation>
<affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</affiliation>
</author>
<author xml:id="author-0002"><persName><forename type="first">Elmar</forename>
<surname>Noeth</surname>
</persName>
<affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</affiliation>
</author>
<idno type="istex">B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<idno type="DOI">10.1007/978-3-540-69369-7_6</idno>
<idno type="ChapterID">6</idno>
<idno type="ChapterID">Chap6</idno>
</analytic>
<monogr><title level="m">Perception in Multimodal Dialogue Systems</title>
<title level="m" type="sub">4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. Proceedings</title>
<idno type="DOI">10.1007/978-3-540-69369-7</idno>
<idno type="pISBN">978-3-540-69368-0</idno>
<idno type="eISBN">978-3-540-69369-7</idno>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="book-title-ID">164793</idno>
<idno type="book-ID">978-3-540-69369-7</idno>
<idno type="book-chapter-count">37</idno>
<idno type="book-volume-number">5078</idno>
<idno type="book-sequence-number">5078</idno>
<idno type="PartChapterCount">6</idno>
<editor xml:id="book-author-0000"><persName><forename type="first">Elisabeth</forename>
<surname>André</surname>
</persName>
</editor>
<editor xml:id="book-author-0001"><persName><forename type="first">Laila</forename>
<surname>Dybkjær</surname>
</persName>
</editor>
<editor xml:id="book-author-0002"><persName><forename type="first">Wolfgang</forename>
<surname>Minker</surname>
</persName>
</editor>
<editor xml:id="book-author-0003"><persName><forename type="first">Heiko</forename>
<surname>Neumann</surname>
</persName>
</editor>
<editor xml:id="book-author-0004"><persName><forename type="first">Roberto</forename>
<surname>Pieraccini</surname>
</persName>
</editor>
<editor xml:id="book-author-0005"><persName><forename type="first">Michael</forename>
<surname>Weber</surname>
</persName>
</editor>
<imprint><publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<date type="published" when="2008"></date>
<biblScope unit="volume">5078</biblScope>
<biblScope unit="page" from="44">44</biblScope>
<biblScope unit="page" to="51">51</biblScope>
</imprint>
</monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<biblScope><date>2008</date>
</biblScope>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="series-Id">558</idno>
</series>
<series><title level="s">Lecture Notes in Artificial Intelligence</title>
<editor xml:id="serie-author-0000"><persName><forename type="first">Jaime</forename>
<forename type="first">G.</forename>
<surname>Carbonell</surname>
</persName>
</editor>
<editor xml:id="serie-author-0001"><persName><forename type="first">Jörg</forename>
<surname>Siekmann</surname>
</persName>
</editor>
<editor xml:id="serie-author-0002"><persName><forename type="first">Elisabeth</forename>
<surname>André</surname>
</persName>
</editor>
<editor xml:id="serie-author-0003"><persName><forename type="first">Laila</forename>
<surname>Dybkjær</surname>
</persName>
</editor>
<editor xml:id="serie-author-0004"><persName><forename type="first">Wolfgang</forename>
<surname>Minker</surname>
</persName>
</editor>
<editor xml:id="serie-author-0005"><persName><forename type="first">Heiko</forename>
<surname>Neumann</surname>
</persName>
</editor>
<editor xml:id="serie-author-0006"><persName><forename type="first">Roberto</forename>
<surname>Pieraccini</surname>
</persName>
</editor>
<editor xml:id="serie-author-0007"><persName><forename type="first">Michael</forename>
<surname>Weber</surname>
</persName>
</editor>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<biblScope unit="seriesId">1244</biblScope>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><creation><date>2008</date>
</creation>
<langUsage><language ident="en">en</language>
</langUsage>
<abstract xml:lang="en"><p>Abstract: In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</p>
</abstract>
<textClass><keywords scheme="Book-Subject-Collection"><list><label>SUCO11645</label>
<item><term>Computer Science</term>
</item>
</list>
</keywords>
</textClass>
<textClass><keywords scheme="Book-Subject-Group"><list><label>I</label>
<item><term>Computer Science</term>
</item>
<label>I21017</label>
<item><term>Artificial Intelligence (incl. Robotics)</term>
</item>
<label>I21041</label>
<item><term>Language Translation and Linguistics</term>
</item>
<label>I18067</label>
<item><term>User Interfaces and Human Computer Interaction</term>
</item>
<label>I24040</label>
<item><term>Computers and Society</term>
</item>
<label>I22021</label>
<item><term>Image Processing and Computer Vision</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc><change when="2008">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item><extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata><istex:metadataXml wicri:clean="Springer, Publisher found" wicri:toSee="no header"><istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//Springer-Verlag//DTD A++ V2.4//EN" URI="http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd" name="istex:docType"></istex:docType>
<istex:document><Publisher><PublisherInfo><PublisherName>Springer Berlin Heidelberg</PublisherName>
<PublisherLocation>Berlin, Heidelberg</PublisherLocation>
</PublisherInfo>
<Series><SeriesInfo SeriesType="Series" TocLevels="0"><SeriesID>558</SeriesID>
<SeriesPrintISSN>0302-9743</SeriesPrintISSN>
<SeriesElectronicISSN>1611-3349</SeriesElectronicISSN>
<SeriesTitle Language="En">Lecture Notes in Computer Science</SeriesTitle>
</SeriesInfo>
<SubSeries><SubSeriesInfo><SubSeriesID>1244</SubSeriesID>
<SubSeriesPrintISSN>0302-9743</SubSeriesPrintISSN>
<SubSeriesElectronicISSN>1611-3349</SubSeriesElectronicISSN>
<SubSeriesTitle Language="En">Lecture Notes in Artificial Intelligence</SubSeriesTitle>
</SubSeriesInfo>
<SubSeriesHeader><EditorGroup><Editor><EditorName DisplayOrder="Western"><GivenName>Jaime</GivenName>
<GivenName>G.</GivenName>
<FamilyName>Carbonell</FamilyName>
</EditorName>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Jörg</GivenName>
<FamilyName>Siekmann</FamilyName>
</EditorName>
</Editor>
</EditorGroup>
</SubSeriesHeader>
</SubSeries>
<Book Language="En"><BookInfo BookProductType="Proceedings" ContainsESM="No" Language="En" MediaType="eBook" NumberingStyle="Unnumbered" OutputMedium="All" TocLevels="0"><BookID>978-3-540-69369-7</BookID>
<BookTitle>Perception in Multimodal Dialogue Systems</BookTitle>
<BookSubTitle>4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. Proceedings</BookSubTitle>
<BookVolumeNumber>5078</BookVolumeNumber>
<BookSequenceNumber>5078</BookSequenceNumber>
<BookDOI>10.1007/978-3-540-69369-7</BookDOI>
<BookTitleID>164793</BookTitleID>
<BookPrintISBN>978-3-540-69368-0</BookPrintISBN>
<BookElectronicISBN>978-3-540-69369-7</BookElectronicISBN>
<BookChapterCount>37</BookChapterCount>
<BookCopyright><CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2008</CopyrightYear>
</BookCopyright>
<BookSubjectGroup><BookSubject Code="I" Type="Primary">Computer Science</BookSubject>
<BookSubject Code="I21017" Priority="1" Type="Secondary">Artificial Intelligence (incl. Robotics)</BookSubject>
<BookSubject Code="I21041" Priority="2" Type="Secondary">Language Translation and Linguistics</BookSubject>
<BookSubject Code="I18067" Priority="3" Type="Secondary">User Interfaces and Human Computer Interaction</BookSubject>
<BookSubject Code="I24040" Priority="4" Type="Secondary">Computers and Society</BookSubject>
<BookSubject Code="I22021" Priority="5" Type="Secondary">Image Processing and Computer Vision</BookSubject>
<SubjectCollection Code="SUCO11645">Computer Science</SubjectCollection>
</BookSubjectGroup>
</BookInfo>
<BookHeader><EditorGroup><Editor><EditorName DisplayOrder="Western"><GivenName>Elisabeth</GivenName>
<FamilyName>André</FamilyName>
</EditorName>
<Contact><Email>andre@informatik.uni-augsburg.de</Email>
</Contact>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Laila</GivenName>
<FamilyName>Dybkjær</FamilyName>
</EditorName>
<Contact><Email>laila@pdc.dk</Email>
</Contact>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Wolfgang</GivenName>
<FamilyName>Minker</FamilyName>
</EditorName>
<Contact><Email>wolfgang.minker@e-technik.uni-ulm.de</Email>
</Contact>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Heiko</GivenName>
<FamilyName>Neumann</FamilyName>
</EditorName>
<Contact><Email>heiko.neumann@uni-ulm.de</Email>
</Contact>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Roberto</GivenName>
<FamilyName>Pieraccini</FamilyName>
</EditorName>
<Contact><Email>roberto@speechcycle.com</Email>
</Contact>
</Editor>
<Editor><EditorName DisplayOrder="Western"><GivenName>Michael</GivenName>
<FamilyName>Weber</FamilyName>
</EditorName>
<Contact><Email>Michael.Weber@uni-ulm.de</Email>
</Contact>
</Editor>
</EditorGroup>
</BookHeader>
<Part ID="Part2"><PartInfo TocLevels="0"><PartID>2</PartID>
<PartSequenceNumber>2</PartSequenceNumber>
<PartTitle>Multimodal and Spoken Dialogue Systems</PartTitle>
<PartChapterCount>6</PartChapterCount>
<PartContext><SeriesID>558</SeriesID>
<BookTitle>Perception in Multimodal Dialogue Systems</BookTitle>
</PartContext>
</PartInfo>
<Chapter ID="Chap6" Language="En"><ChapterInfo ChapterType="OriginalPaper" ContainsESM="No" NumberingStyle="Unnumbered" TocLevels="0"><ChapterID>6</ChapterID>
<ChapterDOI>10.1007/978-3-540-69369-7_6</ChapterDOI>
<ChapterSequenceNumber>6</ChapterSequenceNumber>
<ChapterTitle Language="En">Codebook Design for Speech Guided Car Infotainment Systems</ChapterTitle>
<ChapterFirstPage>44</ChapterFirstPage>
<ChapterLastPage>51</ChapterLastPage>
<ChapterCopyright><CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2008</CopyrightYear>
</ChapterCopyright>
<ChapterGrants Type="Regular"><MetadataGrant Grant="OpenAccess"></MetadataGrant>
<AbstractGrant Grant="OpenAccess"></AbstractGrant>
<BodyPDFGrant Grant="Restricted"></BodyPDFGrant>
<BodyHTMLGrant Grant="Restricted"></BodyHTMLGrant>
<BibliographyGrant Grant="Restricted"></BibliographyGrant>
<ESMGrant Grant="Restricted"></ESMGrant>
</ChapterGrants>
<ChapterContext><SeriesID>558</SeriesID>
<PartID>2</PartID>
<BookID>978-3-540-69369-7</BookID>
<BookTitle>Perception in Multimodal Dialogue Systems</BookTitle>
</ChapterContext>
</ChapterInfo>
<ChapterHeader><AuthorGroup><Author AffiliationIDS="Aff1 Aff2"><AuthorName DisplayOrder="Western"><GivenName>Martin</GivenName>
<FamilyName>Raab</FamilyName>
</AuthorName>
<Contact><Email>mraab@harmanbecker.com</Email>
<URL>http://www.harmanbecker.de</URL>
</Contact>
</Author>
<Author AffiliationIDS="Aff1 Aff3"><AuthorName DisplayOrder="Western"><GivenName>Rainer</GivenName>
<FamilyName>Gruhn</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff2"><AuthorName DisplayOrder="Western"><GivenName>Elmar</GivenName>
<FamilyName>Noeth</FamilyName>
</AuthorName>
</Author>
<Affiliation ID="Aff1"><OrgName>Harman Becker Automotive Systems, Speech Dialog Systems</OrgName>
<OrgAddress><State>Ulm</State>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff2"><OrgDivision>Dept. of Pattern Recognition</OrgDivision>
<OrgName>University of Erlangen</OrgName>
<OrgAddress><City>Erlangen</City>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff3"><OrgDivision>Dept. of Information Technology</OrgDivision>
<OrgName>University of Ulm</OrgName>
<OrgAddress><State>Ulm</State>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
</AuthorGroup>
<Abstract ID="Abs1" Language="En"><Heading>Abstract</Heading>
<Para>In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input.</Para>
<Para>In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization.</Para>
<Para>We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</Para>
</Abstract>
</ChapterHeader>
<NoBody></NoBody>
</Chapter>
</Part>
</Book>
</Series>
</Publisher>
</istex:document>
</istex:metadataXml>
<mods version="3.6"><titleInfo lang="en"><title>Codebook Design for Speech Guided Car Infotainment Systems</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en"><title>Codebook Design for Speech Guided Car Infotainment Systems</title>
</titleInfo>
<name type="personal"><namePart type="given">Martin</namePart>
<namePart type="family">Raab</namePart>
<affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</affiliation>
<affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</affiliation>
<affiliation>E-mail: mraab@harmanbecker.com</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Rainer</namePart>
<namePart type="family">Gruhn</namePart>
<affiliation>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany</affiliation>
<affiliation>Dept. of Information Technology, University of Ulm, Ulm, Germany</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Elmar</namePart>
<namePart type="family">Noeth</namePart>
<affiliation>Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="conference" displayLabel="OriginalPaper"></genre>
<originInfo><publisher>Springer Berlin Heidelberg</publisher>
<place><placeTerm type="text">Berlin, Heidelberg</placeTerm>
</place>
<dateIssued encoding="w3cdtf">2008</dateIssued>
<copyrightDate encoding="w3cdtf">2008</copyrightDate>
</originInfo>
<language><languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription><internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">Abstract: In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.</abstract>
<relatedItem type="host"><titleInfo><title>Perception in Multimodal Dialogue Systems</title>
<subTitle>4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. Proceedings</subTitle>
</titleInfo>
<name type="personal"><namePart type="given">Elisabeth</namePart>
<namePart type="family">André</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Laila</namePart>
<namePart type="family">Dybkjær</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Wolfgang</namePart>
<namePart type="family">Minker</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Heiko</namePart>
<namePart type="family">Neumann</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Roberto</namePart>
<namePart type="family">Pieraccini</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Michael</namePart>
<namePart type="family">Weber</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="book-series" displayLabel="Proceedings"></genre>
<originInfo><copyrightDate encoding="w3cdtf">2008</copyrightDate>
<issuance>monographic</issuance>
</originInfo>
<subject><genre>Book-Subject-Collection</genre>
<topic authority="SpringerSubjectCodes" authorityURI="SUCO11645">Computer Science</topic>
</subject>
<subject><genre>Book-Subject-Group</genre>
<topic authority="SpringerSubjectCodes" authorityURI="I">Computer Science</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I21017">Artificial Intelligence (incl. Robotics)</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I21041">Language Translation and Linguistics</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18067">User Interfaces and Human Computer Interaction</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I24040">Computers and Society</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I22021">Image Processing and Computer Vision</topic>
</subject>
<identifier type="DOI">10.1007/978-3-540-69369-7</identifier>
<identifier type="ISBN">978-3-540-69368-0</identifier>
<identifier type="eISBN">978-3-540-69369-7</identifier>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="BookTitleID">164793</identifier>
<identifier type="BookID">978-3-540-69369-7</identifier>
<identifier type="BookChapterCount">37</identifier>
<identifier type="BookVolumeNumber">5078</identifier>
<identifier type="BookSequenceNumber">5078</identifier>
<identifier type="PartChapterCount">6</identifier>
<part><date>2008</date>
<detail type="part"><title>Multimodal and Spoken Dialogue Systems</title>
</detail>
<detail type="volume"><number>5078</number>
<caption>vol.</caption>
</detail>
<extent unit="pages"><start>44</start>
<end>51</end>
</extent>
</part>
<recordInfo><recordOrigin>Springer-Verlag Berlin Heidelberg, 2008</recordOrigin>
</recordInfo>
</relatedItem>
<relatedItem type="series"><titleInfo><title>Lecture Notes in Computer Science</title>
</titleInfo>
<originInfo><copyrightDate encoding="w3cdtf">2008</copyrightDate>
<issuance>serial</issuance>
</originInfo>
<relatedItem type="constituent"><titleInfo><title>Lecture Notes in Artificial Intelligence</title>
</titleInfo>
<name type="personal"><namePart type="given">Jaime</namePart>
<namePart type="given">G.</namePart>
<namePart type="family">Carbonell</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Jörg</namePart>
<namePart type="family">Siekmann</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Elisabeth</namePart>
<namePart type="family">André</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Laila</namePart>
<namePart type="family">Dybkjær</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Wolfgang</namePart>
<namePart type="family">Minker</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Heiko</namePart>
<namePart type="family">Neumann</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Roberto</namePart>
<namePart type="family">Pieraccini</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Michael</namePart>
<namePart type="family">Weber</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="sub-series"></genre>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="SubSeriesID">1244</identifier>
</relatedItem>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="SeriesID">558</identifier>
<recordInfo><recordOrigin>Springer-Verlag Berlin Heidelberg, 2008</recordOrigin>
</recordInfo>
</relatedItem>
<identifier type="istex">B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</identifier>
<identifier type="DOI">10.1007/978-3-540-69369-7_6</identifier>
<identifier type="ChapterID">6</identifier>
<identifier type="ChapterID">Chap6</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Springer-Verlag Berlin Heidelberg, 2008</accessCondition>
<recordInfo><recordContentSource>SPRINGER</recordContentSource>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2008</recordOrigin>
</recordInfo>
</mods>
</metadata>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Istex/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001240 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 001240 | SxmlIndent | more
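
Once a record has been extracted with one of the commands above, its bibliographic fields can be read with any XML parser. The following is a minimal Python sketch, not part of Dilib. It assumes that the `mods` element carries no XML namespace, as in the record printed on this page, and the function name `summarize_mods` and the trimmed `SAMPLE` string are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed copy of the <mods> element of this record.
SAMPLE = """<mods version="3.6">
<titleInfo lang="en"><title>Codebook Design for Speech Guided Car Infotainment Systems</title>
</titleInfo>
<name type="personal"><namePart type="given">Martin</namePart>
<namePart type="family">Raab</namePart>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Rainer</namePart>
<namePart type="family">Gruhn</namePart>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<relatedItem type="host"><titleInfo><title>Perception in Multimodal Dialogue Systems</title>
</titleInfo>
<name type="personal"><namePart type="given">Elisabeth</namePart>
<namePart type="family">André</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
</relatedItem>
</mods>"""


def summarize_mods(xml_text):
    """Return (title, [author names]) from a namespace-free MODS record."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <mods> element or a wrapper that contains one.
    mods = root if root.tag == "mods" else root.find(".//mods")
    if mods is None:
        raise ValueError("no <mods> element found")

    # The first titleInfo child holds the chapter title.
    title = mods.findtext("titleInfo/title")

    authors = []
    # Only direct <name> children are visited, so names nested in
    # <relatedItem> (volume and series editors) are skipped.
    for name in mods.findall("name"):
        if name.findtext("role/roleTerm") != "author":
            continue
        given = " ".join(p.text for p in name.findall("namePart[@type='given']"))
        family = name.findtext("namePart[@type='family']")
        authors.append(f"{given} {family}")
    return title, authors


title, authors = summarize_mods(SAMPLE)
print(title)
print("; ".join(authors))
```

In practice the string passed to `summarize_mods` would be the XML produced by the extraction command, for instance after redirecting its output to a file. Whether that output is a single well-formed document is an assumption that should be checked first.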

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC
   |texte=   Codebook Design for Speech Guided Car Infotainment Systems
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024

	Exploration server for music in Saarland
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
