Telematics exploration server

Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder

Internal identifier: 002364 (Istex/Corpus); previous: 002363; next: 002365

Authors: Chul Hong Kwon; Minkyu Lee

Source: European Transactions on Telecommunications [1124-318X]; 2004-09, vol. 15, no. 5, pp. 491-496

RBID : ISTEX:277C416094E76274A422B581F87BA2BA72291868

Abstract

Currently, time domain pitch synchronous overlap‐add (TD‐PSOLA) is the most popular synthesis algorithm for text‐to‐speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when a pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation due to a slight pitch epoch detection error becomes severe. On the other hand, the vocoder framework has very flexible prosody manipulation. It can obtain a uniform voice quality over a wide range of prosody modification. Unfortunately, the synthesized speech quality from the vocoder is far from natural human speech, often showing buzzy quality. To remedy buzzy quality from the vocoder and make more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems that is based on the homomorphic vocoder framework. The impulse response of vocal tract is obtained by mixing the minimum phase in lower frequency band and original phase in higher frequency band—thus, the name is a mixed phase vocoder. Informal subjective listening tests reveal that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.
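The phase-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, derives the minimum phase from the log-magnitude spectrum by the usual cepstral folding, and switches to the original phase above a hard cutoff. The function name `mixed_phase_response`, the sampling rate and the 2 kHz cutoff are hypothetical choices made for illustration; the abstract does not give the band edge or how the bands are joined.

```python
import numpy as np


def mixed_phase_response(frame, fs, cutoff_hz):
    """Return an impulse response with the frame's magnitude spectrum,
    minimum phase below cutoff_hz and the frame's original phase above it.

    Illustrative sketch only; the hard band switch is an assumption.
    """
    n = len(frame)
    if n % 2:
        raise ValueError("this sketch assumes an even frame length")

    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)
    original_phase = np.angle(spectrum)

    # Real cepstrum of the log magnitude (floored to avoid log(0)).
    cepstrum = np.fft.irfft(np.log(np.maximum(magnitude, 1e-12)), n)

    # Fold the cepstrum onto non-negative quefrencies; the imaginary part
    # of its transform is the minimum phase for this magnitude spectrum.
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]
    folded[n // 2] = cepstrum[n // 2]
    minimum_phase = np.fft.rfft(folded).imag

    # Minimum phase in the lower band, original phase in the higher band.
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    phase = np.where(freqs < cutoff_hz, minimum_phase, original_phase)

    return np.fft.irfft(magnitude * np.exp(1j * phase), n)


# Usage on a synthetic frame (a stand-in for one pitch period of speech).
rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * np.hanning(256)
h = mixed_phase_response(frame, fs=16000, cutoff_hz=2000.0)
```

Only the phase is altered, so the magnitude spectrum of `h` matches that of the input frame. With `cutoff_hz=0` the function returns the original frame, and with a cutoff above the Nyquist frequency it returns a purely minimum-phase response.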

Url:
DOI: 10.1002/ett.996

Links to Exploration step

ISTEX:277C416094E76274A422B581F87BA2BA72291868

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
<author>
<name sortKey="Kwon, Chul Hong" sort="Kwon, Chul Hong" uniqKey="Kwon C" first="Chul Hong" last="Kwon">Chul Hong Kwon</name>
<affiliation>
<mods:affiliation>Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Lee, Minkyu" sort="Lee, Minkyu" uniqKey="Lee M" first="Minkyu" last="Lee">Minkyu Lee</name>
<affiliation>
<mods:affiliation>Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974, USA</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:277C416094E76274A422B581F87BA2BA72291868</idno>
<date when="2004" year="2004">2004</date>
<idno type="doi">10.1002/ett.996</idno>
<idno type="url">https://api.istex.fr/document/277C416094E76274A422B581F87BA2BA72291868/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002364</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">002364</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
<author>
<name sortKey="Kwon, Chul Hong" sort="Kwon, Chul Hong" uniqKey="Kwon C" first="Chul Hong" last="Kwon">Chul Hong Kwon</name>
<affiliation>
<mods:affiliation>Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Lee, Minkyu" sort="Lee, Minkyu" uniqKey="Lee M" first="Minkyu" last="Lee">Minkyu Lee</name>
<affiliation>
<mods:affiliation>Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974, USA</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">European Transactions on Telecommunications</title>
<title level="j" type="abbrev">Eur. Trans. Telecomm.</title>
<idno type="ISSN">1124-318X</idno>
<idno type="eISSN">1541-8251</idno>
<imprint>
<publisher>John Wiley &amp; Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<date type="published" when="2004-09">2004-09</date>
<biblScope unit="volume">15</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="491">491</biblScope>
<biblScope unit="page" to="496">496</biblScope>
</imprint>
<idno type="ISSN">1124-318X</idno>
</series>
<idno type="istex">277C416094E76274A422B581F87BA2BA72291868</idno>
<idno type="DOI">10.1002/ett.996</idno>
<idno type="ArticleID">ETT996</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1124-318X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Currently, time domain pitch synchronous overlap‐add (TD‐PSOLA) is the most popular synthesis algorithm for text‐to‐speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when a pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation due to a slight pitch epoch detection error becomes severe. On the other hand, the vocoder framework has very flexible prosody manipulation. It can obtain a uniform voice quality over a wide range of prosody modification. Unfortunately, the synthesized speech quality from the vocoder is far from natural human speech, often showing buzzy quality. To remedy buzzy quality from the vocoder and make more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems that is based on the homomorphic vocoder framework. The impulse response of vocal tract is obtained by mixing the minimum phase in lower frequency band and original phase in higher frequency band—thus, the name is a mixed phase vocoder. Informal subjective listening tests reveal that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.</div>
</front>
</TEI>
<istex>
<corpusName>wiley</corpusName>
<author>
<json:item>
<name>Chul Hong Kwon</name>
<affiliations>
<json:string>Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea</json:string>
</affiliations>
</json:item>
<json:item>
<name>Minkyu Lee</name>
<affiliations>
<json:string>Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974, USA</json:string>
</affiliations>
</json:item>
</author>
<articleId>
<json:string>ETT996</json:string>
</articleId>
<language>
<json:string>eng</json:string>
</language>
<abstract>Currently, time domain pitch synchronous overlap‐add (TD‐PSOLA) is the most popular synthesis algorithm for text‐to‐speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when a pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation due to a slight pitch epoch detection error becomes severe. On the other hand, the vocoder framework has very flexible prosody manipulation. It can obtain a uniform voice quality over a wide range of prosody modification. Unfortunately, the synthesized speech quality from the vocoder is far from natural human speech, often showing buzzy quality. To remedy buzzy quality from the vocoder and make more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems that is based on the homomorphic vocoder framework. The impulse response of vocal tract is obtained by mixing the minimum phase in lower frequency band and original phase in higher frequency band—thus, the name is a mixed phase vocoder. Informal subjective listening tests reveal that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.</abstract>
<qualityIndicators>
<score>5.486</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>567 x 737 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>1248</abstractCharCount>
<pdfWordCount>3230</pdfWordCount>
<pdfCharCount>19501</pdfCharCount>
<pdfPageCount>6</pdfPageCount>
<abstractWordCount>188</abstractWordCount>
</qualityIndicators>
<title>A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
<genre.original>
<json:string>article</json:string>
</genre.original>
<genre>
<json:string>article</json:string>
</genre>
<host>
<volume>15</volume>
<publisherId>
<json:string>ETT</json:string>
</publisherId>
<pages>
<total>6</total>
<last>496</last>
<first>491</first>
</pages>
<issn>
<json:string>1124-318X</json:string>
</issn>
<issue>5</issue>
<author>
<json:item>
<name>Achille Pattavina</name>
</json:item>
</author>
<subject>
<json:item>
<value>Research Article</value>
</json:item>
</subject>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1541-8251</json:string>
</eissn>
<title>European Transactions on Telecommunications</title>
<doi>
<json:string>10.1002/(ISSN)1541-8251</json:string>
</doi>
</host>
<publicationDate>2004</publicationDate>
<copyrightDate>2004</copyrightDate>
<doi>
<json:string>10.1002/ett.996</json:string>
</doi>
<id>277C416094E76274A422B581F87BA2BA72291868</id>
<score>1</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/277C416094E76274A422B581F87BA2BA72291868/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/277C416094E76274A422B581F87BA2BA72291868/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/277C416094E76274A422B581F87BA2BA72291868/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>John Wiley &amp; Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<availability>
<p>WILEY</p>
</availability>
<date>2004</date>
</publicationStmt>
<notesStmt>
<note>Korea Science and Engineering Foundation - No. R01‐2002‐000‐00283‐0;</note>
</notesStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
<author>
<persName>
<forename type="first">Chul Hong</forename>
<surname>Kwon</surname>
</persName>
<note type="correspondence">
<p>Correspondence: Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea.</p>
</note>
<affiliation>Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea</affiliation>
</author>
<author>
<persName>
<forename type="first">Minkyu</forename>
<surname>Lee</surname>
</persName>
<affiliation>Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974, USA</affiliation>
</author>
</analytic>
<monogr>
<title level="j">European Transactions on Telecommunications</title>
<title level="j" type="abbrev">Eur. Trans. Telecomm.</title>
<idno type="pISSN">1124-318X</idno>
<idno type="eISSN">1541-8251</idno>
<idno type="DOI">10.1002/(ISSN)1541-8251</idno>
<imprint>
<publisher>John Wiley &amp; Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<date type="published" when="2004-09"></date>
<biblScope unit="volume">15</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="491">491</biblScope>
<biblScope unit="page" to="496">496</biblScope>
</imprint>
</monogr>
<idno type="istex">277C416094E76274A422B581F87BA2BA72291868</idno>
<idno type="DOI">10.1002/ett.996</idno>
<idno type="ArticleID">ETT996</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2004</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Currently, time domain pitch synchronous overlap‐add (TD‐PSOLA) is the most popular synthesis algorithm for text‐to‐speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when a pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation due to a slight pitch epoch detection error becomes severe. On the other hand, the vocoder framework has very flexible prosody manipulation. It can obtain a uniform voice quality over a wide range of prosody modification. Unfortunately, the synthesized speech quality from the vocoder is far from natural human speech, often showing buzzy quality. To remedy buzzy quality from the vocoder and make more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems that is based on the homomorphic vocoder framework. The impulse response of vocal tract is obtained by mixing the minimum phase in lower frequency band and original phase in higher frequency band—thus, the name is a mixed phase vocoder. Informal subjective listening tests reveal that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.</p>
</abstract>
<textClass>
<keywords scheme="Journal Subject">
<list>
<head>article-category</head>
<item>
<term>Research Article</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2003-03-12">Received</change>
<change when="2004-05-24">Registration</change>
<change when="2004-09">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/277C416094E76274A422B581F87BA2BA72291868/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<mods version="3.6">
<titleInfo lang="en">
<title>A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
</titleInfo>
<titleInfo type="abbreviated" lang="en">
<title>MIXED PHASE VOCODER</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder</title>
</titleInfo>
<name type="personal">
<namePart type="given">Chul Hong</namePart>
<namePart type="family">Kwon</namePart>
<affiliation>Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea</affiliation>
<description>Correspondence: Department of Information and Communication Engineering, Daejon University, 96.3, Yongun‐dong, Dong‐ju, Daejon, 300‐716, Korea.</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Minkyu</namePart>
<namePart type="family">Lee</namePart>
<affiliation>Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974, USA</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="article" displayLabel="article"></genre>
<originInfo>
<publisher>John Wiley &amp; Sons, Ltd.</publisher>
<place>
<placeTerm type="text">Chichester, UK</placeTerm>
</place>
<dateIssued encoding="w3cdtf">2004-09</dateIssued>
<dateCaptured encoding="w3cdtf">2003-03-12</dateCaptured>
<dateValid encoding="w3cdtf">2004-05-24</dateValid>
<copyrightDate encoding="w3cdtf">2004</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
<extent unit="figures">8</extent>
<extent unit="tables">2</extent>
<extent unit="references">10</extent>
</physicalDescription>
<abstract lang="en">Currently, time domain pitch synchronous overlap‐add (TD‐PSOLA) is the most popular synthesis algorithm for text‐to‐speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when a pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation due to a slight pitch epoch detection error becomes severe. On the other hand, the vocoder framework has very flexible prosody manipulation. It can obtain a uniform voice quality over a wide range of prosody modification. Unfortunately, the synthesized speech quality from the vocoder is far from natural human speech, often showing buzzy quality. To remedy buzzy quality from the vocoder and make more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems that is based on the homomorphic vocoder framework. The impulse response of vocal tract is obtained by mixing the minimum phase in lower frequency band and original phase in higher frequency band—thus, the name is a mixed phase vocoder. Informal subjective listening tests reveal that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.</abstract>
<note type="funding">Korea Science and Engineering Foundation - No. R01‐2002‐000‐00283‐0; </note>
<relatedItem type="host">
<titleInfo>
<title>European Transactions on Telecommunications</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Eur. Trans. Telecomm.</title>
</titleInfo>
<name type="personal">
<namePart type="given">Achille</namePart>
<namePart type="family">Pattavina</namePart>
</name>
<genre type="journal">journal</genre>
<subject>
<genre>article-category</genre>
<topic>Research Article</topic>
</subject>
<identifier type="ISSN">1124-318X</identifier>
<identifier type="eISSN">1541-8251</identifier>
<identifier type="DOI">10.1002/(ISSN)1541-8251</identifier>
<identifier type="PublisherID">ETT</identifier>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>15</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>5</number>
</detail>
<extent unit="pages">
<start>491</start>
<end>496</end>
<total>6</total>
</extent>
</part>
</relatedItem>
<identifier type="istex">277C416094E76274A422B581F87BA2BA72291868</identifier>
<identifier type="DOI">10.1002/ett.996</identifier>
<identifier type="ArticleID">ETT996</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Copyright © 2004 AEI</accessCondition>
<recordInfo>
<recordContentSource>WILEY</recordContentSource>
<recordOrigin>John Wiley &amp; Sons, Ltd.</recordOrigin>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002364 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 002364 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:277C416094E76274A422B581F87BA2BA72291868
   |texte=   A new speech synthesis algorithm for high quality TTS systems: a mixed phase vocoder
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024