Telematics exploration server

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography

Internal identifier: 000032 (Pmc/Corpus); previous: 000031; next: 000033

Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography

Authors: João Freitas; António Teixeira; Samuel Silva; Catarina Oliveira; Miguel Sales Dias

Source :

RBID : PMC:4466523

Abstract

Nasality is a very important characteristic of several languages, European Portuguese being one of them. This paper addresses the challenge of nasality detection in surface electromyography (EMG) based speech interfaces. We explore the existence of useful information about the velum movement and also assess if muscles deeper down in the face and neck region can be measured using surface electrodes, and the best electrode location to do so. The procedure we adopted uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from a set of speakers, providing a method to interpret EMG data. By ensuring compatible data recording conditions, and proper time alignment between the EMG and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement when a nasal vowel occurs. The combination of these two sources revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall results of this experiment provide evidence that it is possible to detect velum movement using sensors positioned below the ear, between mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, error rates as low as 32.5% for all speakers and 23.4% for the best speaker have been achieved, for nasal vowel detection. This outcome stands as an encouraging result, fostering the grounds for deeper exploration of the proposed approach as a promising route to the development of an EMG-based speech interface for languages with strong nasal characteristics.


Url:
DOI: 10.1371/journal.pone.0127040
PubMed: 26069968
PubMed Central: 4466523

Links to Exploration step

PMC:4466523

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography</title>
<author>
<name sortKey="Freitas, Joao" sort="Freitas, Joao" uniqKey="Freitas J" first="João" last="Freitas">João Freitas</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Microsoft Language Development Center, Microsoft Portugal, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Teixeira, Ant Nio" sort="Teixeira, Ant Nio" uniqKey="Teixeira A" first="Ant Nio" last="Teixeira">Ant Nio Teixeira</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Samuel" sort="Silva, Samuel" uniqKey="Silva S" first="Samuel" last="Silva">Samuel Silva</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Catarina" sort="Oliveira, Catarina" uniqKey="Oliveira C" first="Catarina" last="Oliveira">Catarina Oliveira</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Health School, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dias, Miguel Sales" sort="Dias, Miguel Sales" uniqKey="Dias M" first="Miguel Sales" last="Dias">Miguel Sales Dias</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Microsoft Language Development Center, Microsoft Portugal, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26069968</idno>
<idno type="pmc">4466523</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466523</idno>
<idno type="RBID">PMC:4466523</idno>
<idno type="doi">10.1371/journal.pone.0127040</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000032</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000032</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography</title>
<author>
<name sortKey="Freitas, Joao" sort="Freitas, Joao" uniqKey="Freitas J" first="João" last="Freitas">João Freitas</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Microsoft Language Development Center, Microsoft Portugal, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Teixeira, Ant Nio" sort="Teixeira, Ant Nio" uniqKey="Teixeira A" first="Ant Nio" last="Teixeira">Ant Nio Teixeira</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Samuel" sort="Silva, Samuel" uniqKey="Silva S" first="Samuel" last="Silva">Samuel Silva</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Catarina" sort="Oliveira, Catarina" uniqKey="Oliveira C" first="Catarina" last="Oliveira">Catarina Oliveira</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Health School, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dias, Miguel Sales" sort="Dias, Miguel Sales" uniqKey="Dias M" first="Miguel Sales" last="Dias">Miguel Sales Dias</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Microsoft Language Development Center, Microsoft Portugal, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisboa, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Nasality is a very important characteristic of several languages, European Portuguese being one of them. This paper addresses the challenge of nasality detection in surface electromyography (EMG) based speech interfaces. We explore the existence of useful information about the velum movement and also assess if muscles deeper down in the face and neck region can be measured using surface electrodes, and the best electrode location to do so. The procedure we adopted uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from a set of speakers, providing a method to interpret EMG data. By ensuring compatible data recording conditions, and proper time alignment between the EMG and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement when a nasal vowel occurs. The combination of these two sources revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall results of this experiment provide evidence that it is possible to detect velum movement using sensors positioned below the ear, between mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, error rates as low as 32.5% for all speakers and 23.4% for the best speaker have been achieved, for nasal vowel detection. This outcome stands as an encouraging result, fostering the grounds for deeper exploration of the proposed approach as a promising route to the development of an EMG-based speech interface for languages with strong nasal characteristics.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Acero, A" uniqKey="Acero A">A Acero</name>
</author>
<author>
<name sortKey="Hon H, W" uniqKey="Hon H W">W Hon H-</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flynn, R" uniqKey="Flynn R">R Flynn</name>
</author>
<author>
<name sortKey="Jones, E" uniqKey="Jones E">E Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freng, J" uniqKey="Freng J">J Freng</name>
</author>
<author>
<name sortKey="Ramabhadran, B" uniqKey="Ramabhadran B">B Ramabhadran</name>
</author>
<author>
<name sortKey="Hansen, Jhl" uniqKey="Hansen J">JHL Hansen</name>
</author>
<author>
<name sortKey="Williams, Jd" uniqKey="Williams J">JD Williams</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Denby, B" uniqKey="Denby B">B Denby</name>
</author>
<author>
<name sortKey="Schultz, T" uniqKey="Schultz T">T Schultz</name>
</author>
<author>
<name sortKey="Honda, K" uniqKey="Honda K">K Honda</name>
</author>
<author>
<name sortKey="Hueber, T" uniqKey="Hueber T">T Hueber</name>
</author>
<author>
<name sortKey="Gilbert, Jm" uniqKey="Gilbert J">JM Gilbert</name>
</author>
<author>
<name sortKey="Brumberg, Js" uniqKey="Brumberg J">JS Brumberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lacerda, A" uniqKey="Lacerda A">A Lacerda</name>
</author>
<author>
<name sortKey="Head, B" uniqKey="Head B">B Head</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sampson, R" uniqKey="Sampson R">R Sampson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beddor, Ps" uniqKey="Beddor P">PS Beddor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fritzell, B" uniqKey="Fritzell B">B Fritzell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hardcastle, Wj" uniqKey="Hardcastle W">WJ Hardcastle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seikel, Ja" uniqKey="Seikel J">JA Seikel</name>
</author>
<author>
<name sortKey="King, Dw" uniqKey="King D">DW King</name>
</author>
<author>
<name sortKey="Drumright, Dg" uniqKey="Drumright D">DG Drumright</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuehn, Dp" uniqKey="Kuehn D">DP Kuehn</name>
</author>
<author>
<name sortKey="Folkins, Jw" uniqKey="Folkins J">JW Folkins</name>
</author>
<author>
<name sortKey="Linville, Rn" uniqKey="Linville R">RN Linville</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bell Berti, F" uniqKey="Bell Berti F">F Bell-Berti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lubker, Jf" uniqKey="Lubker J">JF Lubker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuehn, Dp" uniqKey="Kuehn D">DP Kuehn</name>
</author>
<author>
<name sortKey="Folkins, Jw" uniqKey="Folkins J">JW Folkins</name>
</author>
<author>
<name sortKey="Cutting, Cb" uniqKey="Cutting C">CB Cutting</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcgill, S" uniqKey="Mcgill S">S McGill</name>
</author>
<author>
<name sortKey="Juker, D" uniqKey="Juker D">D Juker</name>
</author>
<author>
<name sortKey="Kropf, P" uniqKey="Kropf P">P Kropf</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kitamura, T" uniqKey="Kitamura T">T Kitamura</name>
</author>
<author>
<name sortKey="Takemoto, H" uniqKey="Takemoto H">H Takemoto</name>
</author>
<author>
<name sortKey="Honda, K" uniqKey="Honda K">K Honda</name>
</author>
<author>
<name sortKey="Shimada, Y" uniqKey="Shimada Y">Y Shimada</name>
</author>
<author>
<name sortKey="Fujimoto, I" uniqKey="Fujimoto I">I Fujimoto</name>
</author>
<author>
<name sortKey="Syakudo, Y" uniqKey="Syakudo Y">Y Syakudo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stone, M" uniqKey="Stone M">M Stone</name>
</author>
<author>
<name sortKey="Stock, G" uniqKey="Stock G">G Stock</name>
</author>
<author>
<name sortKey="Bunin, K" uniqKey="Bunin K">K Bunin</name>
</author>
<author>
<name sortKey="Kumar, K" uniqKey="Kumar K">K Kumar</name>
</author>
<author>
<name sortKey="Epstein, M" uniqKey="Epstein M">M Epstein</name>
</author>
<author>
<name sortKey="Kambhamettu, C" uniqKey="Kambhamettu C">C Kambhamettu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Engwall, O" uniqKey="Engwall O">O Engwall</name>
</author>
<author>
<name sortKey="Harrington, J" uniqKey="Harrington J">J Harrington</name>
</author>
<author>
<name sortKey="Tabain, M" uniqKey="Tabain M">M Tabain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perry, Jl" uniqKey="Perry J">JL Perry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moon, Jb" uniqKey="Moon J">JB Moon</name>
</author>
<author>
<name sortKey="Canady, Jw" uniqKey="Canady J">JW Canady</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pereda, E" uniqKey="Pereda E">E Pereda</name>
</author>
<author>
<name sortKey="Quiroga, Rq" uniqKey="Quiroga R">RQ Quiroga</name>
</author>
<author>
<name sortKey="Bhattacharya, J" uniqKey="Bhattacharya J">J Bhattacharya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yao, Yy" uniqKey="Yao Y">YY Yao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hudgins, B" uniqKey="Hudgins B">B Hudgins</name>
</author>
<author>
<name sortKey="Parker, P" uniqKey="Parker P">P Parker</name>
</author>
<author>
<name sortKey="Scott, Rn" uniqKey="Scott R">RN Scott</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burges, Cjc" uniqKey="Burges C">CJC Burges</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Winter, Da" uniqKey="Winter D">DA Winter</name>
</author>
<author>
<name sortKey="Yack, Hj" uniqKey="Yack H">HJ Yack</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freitas, J" uniqKey="Freitas J">J Freitas</name>
</author>
<author>
<name sortKey="Teixeira, A" uniqKey="Teixeira A">A Teixeira</name>
</author>
<author>
<name sortKey="Vaz, F" uniqKey="Vaz F">F Vaz</name>
</author>
<author>
<name sortKey="Dias, Ms" uniqKey="Dias M">MS Dias</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26069968</article-id>
<article-id pub-id-type="pmc">4466523</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0127040</article-id>
<article-id pub-id-type="publisher-id">PONE-D-14-31701</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography</article-title>
<alt-title alt-title-type="running-head">Velum Movement Detection Based on Surface EMG</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Freitas</surname>
<given-names>João</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref rid="cor001" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Teixeira</surname>
<given-names>António</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref rid="cor001" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Silva</surname>
<given-names>Samuel</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Oliveira</surname>
<given-names>Catarina</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dias</surname>
<given-names>Miguel Sales</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff005">
<sup>5</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Microsoft Language Development Center, Microsoft Portugal, Lisboa, Portugal</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Department of Electronics Telecommunications & Informatics (DETI), University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>Institute for Electronics and Telematics Engineering (IEETA), University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<aff id="aff004">
<label>4</label>
<addr-line>Health School, University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<aff id="aff005">
<label>5</label>
<addr-line>Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisboa, Portugal</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Lebedev</surname>
<given-names>Mikhail A.</given-names>
</name>
<role>Academic Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Duke University, UNITED STATES</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>
<bold>Competing Interests: </bold>
Two of the authors are employed by Microsoft; the authors declare that this does not alter their adherence to PLOS ONE policies on sharing data and materials.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: JF AT MSD. Performed the experiments: JF CO SS. Analyzed the data: JF AT SS. Contributed reagents/materials/analysis tools: JF AT SS CO MSD. Wrote the paper: JF AT SS MSD.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>t-joaof@microsoft.com</email>
,
<email>ajst@ua.pt</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>6</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>6</issue>
<elocation-id>e0127040</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>7</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>4</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-year>2015</copyright-year>
<copyright-holder>Freitas et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0127040.pdf"></self-uri>
<abstract>
<p>Nasality is a very important characteristic of several languages, European Portuguese being one of them. This paper addresses the challenge of nasality detection in surface electromyography (EMG) based speech interfaces. We explore the existence of useful information about the velum movement and also assess if muscles deeper down in the face and neck region can be measured using surface electrodes, and the best electrode location to do so. The procedure we adopted uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from a set of speakers, providing a method to interpret EMG data. By ensuring compatible data recording conditions, and proper time alignment between the EMG and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement when a nasal vowel occurs. The combination of these two sources revealed interesting and distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall results of this experiment provide evidence that it is possible to detect velum movement using sensors positioned below the ear, between mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, error rates as low as 32.5% for all speakers and 23.4% for the best speaker have been achieved, for nasal vowel detection. This outcome stands as an encouraging result, fostering the grounds for deeper exploration of the proposed approach as a promising route to the development of an EMG-based speech interface for languages with strong nasal characteristics.</p>
</abstract>
<funding-group>
<funding-statement>JF, SS, AT, MSD: Marie Curie Actions, Project IRIS (ref. 610986, FP7-PEOPLE-2013-IAPP). JF, MSD: Marie Curie Golem (ref. 251415, FP7-PEOPLE-2009-IAPP). AT, SS, CO: National Funds through the Foundation for Science and Technology (FCT), Institute for Electronics and Telematics Engineering (IEETA) Research Unit (ref. UID/CEC/00127/2013 and Incentivo/EEI/UI0127/2014) and Project HERON II (PTDC/EEA-PLP/098298/2008). SS: Quadro de Referência Estratégico Nacional (QREN), Mais Centro Program, Project Cloud Thinking (ref. CENTRO-07-ST24-FEDER-002031). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="15"></fig-count>
<table-count count="8"></table-count>
<page-count count="26"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data is available in the manuscript and its Supporting Information.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All relevant data is available in the manuscript and its Supporting Information.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>Speech-based human-computer interfaces have reached high accuracy levels in controlled environments and are now commercially available. However, robust speech recognition and an improved user experience with this type of interface remain a challenge [
<xref rid="pone.0127040.ref001" ref-type="bibr">1</xref>
] and an attractive research topic [
<xref rid="pone.0127040.ref002" ref-type="bibr">2</xref>
,
<xref rid="pone.0127040.ref003" ref-type="bibr">3</xref>
]. One of the reasons for this is that a conventional speech interface relies solely on the acoustic signal. Hence, this type of interface becomes inappropriate in the presence of environmental noise, such as in office settings, or in situations where privacy or confidentiality is required. For the same reason, speech-impaired persons, such as those who have undergone a laryngectomy, are unable to use this type of interface. In this regard, Silent Speech Interfaces (SSI) can be viewed as a possible alternative, since they allow communication to occur in the absence of an acoustic signal. Although SSI are still at an early stage of development, the latest results have shown that they can help tackle the issues raised by speech-based interfaces [
<xref rid="pone.0127040.ref004" ref-type="bibr">4</xref>
]. Surface Electromyography (EMG) is one of the approaches reported in the literature as suitable for implementing SSI, having achieved promising results [
<xref rid="pone.0127040.ref005" ref-type="bibr">5</xref>
,
<xref rid="pone.0127040.ref006" ref-type="bibr">6</xref>
].</p>
<p>A known challenge in SSI, including those based on surface EMG, is the detection of the nasality phenomenon in speech production, as it is unclear whether information on nasality can be retrieved from the EMG signal. Nasality is an important characteristic of several languages, such as European Portuguese (EP) [
<xref rid="pone.0127040.ref007" ref-type="bibr">7</xref>
,
<xref rid="pone.0127040.ref008" ref-type="bibr">8</xref>
], which is the selected language for the experiments reported here. Additionally, no SSI exists for EP and, as shown before [
<xref rid="pone.0127040.ref009" ref-type="bibr">9</xref>
], nasality can negatively impact accuracy for this language. Given the particular relevance of nasality for EP [
<xref rid="pone.0127040.ref010" ref-type="bibr">10</xref>
,
<xref rid="pone.0127040.ref011" ref-type="bibr">11</xref>
], we have conducted an experiment that aims to improve on the current state of the art in this area by determining whether nasal vowels can be detected in surface EMG-based speech interfaces, and consequently improving this type of interaction system. It is important to note that, as explained in more detail later, the literature does not provide enough information on whether the nasality phenomenon can be detected with surface EMG, or on which sensor positions might be most adequate for that purpose. Therefore, the goal of this first, exploratory stage is not to develop a complete, fully functional EMG-based SSI system that handles nasality, but to conduct an exploratory study that informs future research. The main idea behind this experiment is to investigate two types of data containing information about the velum movement: (1) images collected using real-time magnetic resonance imaging (RT-MRI) and (2) the myoelectric signal collected using surface EMG sensors. By combining these two sources, ensuring compatible recording conditions and proper time alignment, we are able to accurately estimate the time when the velum moves and the type of movement (i.e. ascending or descending), and infer the differences between nasal and oral vowels using surface EMG.</p>
<p>The use of surface electrodes to target deeper muscles presents several difficulties. It is not clear to what extent a surface electrode can detect the myoelectric signal, not only because of the depth at which the target muscles are located, but also due to signal propagation conditions in different tissues and associated noise. Also, the signal output of surface electrodes in the region of the face and neck will probably reflect a high level of cross-talk, i.e. a mixture of signals from muscles that lie in the vicinity of, or superimposed on, the muscle fibers of interest. Therefore, this paper not only analyses the possibility of nasal vowel detection using surface EMG, but also assesses whether deeper muscles can be sensed using surface electrodes in the regions of the face and neck, and which electrode locations are best suited to do so. Addressing these problems is an integral part of a challenging research agenda with the potential to impact speech, health and accessibility technologies by improving nasal vowel recognition in speech and in silent speech interfaces.</p>
<p>The remainder of this document is structured as follows: the following subsections present a brief description of the relevant muscles associated with nasality in speech production and of previous work using EMG to measure the muscles associated with the movement of the soft palate; Section 2 describes the methodology used in this experiment, including the description of the corpora for both data collections and the methods used to extract nasality information from RT-MRI and to synchronize the signals; Section 3 presents the results of the exploratory analysis and of the classification experiment; Section 4 discusses the main outcomes and limitations of this study in light of their importance for advancing the state of the art and supporting further research on the subject. Finally, conclusions and future work are presented in Section 5.</p>
<sec id="sec002" sec-type="intro">
<title>Background</title>
<p>The production of a nasal vowel involves air flow through both the oral and nasal cavities. The air passage to the nasal cavity is essentially controlled by the velum which, when lowered, opens the velopharyngeal port, enabling resonance in the nasal cavity and causing the sound to be perceived as nasal. The production of oral sounds occurs when the velum is raised and the access to the nasal cavity is closed [
<xref rid="pone.0127040.ref012" ref-type="bibr">12</xref>
]. The process of moving the velum involves the following muscles [
<xref rid="pone.0127040.ref013" ref-type="bibr">13</xref>
<xref rid="pone.0127040.ref015" ref-type="bibr">15</xref>
], also depicted in
<xref rid="pone.0127040.g001" ref-type="fig">Fig 1</xref>
.</p>
<fig id="pone.0127040.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Muscles of the soft palate from posterior (left), and the side (right) view.</title>
</caption>
<graphic xlink:href="pone.0127040.g001"></graphic>
</fig>
<list list-type="bullet">
<list-item>
<p>
<italic>Levator veli palatini</italic>
: This muscle has its origin in the inferior surface of the apex of the petrous part of the temporal bone and its insertion in the superior surface of the palatine aponeurosis. Its main function is to elevate and retract the soft palate achieving velopharyngeal closure;</p>
</list-item>
<list-item>
<p>
<italic>Musculus uvulae</italic>
: This muscle is embodied in the structure of the soft palate. In speech, it helps velopharyngeal closure by filling the space between the elevated velum and the posterior pharyngeal wall [
<xref rid="pone.0127040.ref016" ref-type="bibr">16</xref>
];</p>
</list-item>
<list-item>
<p>
<italic>Superior pharyngeal constrictor</italic>
: Although this is a pharyngeal muscle, when it contracts it narrows the upper pharynx, which elevates the soft palate;</p>
</list-item>
<list-item>
<p>
<italic>Tensor veli palatini</italic>
: This muscle tenses and spreads the soft palate and assists the
<italic>levator veli palatini</italic>
in elevating it. It also dilates the Eustachian tube. This muscle is innervated by the mandibular branch of the trigeminal nerve (V), and not by the accessory nerve (XI) like the remaining muscles of the soft palate; it is thus the only muscle of the soft palate innervated by a different nerve;</p>
</list-item>
<list-item>
<p>
<italic>Palatoglossus</italic>
: Along with gravity, relaxation of the above-mentioned muscles and the
<italic>Palatopharyngeus</italic>
, this muscle is responsible for the lowering of the soft palate.</p>
</list-item>
</list>
</sec>
<sec id="sec003">
<title>Related Work</title>
<p>In previous studies, the application of EMG to measure the level of activity of these muscles has been performed by means of intramuscular electrodes [
<xref rid="pone.0127040.ref013" ref-type="bibr">13</xref>
,
<xref rid="pone.0127040.ref017" ref-type="bibr">17</xref>
] and surface electrodes positioned directly on the oral surface of the soft palate [
<xref rid="pone.0127040.ref018" ref-type="bibr">18</xref>
,
<xref rid="pone.0127040.ref019" ref-type="bibr">19</xref>
]. Our work differs from the cited papers, since none of them uses surface electrodes placed in the face and neck regions, a significantly less invasive approach that is also more realistic and representative of SSI usage scenarios. Also, although intramuscular electrodes may offer more reliable myoelectric signals, they require considerable medical skill to place and, for that reason, were discarded for this study.</p>
<p>To the best of our knowledge, no literature exists in terms of detecting the muscles involved in the velopharyngeal function with surface EMG electrodes placed on the face and neck. Previous studies, in the lumbar spine region, have shown that, if proper electrode positioning is considered, a representation of deeper muscles can be acquired [
<xref rid="pone.0127040.ref020" ref-type="bibr">20</xref>
], thus raising a question that is currently unanswered: Is surface EMG, positioned in the face and neck regions, able to detect the activity of the muscles related to nasal port opening/closing and, consequently, to detect the nasality phenomenon? A related question is how we can ascertain, with some confidence, that the signal we are retrieving is, in fact, a myoelectric signal mostly generated by the velum movement and not by spurious movements of neighboring muscles unrelated to the velopharyngeal function.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="sec004">
<title>Methodology</title>
<p>All human participants have provided informed written consent. The experiments reported in this study have been evaluated and approved by the ethics committee of the University Institute of Lisbon (ISCTE-IUL) regulated by dispatch n°7095/2011.</p>
<p>To determine whether nasal vowels can be detected using surface EMG, we needed to know when the velum was moving, to avoid misinterpreting signals coming from other muscles, artifacts and noise as signals coming from the target muscles. To overcome this problem we took advantage of a previous data collection based on RT-MRI [
<xref rid="pone.0127040.ref021" ref-type="bibr">21</xref>
], which provides an excellent method to estimate when the velum was moving and interpret EMG data accordingly.</p>
<p>Recent advances in MRI technology allow real-time visualization of the vocal tract with acceptable spatial and temporal resolution. This technology provided access to real-time images with articulatory information relevant for our study, including velum raising and lowering. In order to align both signals, audio recordings were performed in both data collections for the same set of speakers. It is important to note that EMG and RT-MRI data cannot be collected together, due to hardware restrictions, so the best option was to collect the same corpus for the same set of speakers, at different times, reading the same prompts in the EMG and RT-MRI sessions.</p>
<p>Both corpora (EMG and RT-MRI velum information) have been made available according to the journal’s data policy and can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://sweet.ua.pt/sss/resources">http://sweet.ua.pt/sss/resources</ext-link>
.</p>
<sec id="sec005">
<title>Corpora</title>
<p>The two corpora collected in this study (RT-MRI and EMG) share the same prompts. The set of prompts is composed of several nonsense words containing five EP nasal vowels (using the Speech Assessment Methods Phonetic Alphabet (SAMPA) [6~, e~, i~, o~, u~]) isolated and in word-initial, word-internal and word-final context (e.g. ampa [6~p6], pampa [p6~p6], pam [p6~]). The nasal vowels were flanked by the bilabial stop or the labiodental fricative. For comparison purposes, the set of prompts also includes oral vowels, both isolated and in context. In the EMG data collection a total of 90 utterances per speaker were recorded. A more detailed description of the RT-MRI corpus can be found in [
<xref rid="pone.0127040.ref021" ref-type="bibr">21</xref>
]. In the EMG data collection, we also recorded three silence prompts (i.e. prompts where the speaker does not speak or make any kind of movement) to further validate the system and the acquired EMG signal.</p>
<p>For validation purposes, to support the assessment of aspects such as reproducibility, we have also recorded four additional EMG sessions, for one random speaker, with the same prompts.</p>
<sec id="sec006">
<title>Speakers</title>
<p>The three speakers participating in this study were all female, native speakers of EP, with the following ages: 33, 22 and 22 years. No history of hearing or speech disorders is known for any of them. The first speaker is a professor in the area of Phonetics and the remaining speakers are students in the area of Speech Therapy. All speakers have provided written and informed consent for the data collections.</p>
<p>Regarding the number of speakers considered, and how it might affect the results of this exploratory study, several aspects should be borne in mind. The aim of this work is not to propose a finished system, but to explore the possibility of developing such a system. The resources required for this kind of complex study are very expensive, and a large data collection is strongly inadvisable without an adequate set of exploratory studies to ascertain its feasibility and applicability. Additionally, given the innovative nature of the presented work, the literature does not provide enough information to allow these initial steps to be skipped. Therefore, we chose the number of speakers based on the data available from an existing RT-MRI study that suited our needs.</p>
</sec>
</sec>
<sec id="sec007">
<title>RT-MRI Data</title>
<p>The RT-MRI data collection was previously conducted at IBILI/Coimbra for nasal production studies. Images were acquired in the midsagittal and coronal oblique planes of the vocal tract using an Ultra-Fast RF-spoiled Gradient Echo (GE) pulse sequence, yielding a frame rate of 14 frames/second. Each recorded sequence contained 75 images. Additional information concerning the image acquisition protocol can be found in [
<xref rid="pone.0127040.ref022" ref-type="bibr">22</xref>
].</p>
<p>The audio was recorded simultaneously with the real-time images, inside the scanner, at a sampling rate of 16000 Hz, using a fiber-optic microphone. For synchronization purposes, a TTL pulse was generated by the RT-MRI scanner [
<xref rid="pone.0127040.ref021" ref-type="bibr">21</xref>
]. Currently, the corpus contains only three speakers due to costs per recording session and availability of the technology involved.</p>
</sec>
<sec id="sec008">
<title>Extraction of information on nasal port from RT-MRI data</title>
<p>Since the main interest was to interpret velum position/movement from the mid-sagittal RT-MRI sequences of the vocal tract, rather than to measure distances (e.g. from the velum tip to the posterior pharyngeal wall), we opted for a method based on the variation of the area between the velum and the pharynx, which is closely related to velum position.</p>
<p>An image with the velum fully lowered was used to define a region of interest (ROI). Then, a region growing algorithm was applied with a seed defined in a hypo-intense pixel inside the ROI. This ROI is roughly positioned between the open velum and the back of the vocal tract; the idea is that the velum will move over that region when closing. Since this first ROI may enclose a larger region, even including part of the velum (which does not influence the process), it is only important that the seed is placed in a dark (hypo-intense) pixel inside it, in order to exclude most of the velum from the region growing when it is positioned inside the ROI. Since there is spatial coherence between the images in each sequence, by defining a seed neighborhood that includes time, a single seed is typically enough to propagate the region growing inside the ROI over all image frames.
<xref rid="pone.0127040.g002" ref-type="fig">Fig 2</xref>
presents the contours of the segmented region over different image frames encompassing velum lowering and rising. For representation purposes, in order not to occlude the image beneath, only the contour of the segmented region is presented. Processing is always performed over the pixels enclosed in the depicted region. Notice that the blue boundaries presented in the images depict the result of the region growing inside the defined ROI (which just limits the growth) and not the ROI itself. The number of hypo-intense pixels (corresponding to an area) inside the ROI decreases when the velum closes and increases when the velum opens. Therefore, a closed velum corresponds to area minima while an open velum corresponds to local area maxima, which allows detecting the frames where the velum is open. Since for all image sequences there was no informant movement, the ROI has to be set only once for each informant, and can then be reused throughout all the processed sagittal real-time sequences. After ROI definition (around one minute and reusable throughout all image sequences from the same speaker), setting a seed, revising the results and storing the data took one minute per image sequence.</p>
<fig id="pone.0127040.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Mid-sagittal RT-MRI images of the vocal tract for several velum positions, over time, showing evolution from a raised velum, to a lowered velum and back to initial conditions.</title>
<p>The presented curve, used for analysis, was derived from the images.</p>
</caption>
<graphic xlink:href="pone.0127040.g002"></graphic>
</fig>
<p>These images allowed us to derive a signal, over time, that describes the velum movement (also shown in
<xref rid="pone.0127040.g002" ref-type="fig">Fig 2</xref>
and depicted as dashed line in
<xref rid="pone.0127040.g003" ref-type="fig">Fig 3</xref>
). As can be observed, minima correspond to a closed velopharyngeal port (oral sound) and maxima to an open port (nasal sound).</p>
<fig id="pone.0127040.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Exemplification of the warped signal representing the nasal information extracted from RT-MRI (dashed line) superimposed on the speech recorded during the corresponding RT-MRI and EMG acquisition, for the sentence [6~p6, p6~p6, p6~].</title>
</caption>
<graphic xlink:href="pone.0127040.g003"></graphic>
</fig>
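To make the area-based measure concrete, the following is a minimal illustrative sketch (not the authors' code) of how a velum-aperture curve could be derived by counting hypo-intense pixels inside a fixed ROI across an RT-MRI sequence; a plain intensity threshold stands in for the seeded region growing described above, and the function name, ROI bounds and threshold value are assumptions chosen for illustration.

# Illustrative sketch (not the authors' implementation): derive a velum-aperture
# curve by counting hypo-intense pixels inside a fixed ROI for every frame of an
# RT-MRI sequence. A simple intensity threshold replaces the seeded region growing
# described above; ROI bounds and threshold are illustrative assumptions.
import numpy as np

def velum_area_curve(frames, roi, threshold=0.3):
    """frames: (T, H, W) array of RT-MRI images normalized to [0, 1].
    roi: (row_min, row_max, col_min, col_max), fixed for the whole sequence.
    Returns one area value (pixel count) per frame: local maxima correspond to an
    open velopharyngeal port, minima to a closed one."""
    r0, r1, c0, c1 = roi
    region = frames[:, r0:r1, c0:c1]
    # Hypo-intense (dark) pixels correspond to the air gap between velum and pharynx.
    return (region < threshold).sum(axis=(1, 2))

# Example with synthetic data (75 frames per sequence, cf. the acquisition protocol):
# frames = np.random.rand(75, 128, 128)
# curve = velum_area_curve(frames, roi=(40, 60, 50, 70))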
<p>Concerning the coronal-oblique image sequences, a segmentation method based on region growing (with a neighborhood that includes time), exploiting the spatial coherence between image frames, was used. The user had to place a few seeds inside the oral cavity and inside the nasal cavity (typically four or five seeds in total). This process of segmenting the oral and nasal cavities, marking seeds where needed, and obtaining/inspecting the segmentations by going through the image sequence took around one minute per sequence. The area of the nasal cavity was automatically computed, and the resulting variation curve depicts the velum movement, with the maxima and minima (zero) corresponding to an open and a closed velopharyngeal port, respectively. Additional details concerning the segmentation of the oblique real-time images can be found in [
<xref rid="pone.0127040.ref021" ref-type="bibr">21</xref>
].</p>
<p>
<xref rid="pone.0127040.g004" ref-type="fig">Fig 4</xref>
presents different image frames showing the nasal cavity (depicted in white), encompassing lowering and rising of the velum and showing the derived curve.</p>
<fig id="pone.0127040.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Coronal-oblique RT-MRI images depicting the nasal cavity (in white), over time, and the curve derived for analysis purposes.</title>
</caption>
<graphic xlink:href="pone.0127040.g004"></graphic>
</fig>
</sec>
<sec id="sec009">
<title>Surface EMG Data Collection</title>
<p>This data collection involved the same speakers as the RT-MRI recordings. For each speaker, the corpus was recorded completely in a single session, and the sensors were never removed during the recordings. Before placing the sensors, the sensor locations were cleaned with alcohol. While the prompts were being uttered, an assistant asked the speaker to perform no movement other than those associated with speech production, in particular any kind of neck movement. To minimize spurious muscle movements the speaker was allowed to rest between prompts; however, although the prompts had a short duration, we cannot guarantee the complete absence of innate movements such as breathing. The recordings took place in an isolated, quiet room. The assistant was responsible for pushing the record button and also for stopping the recording, to avoid unwanted muscle activity. The prompts were presented to the speaker in a random order and were selected based on the existing RT-MRI corpus [
<xref rid="pone.0127040.ref021" ref-type="bibr">21</xref>
]. In this data collection two signals were acquired: myoelectric and audio. For synchronization purposes, after starting the recording, a marker was generated in both signals. In the sections below surface EMG and audio acquisition setups are described.</p>
<sec id="sec010">
<title>Surface EMG setup</title>
<p>The same acquisition system from Plux [
<xref rid="pone.0127040.ref023" ref-type="bibr">23</xref>
] was used to collect all EMG data. We used 5 pairs of EMG surface electrodes connected to a device that communicates with a computer via Bluetooth. The sensors were attached to the skin using single-use 2.5 cm diameter clear plastic self-adhesive surfaces, with approximately 2 cm spacing between the electrodes’ centers. One of the difficulties found while preparing this study was that no specific background literature in speech science exists to inform the choice of the best sensor positions for detecting the muscles referred to in the Background section. Hence, based on anatomy and physiology literature (for example [
<xref rid="pone.0127040.ref014" ref-type="bibr">14</xref>
] and preliminary trials, we chose sensor placements that cover, as much as possible, the positions most likely to be best for detecting the targeted muscles. To measure the myoelectric activity we used both bipolar and monopolar surface electrode configurations. In the monopolar configuration, instead of both electrodes being placed directly over the articulatory muscles (as in the bipolar configuration), one of the electrodes is used as a reference (i.e. located in a place with low or negligible muscle activity). In both configurations the result is the amplified difference between the pair of electrodes.</p>
<p>As depicted in
<xref rid="pone.0127040.g005" ref-type="fig">Fig 5</xref>
, the 5 sensors pairs were positioned in the following locations:
<list list-type="bullet">
<list-item>
<p>EMG 1 was placed in the area superior to the mandibular notch, superficial to the mandibular fossa;</p>
</list-item>
<list-item>
<p>EMG 2 was placed in the area inferior to the ear between the mastoid process and the mandible angle, on the right side of the face using a monopolar configuration;</p>
</list-item>
<list-item>
<p>EMG 3 was placed in the same position as EMG 2, but on the left side of the face and using a bipolar configuration;</p>
</list-item>
<list-item>
<p>EMG 4 was placed in the superior neck area, beneath the mandibular corpus, at an equal distance from the mandible angle (EMG 2) and the mandible mental prominences (EMG 5);</p>
</list-item>
<list-item>
<p>EMG 5 was placed in the superior neck area, beneath the mandible mental prominences.</p>
</list-item>
</list>
</p>
<fig id="pone.0127040.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g005</object-id>
<label>Fig 5</label>
<caption>
<title>EMG electrodes positioning and the respective channels (1 to 5) plus the reference electrode (R).</title>
<p>EMG 1 and 2 use unipolar configurations and EMG 3, 4 and 5 use bipolar configurations.</p>
</caption>
<graphic xlink:href="pone.0127040.g005"></graphic>
</fig>
<p>The reference electrodes (EMG R) were placed in the mastoid portion of the temporal bone and in the cervical vertebrae. Considering the sensors’ location, they were expected to acquire unwanted myoelectric signals due to the superposition of the muscles in these areas, such as the jaw muscles. However, in spite of the muscles of the velum being remote from this peripheral region, we expected to be able to select a sensor location that enabled us to identify and classify the targeted muscle signal with success.</p>
<p>The technical specifications of the acquisition system [
<xref rid="pone.0127040.ref023" ref-type="bibr">23</xref>
] include snaps with a diameter of 1.46 cm and a height of 0.62 cm, a voltage range from 0.0 V to 5.0 V, and a voltage gain of 1000. The signal was sampled at 600 Hz using 12-bit samples. For system validation, we conducted several preliminary tests on larger superficial muscles.</p>
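As a rough illustration of how such raw samples can be interpreted, the sketch below converts 12-bit ADC values to millivolts using the stated 0-5 V range and gain of 1000; the assumption that the resting baseline sits at mid-range (2.5 V) is ours, and the formula is a generic transfer function, not the vendor's documented one.

# Hedged sketch: convert raw 12-bit EMG samples (0..4095) to millivolts, given the
# 0.0-5.0 V range and gain of 1000 stated above. The mid-range (2.5 V) baseline is
# an assumption; this generic transfer function is not taken from the Plux documentation.
import numpy as np

VCC = 5.0      # volts (0.0-5.0 V range, cf. text)
GAIN = 1000.0  # amplifier gain
N_BITS = 12

def adc_to_millivolts(raw_samples):
    raw = np.asarray(raw_samples, dtype=float)
    volts_at_electrode = (raw / (2 ** N_BITS) - 0.5) * VCC / GAIN
    return volts_at_electrode * 1000.0

# adc_to_millivolts([2048]) -> ~0.0 mV (mid-range baseline)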
</sec>
<sec id="sec011">
<title>Audio system setup</title>
<p>The audio recordings were performed using a laptop’s integrated dual-microphone array, with a sample rate of 8000 Hz, 16 bits per sample and a single audio channel. Since audio quality was not a requirement in this collection, we opted for this solution instead of a headset microphone, which could interfere with the EMG signal.</p>
</sec>
</sec>
<sec id="sec012">
<title>Signal Synchronization</title>
<p>In order to address the nasal vowel detection problem we needed to synchronize the EMG and RT-MRI signals. Both the EMG and the RT-MRI datasets were aligned with their corresponding audio recordings. Next, we resampled the audio recordings to 12000 Hz and applied Dynamic Time Warping (DTW) [
<xref rid="pone.0127040.ref024" ref-type="bibr">24</xref>
] to find the optimal match between corresponding audio recordings in both datasets. Based on the DTW result, we mapped the information extracted from the RT-MRI to the EMG time axis, establishing the required correspondence between the EMG and the RT-MRI information, as depicted in
<xref rid="pone.0127040.g003" ref-type="fig">Fig 3</xref>
. In order to validate the alignment, we annotated the audio
<italic>corpora</italic>
of speaker 1 and compared the beginning of each word in the warped RT-MRI audio signal with the EMG audio signal. The relative position of the nasality signal with respect to the EMG audio signal was found to be very similar to the one observed for the RT-MRI audio signal, which was an indication of good synchronization.</p>
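The alignment idea can be sketched as follows; this is an illustrative outline rather than the authors' implementation, using a naive O(N·M) DTW over 1-D feature sequences (e.g. short-time energy of the two audio recordings) and assuming the velum curve has already been resampled to the RT-MRI audio feature-frame rate.

# Illustrative sketch of the alignment step (not the authors' implementation): a
# minimal DTW between two 1-D feature sequences, whose warping path is then used to
# map the RT-MRI-derived velum curve onto the EMG time axis. Feature choice and the
# naive O(N*M) DTW are simplifying assumptions.
import numpy as np

def dtw_path(a, b):
    """Classic DTW on 1-D sequences; returns the optimal warping path as (i, j) pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:                      # backtrack from (n, m)
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def map_velum_to_emg(velum_curve, path, emg_len):
    """Map the velum curve (assumed resampled to the RT-MRI audio feature-frame rate)
    onto the EMG audio feature-frame axis using the DTW warping path."""
    warped = np.zeros(emg_len)
    for mri_idx, emg_idx in path:
        warped[emg_idx] = velum_curve[min(mri_idx, len(velum_curve) - 1)]
    return warped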
<p>Based on the information extracted from the RT-MRI signal (after signal alignment), we were able to segment the EMG signal into nasal and non-nasal zones, using the zone boundary information to identify which parts of the EMG signal were nasal or non-nasal, as depicted in
<xref rid="pone.0127040.g006" ref-type="fig">Fig 6</xref>
. Considering a normalized RT-MRI signal
<italic>x</italic>
with mean
<italic>x̄</italic>
and standard deviation σ, we determined that frames where
<italic>x(n)</italic> ≥ <italic>x̄</italic> + (σ / 2) were nasal and frames where
<italic>x(n)</italic> &lt; <italic>x̄</italic> + (σ / 2) were non-nasal, based on an empirical analysis that considered the signals from all users. However, in order to include the whole transitional part of the signal (i.e. the lowering and raising of the velum) in the nasal zones, we used the angle between the nearest peak and the points where
<italic>x(n)</italic> = <italic>x̄</italic>
to calculate the nasal zone boundaries. This was done after testing different methods and their ability to cope with signal variability among speakers.</p>
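A minimal sketch of the frame-labelling rule just described is given below; the min-max normalization scheme is an assumed choice, and the subsequent angle-based widening of nasal zones to cover velum lowering and raising (explained next) is handled separately.

# Minimal sketch of the labelling rule above: after normalization, frames with
# x(n) >= mean(x) + std(x)/2 are marked nasal (1), the rest non-nasal (0).
# Min-max normalization is an assumed choice; the angle-based widening of nasal
# zones described next is not included here.
import numpy as np

def label_nasal_frames(x):
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())
    threshold = x.mean() + x.std() / 2.0
    return (x >= threshold).astype(int)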
<fig id="pone.0127040.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Exemplification of the EMG signal segmentation into nasal and non-nasal zones based on the information extracted from the RT-MRI (dashed red line).</title>
<p>The square wave depicted with a black line represents the velum information split into two classes where 0 stands for non-nasal and 1 for nasal. The blue line is the average of the RT-MRI information (after normalization) and the green line is the average plus half of the standard deviation.</p>
</caption>
<graphic xlink:href="pone.0127040.g006"></graphic>
</fig>
<p>As detailed in
<xref rid="pone.0127040.g006" ref-type="fig">Fig 6</xref>
, if we picture two right triangles formed between the peak and
<italic>x(n) =</italic>
0, by knowing angle θ
<sub>1</sub>
(between the peak and the
<italic>yy</italic>
axis), the opposing
<italic>cathetus</italic>
of θ
<sub>2</sub>
and assuming that θ
<sub>1</sub>
= θ
<sub>2</sub>
, we were able to determine the magnitude of
<italic>v</italic>
<sub>2</sub>
and set a zone boundary that included either the lowering or the raising part of the signal. For one of the speakers there were a few cases where the velum remained open for longer than expected; in these situations, different peaks were used for the initial and final boundaries of the nasal zone.</p>
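<p>One plausible reading of this geometry, given here as an assumption rather than a reproduction of the exact original procedure, is that each boundary is obtained by extrapolating the line joining the peak and the nearest crossing of x(n) with the signal mean down to the baseline, so that the whole transition is included:</p>
<preformat>
import numpy as np

def extend_boundary(t_peak, x_peak, t_cross, x_mean):
    """Extend a nasal-zone boundary beyond the x(n) = mean crossing.
    Assuming theta1 = theta2 (see Fig 6), the horizontal extension v2
    satisfies tan(theta) = v2 / x_mean, with
    tan(theta) = |t_cross - t_peak| / (x_peak - x_mean);
    i.e. the peak-to-crossing line is extrapolated to x = 0.
    This is an interpretation of the published geometry."""
    slope = abs(t_cross - t_peak) / max(x_peak - x_mean, 1e-9)
    v2 = x_mean * slope
    return t_cross + np.sign(t_cross - t_peak) * v2
</preformat>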
<p>Regarding the different speaker postures during acquisition, between the RT-MRI and EMG data, and how these could influence the matching of both datasets, several studies have been presented in the literature concerning the differences, in vocal tract configuration, between sitting/upright and supine acquisitions of articulatory data. In general, mild differences in vocal tract shape and articulator position, due to body posture, were reported by several authors [
<xref rid="pone.0127040.ref025" ref-type="bibr">25</xref>
<xref rid="pone.0127040.ref028" ref-type="bibr">28</xref>
]. These mostly refer to an overall tendency of the articulators to deform according to gravity, resulting in more retracted positions of the tongue and, to a lesser extent, of the lips and lower end of the uvula in the supine acquisitions. The extent to which this effect is observed varies between speakers [
<xref rid="pone.0127040.ref026" ref-type="bibr">26</xref>
]. None of these studies specifically addressed the velum or nasality, but analysis of vocal tract outlines, when they contemplate the velum [
<xref rid="pone.0127040.ref026" ref-type="bibr">26</xref>
], did not seem to show any major difference. Considering the acoustic properties of speech, one important finding was that no significant differences have been found between the two body postures, during acquisition [
<xref rid="pone.0127040.ref027" ref-type="bibr">27</xref>
,
<xref rid="pone.0127040.ref029" ref-type="bibr">29</xref>
]. Furthermore, the acquisition of running speech, as shown by Tiede et al. [
<xref rid="pone.0127040.ref025" ref-type="bibr">25</xref>
] and Engwall [
<xref rid="pone.0127040.ref029" ref-type="bibr">29</xref>
], minimizes vocal tract shape and articulator position differences occurring between upright and supine positions.</p>
<p>Concentrating on the velopharyngeal mechanism, Perry [
<xref rid="pone.0127040.ref030" ref-type="bibr">30</xref>
] studied its configuration for upright and supine positions during speech production, and concluded that no significant difference was present regarding velar length, velar height and
<italic>levator</italic>
muscle length.</p>
<p>Concerning muscle activity, Moon et al. [
<xref rid="pone.0127040.ref031" ref-type="bibr">31</xref>
] present a study where the activity of the
<italic>levator veli palatini</italic>
and
<italic>palatoglossus</italic>
muscles for upright and supine speaker postures was assessed using electromyography. The activation levels were smaller, for the supine position, during the closing movement (with gravity working in the same direction), but no timing differences were reported.</p>
<p>Therefore, considering the methodology used and the knowledge available in the literature, the different postures do not seem to be a major differentiating factor between the two datasets that would preclude their matching through the acoustic signal. Furthermore, no evidence exists that muscle activity is relevantly affected by speaker posture. For both postures, apart from different activation levels, similar muscular configurations and activity seem to be observed, supporting the assumption that, after aligning both datasets, muscle activity relating to the velopharyngeal mechanism should also be aligned between both signals. This allows inferring muscle activity intervals from the velum movement extracted from the MRI images.</p>
</sec>
<sec id="sec013">
<title>Data processing and analysis methods</title>
<p>In order to facilitate the analysis, we first pre-processed the EMG signal by normalizing it and applying a 12-point moving average filter with zero-phase distortion to the absolute value of the normalized EMG signal. An example of this pre-processing is depicted in
<xref rid="pone.0127040.g007" ref-type="fig">Fig 7</xref>
.</p>
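<p>For reference, a minimal Python sketch of this pre-processing step (the variable names are illustrative) could be:</p>
<preformat>
import numpy as np
from scipy.signal import filtfilt

def preprocess_emg(emg):
    """Pre-process one EMG channel as described in the text: amplitude
    normalization, rectification, and a 12-point moving-average filter
    applied with zero-phase distortion (forward-backward filtering)."""
    x = emg / np.max(np.abs(emg))   # normalize
    x = np.abs(x)                   # absolute value (rectification)
    b = np.ones(12) / 12.0          # 12-point moving average
    return filtfilt(b, [1.0], x)    # zero-phase filtering
</preformat>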
<fig id="pone.0127040.g007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g007</object-id>
<label>Fig 7</label>
<caption>
<title>Raw EMG signal and pre-processed EMG signal of channel 1 (top) and 3 (bottom) for the sentence [6~p6, p6~p6, p6~] from speaker 1.</title>
<p>The pre-processed signal has been normalized and filtered using a 12-point moving average filter.</p>
</caption>
<graphic xlink:href="pone.0127040.g007"></graphic>
</fig>
<p>In the signal analysis, to measure the dependence between the MRI information and the EMG signal, we used the concept of mutual information. The mutual information [
<xref rid="pone.0127040.ref032" ref-type="bibr">32</xref>
,
<xref rid="pone.0127040.ref033" ref-type="bibr">33</xref>
], also referred to as transinformation, stems from information theory and is given by:
<disp-formula id="pone.0127040.e004">
<alternatives>
<graphic xlink:href="pone.0127040.e004.jpg" id="pone.0127040.e004g" position="anchor" mimetype="image" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo></mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo></mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
<label>(1)</label>
</disp-formula>
where
<italic>p(x</italic>
,
<italic>y)</italic>
is the joint probability distribution function of
<italic>X</italic>
and
<italic>Y</italic>
, and
<italic>p(x)</italic>
and
<italic>p(y)</italic>
are the marginal probability distribution functions of
<italic>X</italic>
and
<italic>Y</italic>
respectively. This measures the mutual dependence of the two variables, i.e. the reduction in uncertainty about one signal obtained by observing the other. In this study we used a normalized mutual information [
<xref rid="pone.0127040.ref034" ref-type="bibr">34</xref>
] measure to analyze the relation between the signals.</p>
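<p>As an illustration, a histogram-based estimate of Eq (1), normalized here by the geometric mean of the marginal entropies (one of several possible normalizations; the exact form used in [34] is not reproduced), could be sketched as:</p>
<preformat>
import numpy as np

def normalized_mutual_information(x, y, bins=32):
    """Estimate the normalized mutual information between two sampled
    signals x and y via a joint histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0
</preformat>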
<p>To estimate classifier performance we have split the EMG signal into 100ms frames with a frame shift of 20ms. Afterwards, we applied a 10-fold cross-validation technique to the whole set of frames from the 3 speakers to split the data into training and validation subsets.</p>
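<p>A minimal sketch of the framing and data-splitting step follows; the EMG sampling rate is an assumption here, since it is not given in this section, and the helper names are illustrative.</p>
<preformat>
import numpy as np
from sklearn.model_selection import KFold

FS = 600                      # assumed EMG sampling rate (Hz)
FRAME = int(0.100 * FS)       # 100 ms frames
SHIFT = int(0.020 * FS)       # 20 ms frame shift

def split_into_frames(x, frame=FRAME, shift=SHIFT):
    """Split a pre-processed EMG channel into overlapping frames."""
    starts = range(0, len(x) - frame + 1, shift)
    return np.stack([x[s:s + frame] for s in starts])

# 10-fold cross-validation over the pooled frames of the 3 speakers
# (features X and labels are computed per frame, see below):
# for train_idx, val_idx in KFold(n_splits=10, shuffle=True).split(X):
#     ...
</preformat>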
<p>From each frame we extracted 9 first order temporal features similar to the ones used by Hudgins et al. [
<xref rid="pone.0127040.ref035" ref-type="bibr">35</xref>
]. Our feature vector was then composed of the mean, absolute mean, standard deviation, maximum, minimum, kurtosis, energy, zero-crossing rate and mean absolute slope. Both the feature set and the frame size were determined empirically after several experiments. For classification we used Support Vector Machines (SVM) with a Gaussian Radial Basis Function kernel.</p>
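<p>A sketch of the feature extraction and classifier setup is given below; the zero-crossing-rate definition and the SVM hyper-parameters (not reported in this section) are assumptions.</p>
<preformat>
import numpy as np
from scipy.stats import kurtosis
from sklearn.svm import SVC

def frame_features(frame):
    """The nine first-order temporal features listed in the text."""
    diff = np.diff(frame)
    zc = np.mean(np.abs(np.diff(np.sign(frame))) > 0)   # fraction of sign changes
    return np.array([
        frame.mean(),            # mean
        np.abs(frame).mean(),    # absolute mean
        frame.std(),             # standard deviation
        frame.max(),             # maximum
        frame.min(),             # minimum
        kurtosis(frame),         # kurtosis
        np.sum(frame ** 2),      # energy
        zc,                      # zero-crossing rate
        np.abs(diff).mean(),     # mean absolute slope
    ])

# SVM with a Gaussian radial basis function kernel (default C and gamma):
clf = SVC(kernel="rbf")
# clf.fit(X_train, y_train); y_pred = clf.predict(X_val)
</preformat>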
<p>To confirm the existence of significant differences among the results of the EMG channels, we used an Analysis of Variance (ANOVA) of the error rate, using SPSS (SPSS 19.0 –SPSS Inc., Chicago, IL, USA) and R [
<xref rid="pone.0127040.ref036" ref-type="bibr">36</xref>
,
<xref rid="pone.0127040.ref037" ref-type="bibr">37</xref>
].</p>
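<p>The analysis was carried out in SPSS and R; purely as an illustration, a comparable repeated-measures ANOVA can be sketched in Python with statsmodels. The data layout below is synthetic and the exact design of the original analysis is not reproduced here.</p>
<preformat>
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic error rates: 3 speakers x 5 EMG channels (one value per cell).
rng = np.random.default_rng(0)
rows = [{"speaker": s, "channel": c, "error_rate": rng.uniform(0.2, 0.4)}
        for s in range(1, 4) for c in range(1, 6)]
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="error_rate", subject="speaker",
              within=["channel"]).fit()
print(res)
</preformat>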
</sec>
</sec>
<sec sec-type="results" id="sec014">
<title>Results</title>
<p>In this section, the results of the analysis combining the EMG signal with the information extracted from the RT-MRI signal, two classification experiments, and a reproducibility assessment are presented. In the first classification experiment, the EMG signal was divided into frames and each frame was classified as being nasal or non-nasal. The second experiment also divided the EMG signal into frames, but classification was performed over nasal and non-nasal zones, whose limits were known a priori based on the information extracted from the RT-MRI. The final part of this section addresses the problem of EMG signal variability across recording sessions.</p>
<p>For the signal and statistical analysis of the EMG signal we considered 43 observations per speaker covering all EP nasal vowels, each containing the nasal vowel in several word positions (initial, internal and final) and flanked by [p]. We have also used, for visual analysis only, 4 observations per speaker containing isolated EP nasal vowels ([6~, e~, i~, o~, u~]).</p>
<sec id="sec015">
<title>Exploratory Visual Analysis</title>
<p>After extracting the required information from the RT-MRI images and aligning it with the EMG signal we visually explored possible relations between the signals. The EMG signal, for all channels, after pre-processing, along with the data derived from the RT-MRI, aligned as described in the previous section, are depicted in
<xref rid="pone.0127040.g008" ref-type="fig">Fig 8</xref>
. Based on a visual analysis, it is worth noticing that several peaks anticipate the nasal sound, especially in channels 2, 3 and 4. These peaks are most accentuated for the middle and final word position.</p>
<fig id="pone.0127040.g008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g008</object-id>
<label>Fig 8</label>
<caption>
<title>Filtered EMG signal (12-point moving average filter) for the several channels (pink), the aligned RT-MRI information (blue) and the respective audio signal for the sentence [6~p6, p6~p6, p6~] from speaker 1.</title>
<p>An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.</p>
</caption>
<graphic xlink:href="pone.0127040.g008"></graphic>
</fig>
<p>When using surface electrodes, the risk of capturing superimposed myoelectric signals is relatively high, particularly from muscles related to the movement of the lower jaw and the tongue, given the electrodes’ position. However, if we analyze an example of a close vowel such as [i~], for which the movement of the jaw is less prominent, the peaks found in the signal still anticipate the RT-MRI velar information for channels 3 and 4, as depicted in
<xref rid="pone.0127040.g009" ref-type="fig">Fig 9</xref>
. Channel 5 also exhibits a more active behavior in this case, which might be caused by its position near the tongue muscles and the tongue movement associated with the articulation of the [i~] vowel.</p>
<fig id="pone.0127040.g009" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g009</object-id>
<label>Fig 9</label>
<caption>
<title>Filtered EMG signal (12-point moving average filter) for the several channels (pink), the aligned RT-MRI information (blue) and the respective audio signal for the sentence [i~p6, i~p6, pi~] from speaker 1.</title>
<p>An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.</p>
</caption>
<graphic xlink:href="pone.0127040.g009"></graphic>
</fig>
<p>
<xref rid="pone.0127040.g010" ref-type="fig">Fig 10</xref>
shows the audio, RT-MRI and EMG signals for an utterance that contains isolated EP nasal vowels in the following order: [6~, e~, i~, o~, u~]. These particular utterances are relevant since, in this case, minimal movement of the external articulators, such as the lower jaw, is required. If the same analysis is considered for isolated nasal vowels of the same speaker, the EMG Channel 1 signal is clearer, apparently with less muscle crosstalk, and peaks can be noticed before the nasal vowels. For the remaining channels there is not a clear relation with all the vowels, although signal amplitude variations can be noticed in the last three vowels for Channels 3 and 5.</p>
<fig id="pone.0127040.g010" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g010</object-id>
<label>Fig 10</label>
<caption>
<title>Portuguese vowels in an isolated context (Pre-processed EMG signal for all EMG channels (pink), the aligned RT-MRI information (blue) and the respective audio signal (red) for [6~, e~, i~, o~, u~]).</title>
<p>An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.</p>
</caption>
<graphic xlink:href="pone.0127040.g010"></graphic>
</fig>
<p>To confirm our first impressions, gathered from the analysis presented above, we have also conducted a quantitative analysis, investigating the existence of mutual information between the EMG signal and the information extracted from the RT-MRI signal. This measure of mutual dependence between the nasal zones of the RT-MRI signal and the EMG signal is depicted in
<xref rid="pone.0127040.g011" ref-type="fig">Fig 11</xref>
, allowing us to investigate the relation between the signals from another viewpoint. When conducting this analysis, considering the nasal zones for all speakers simultaneously, the mutual information values for channels 3, 4 and 5 were slightly higher, as depicted by the boxplots presented in
<xref rid="pone.0127040.g011" ref-type="fig">Fig 11</xref>
. When considering each speaker individually, we found that, at least for one speaker, the best results can be found for channels 3, 4 and 5 as well. Also, for the same speaker, we noticed that the amount of mutual information for non-nasal zones was close to zero.</p>
<fig id="pone.0127040.g011" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g011</object-id>
<label>Fig 11</label>
<caption>
<title>Boxplot of the mutual information in the nasal zones between the RT-MRI information and the EMG signal of all speakers and for a single speaker.</title>
</caption>
<graphic xlink:href="pone.0127040.g011"></graphic>
</fig>
<p>Other situations, such as the relation between the RT-MRI signal and the non-nasal zones of the EMG signal, and the relation between the RT-MRI signal and the whole EMG signal, were also analyzed, but no relevant information was found. Other measures, such as Pearson’s product-moment correlation coefficient, which quantifies the degree of linear dependence between two signals, were also employed, but magnitudes below 0.1 were found, indicating a very weak linear relationship between the signals.</p>
<p>The fact that all of the above seemed to point to the presence of differences between the two classes (nasal and non-nasal) motivated an exploratory classification experiment based on SVM [
<xref rid="pone.0127040.ref038" ref-type="bibr">38</xref>
], which has been shown to yield acceptable performance in other applications, even when trained with small data sets. The results of this experiment are presented in what follows.</p>
</sec>
<sec id="sec016">
<title>Frame-based Nasality Classification</title>
<p>In a real use situation, the information about the nasal and non-nasal zones extracted from the RT-MRI signal is not available. Thus, in order to complement our study, and because one of our goals is to have a nasality feature detector, we conducted an experiment where we split the EMG signal into frames and classified each frame as one of two classes: nasal or non-nasal. Relevant statistics and the distribution of the dataset used are described in
<xref rid="pone.0127040.t001" ref-type="table">Table 1</xref>
.</p>
<table-wrap id="pone.0127040.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t001</object-id>
<label>Table 1</label>
<caption>
<title>Class distribution for all speakers for a single EMG channel by zones and frames (nasal and non-nasal).</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t001g" xlink:href="pone.0127040.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 1</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 2</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 3</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>All speakers</bold>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">
<italic>Utterances</italic>
</th>
<th align="left" rowspan="1" colspan="1">15</th>
<th align="left" rowspan="1" colspan="1">14</th>
<th align="left" rowspan="1" colspan="1">15</th>
<th align="left" rowspan="1" colspan="1">44</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Total Frames (percentage of frames in relation with the total for all speakers)</italic>
</td>
<td align="left" rowspan="1" colspan="1">836 (53.2%)</td>
<td align="left" rowspan="1" colspan="1">283 (18.0%)</td>
<td align="left" rowspan="1" colspan="1">453 (28.8%)</td>
<td align="left" rowspan="1" colspan="1">1572</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Nasal frames (percentage of nasal frames) / Non-nasal frames (percentage of non-nasal frames)</italic>
</td>
<td align="left" rowspan="1" colspan="1">357 (42.7%) / 479 (57.3%)</td>
<td align="left" rowspan="1" colspan="1">195 (68.9%) / 88 (31.1%)</td>
<td align="left" rowspan="1" colspan="1">249 (55.0%) / 204 (45.0%)</td>
<td align="left" rowspan="1" colspan="1">801 (51.0%) / 771 (49.0%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Total Zones (percentage of nasal and non-nasal zones in relation with the total for all speakers)</italic>
</td>
<td align="left" rowspan="1" colspan="1">76 (35.3%)</td>
<td align="left" rowspan="1" colspan="1">65 (30.2%)</td>
<td align="left" rowspan="1" colspan="1">74 (34.4%)</td>
<td align="left" rowspan="1" colspan="1">215</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Nasal Zones (percentage of nasal zones) / Non-nasal Zones (percentage of non-nasal zones)</italic>
</td>
<td align="left" rowspan="1" colspan="1">45 (59.2%) / 31 (40.8%)</td>
<td align="left" rowspan="1" colspan="1">45 (69.2%) / 20 (30.8%)</td>
<td align="left" rowspan="1" colspan="1">45 (60.8%) / 29 (39.2%)</td>
<td align="left" rowspan="1" colspan="1">135 (62.8%) / 80 (37.2%)</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t001fn001">
<p>The first row presents the number of utterances per speaker and the total number of utterances for all speakers. The second row presents the total frames: the percentages were calculated in relation to the total amount of frames of all speakers (e.g. Speaker 3 data contains 28.8% of the frames found in this corpus). The third row presents the number of nasal and non-nasal frames, and the percentage values concern the distribution of the frame type (e.g. 42.7% of the frames for Speaker 1 are nasal). In the fourth and fifth rows we apply the same presentation structure (as in the second and third rows) but concerning nasal and non-nasal zones.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Classification was initially performed using the data from all speakers. The error rate results are depicted in
<xref rid="pone.0127040.g012" ref-type="fig">Fig 12</xref>
and the sensitivity and specificity results are presented in
<xref rid="pone.0127040.t002" ref-type="table">Table 2</xref>
. Besides the mean value over the 10-fold cross-validation, 95% confidence intervals are also included. The best result was obtained for EMG channel 3, with a 32.5% mean error rate, a mean sensitivity of 65.5% and a mean specificity of 69.4%. Channels 4 and 2 presented similar results, achieving the second and third best results with mean error rates of 32.7% and 33.2%, slightly lower sensitivity values of 61.3% and 63.0%, and higher specificity values of 73.0% and 70.4%.</p>
<fig id="pone.0127040.g012" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g012</object-id>
<label>Fig 12</label>
<caption>
<title>Classification results (mean value of the 10-fold for error rate, sensitivity and specificity) for all channels and all speakers.</title>
<p>Error bars show a 95% confidence interval.</p>
</caption>
<graphic xlink:href="pone.0127040.g012"></graphic>
</fig>
<table-wrap id="pone.0127040.t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t002</object-id>
<label>Table 2</label>
<caption>
<title>Mean sensitivity and specificity measures (%) for each EMG channel with a 95% confidence interval.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t002g" xlink:href="pone.0127040.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Specificity</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Sensitivity</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1</italic>
</td>
<td align="left" rowspan="1" colspan="1">67.1±3.2</td>
<td align="left" rowspan="1" colspan="1">61.1±3.8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2</italic>
</td>
<td align="left" rowspan="1" colspan="1">70.4±2.5</td>
<td align="left" rowspan="1" colspan="1">63.0±2.9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3</italic>
</td>
<td align="left" rowspan="1" colspan="1">69.4±2.7</td>
<td align="left" rowspan="1" colspan="1">65.5±3.3</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>4</italic>
</td>
<td align="left" rowspan="1" colspan="1">73.0±2.1</td>
<td align="left" rowspan="1" colspan="1">61.3±4.3</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>5</italic>
</td>
<td align="left" rowspan="1" colspan="1">67.5±3.6</td>
<td align="left" rowspan="1" colspan="1">59.8±3.1</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>Classification was also performed for each individual speaker. Error rates are shown in
<xref rid="pone.0127040.g013" ref-type="fig">Fig 13</xref>
, for each speaker (left) and overall, per channel (right), along with the corresponding 95% confidence interval. The best overall result of 24.3% was attained using EMG channel 3. The best results for an individual speaker were found for speaker 3, with 23.4% and 23.6% mean error rates in EMG channels 4 and 3. For speakers 1 and 2, EMG channel 3 presented the best results, with 25.7% and 23.7% mean error rates. On average, speaker 1 presented the least variability in the results, as shown by the confidence intervals. It is also interesting to note the difference in results for EMG channel 1, where speaker 2 attained a mean error rate of 24.1%. However, the class distribution differs slightly for speaker 2, with 68.9% nasal frames, compared with 42.7% and 55.0% for speakers 1 and 3. A closer look at the data of speaker 2 also reveals that the higher proportion of nasal frames is explained by breaths between words, which imply opening the velum.</p>
<fig id="pone.0127040.g013" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g013</object-id>
<label>Fig 13</label>
<caption>
<title>The graph on the left shows the mean error rate for each speaker clustered by EMG channel.</title>
<p>The graph on the right shows the mean of the error rates from each speaker also clustered by EMG channel. Error bars show a 95% confidence interval.</p>
</caption>
<graphic xlink:href="pone.0127040.g013"></graphic>
</fig>
<p>Looking at the results from a different perspective, we also computed, for each channel, the difference between its mean error rate and the global mean error rate over all channels. As depicted in
<xref rid="pone.0127040.g014" ref-type="fig">Fig 14</xref>
, Channel 3 exhibited a mean error rate 4.1% below the global mean error rate of all channels. Analyzing the results by speaker, the best result was achieved for EMG channel 4 of speaker 3, at 5.1% below the mean. A noticeable result was also found for EMG channel 1 of speaker 2, also below the mean error rate. However, for these speakers the 95% confidence intervals were considerably larger, showing some instability in the results.</p>
<fig id="pone.0127040.g014" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g014</object-id>
<label>Fig 14</label>
<caption>
<title>Difference between the mean error rate of all channels and the respective result of each channel for all (left) and each (right) speaker.</title>
<p>Error bars show a 95% confidence interval.</p>
</caption>
<graphic xlink:href="pone.0127040.g014"></graphic>
</fig>
<p>When looking at the results, grouped by nasal vowel, as shown in
<xref rid="pone.0127040.t003" ref-type="table">Table 3</xref>
, an improvement can be noticed, particularly for the [u~] case, with a 27.5% mean error rate using EMG channel 3.</p>
<table-wrap id="pone.0127040.t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t003</object-id>
<label>Table 3</label>
<caption>
<title>Mean error rate grouped by nasal vowel.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t003g" xlink:href="pone.0127040.t003"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG Channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">[6~p6] [p6~p6] [p6~]</th>
<th align="left" rowspan="1" colspan="1">[e~p6] [pe~p6] [pe~]</th>
<th align="left" rowspan="1" colspan="1">[i~p6] [pi~p6] [pi~]</th>
<th align="left" rowspan="1" colspan="1">[o~p6] [po~p6] [po~]</th>
<th align="left" rowspan="1" colspan="1">[u~p6] [pu~p6] [pu~]</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1</italic>
</td>
<td align="left" rowspan="1" colspan="1">36.2%</td>
<td align="left" rowspan="1" colspan="1">33.6%</td>
<td align="left" rowspan="1" colspan="1">38.7%</td>
<td align="left" rowspan="1" colspan="1">
<bold>32.9%</bold>
</td>
<td align="left" rowspan="1" colspan="1">35.6%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>34.2%</bold>
</td>
<td align="left" rowspan="1" colspan="1">33.9%</td>
<td align="left" rowspan="1" colspan="1">34.6%</td>
<td align="left" rowspan="1" colspan="1">
<bold>33.5%</bold>
</td>
<td align="left" rowspan="1" colspan="1">29.9%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3</italic>
</td>
<td align="left" rowspan="1" colspan="1">39.8%</td>
<td align="left" rowspan="1" colspan="1">31.4%</td>
<td align="left" rowspan="1" colspan="1">35.8%</td>
<td align="left" rowspan="1" colspan="1">
<bold>29.4%</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>27.5%</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>4</italic>
</td>
<td align="left" rowspan="1" colspan="1">38.8%</td>
<td align="left" rowspan="1" colspan="1">
<bold>28.6%</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>32.8%</bold>
</td>
<td align="left" rowspan="1" colspan="1">35.1%</td>
<td align="left" rowspan="1" colspan="1">28.1%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>5</italic>
</td>
<td align="left" rowspan="1" colspan="1">39.5%</td>
<td align="left" rowspan="1" colspan="1">36.8%</td>
<td align="left" rowspan="1" colspan="1">36.1%</td>
<td align="left" rowspan="1" colspan="1">33.5%</td>
<td align="left" rowspan="1" colspan="1">35.0%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Mean</italic>
</td>
<td align="left" rowspan="1" colspan="1">37.7%</td>
<td align="left" rowspan="1" colspan="1">32.9%</td>
<td align="left" rowspan="1" colspan="1">35.6%</td>
<td align="left" rowspan="1" colspan="1">32.9%</td>
<td align="left" rowspan="1" colspan="1">31.2%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>To assess whether any advantage could be gained from combining channels to improve classification, we also experimented with classification using multiple EMG channels. The most relevant combinations are shown in
<xref rid="pone.0127040.t004" ref-type="table">Table 4</xref>
. The best results for all speakers and for each speaker individually were worse than the ones obtained previously.</p>
<table-wrap id="pone.0127040.t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t004</object-id>
<label>Table 4</label>
<caption>
<title>Mean error rate using multiple channels combinations.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t004g" xlink:href="pone.0127040.t004"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG Channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>All speakers</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 1</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 2</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker 3</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1 + 3</italic>
</td>
<td align="left" rowspan="1" colspan="1">35.0%</td>
<td align="left" rowspan="1" colspan="1">30.1%</td>
<td align="left" rowspan="1" colspan="1">
<bold>24.7%</bold>
</td>
<td align="left" rowspan="1" colspan="1">29.5%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2 + 3</italic>
</td>
<td align="left" rowspan="1" colspan="1">36.3%</td>
<td align="left" rowspan="1" colspan="1">
<bold>28.0%</bold>
</td>
<td align="left" rowspan="1" colspan="1">31.3%</td>
<td align="left" rowspan="1" colspan="1">33.9%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2 + 4</italic>
</td>
<td align="left" rowspan="1" colspan="1">34.9%</td>
<td align="left" rowspan="1" colspan="1">31.4%</td>
<td align="left" rowspan="1" colspan="1">30.2%</td>
<td align="left" rowspan="1" colspan="1">33.0%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3 + 4</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>32.9%</bold>
</td>
<td align="left" rowspan="1" colspan="1">28.7%</td>
<td align="left" rowspan="1" colspan="1">33.0%</td>
<td align="left" rowspan="1" colspan="1">
<bold>27.6%</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2 + 3 + 4</italic>
</td>
<td align="left" rowspan="1" colspan="1">35.7%</td>
<td align="left" rowspan="1" colspan="1">29.0%</td>
<td align="left" rowspan="1" colspan="1">33.5%</td>
<td align="left" rowspan="1" colspan="1">32.1%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1 + 3 + 4 + 5</italic>
</td>
<td align="left" rowspan="1" colspan="1">39.1%</td>
<td align="left" rowspan="1" colspan="1">36.3%</td>
<td align="left" rowspan="1" colspan="1">32.5%</td>
<td align="left" rowspan="1" colspan="1">34.9%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<sec id="sec017">
<title>Statistical Analysis</title>
<p>To assess significant differences among the EMG channels’ error rates, we performed a repeated-measures ANOVA with pair-wise comparisons between EMG channels. The results, summarized in
<xref rid="pone.0127040.t005" ref-type="table">Table 5</xref>
, show as significant (p < 0.05) the differences between EMG channel 3 and the remaining channels, and between channels 2 and 4. No significant differences were found for the other channel pairs. For some EMG channel pairs, statistically significant differences were found among speakers. In this analysis, no serious violations of normality or homogeneity of variance were found, and sphericity was guaranteed.</p>
<table-wrap id="pone.0127040.t005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t005</object-id>
<label>Table 5</label>
<caption>
<title>Results of the repeated-measures ANOVA analysis for the EMG channel pairs that attained significance level.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t005g" xlink:href="pone.0127040.t005"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG Channel Pair</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>F(1, 27)</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>p-Value</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Speaker Effect</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3 vs 1</italic>
</td>
<td align="left" rowspan="1" colspan="1">16.112</td>
<td align="left" rowspan="1" colspan="1">p<0.001</td>
<td align="left" rowspan="1" colspan="1">Significant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3 vs 2</italic>
</td>
<td align="left" rowspan="1" colspan="1">27.532</td>
<td align="left" rowspan="1" colspan="1">p<0.001</td>
<td align="left" rowspan="1" colspan="1">Significant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3 vs 4</italic>
</td>
<td align="left" rowspan="1" colspan="1">6.603</td>
<td align="left" rowspan="1" colspan="1">p = 0.016</td>
<td align="left" rowspan="1" colspan="1">Non significant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3 vs 5</italic>
</td>
<td align="left" rowspan="1" colspan="1">16.008</td>
<td align="left" rowspan="1" colspan="1">p<0.001</td>
<td align="left" rowspan="1" colspan="1">Non significant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2 vs 4</italic>
</td>
<td align="left" rowspan="1" colspan="1">7.394</td>
<td align="left" rowspan="1" colspan="1">p = 0.011</td>
<td align="left" rowspan="1" colspan="1">Significant</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
</sec>
<sec id="sec018">
<title>Nasal and Non-nasal Zone Classification</title>
<p>The information extracted from the RT-MRI signal, although not available in a real classification scenario, allows further exploration and validation of our methodology. As described in the previous sections, the RT-MRI information allowed us to split the EMG signal into nasal and non-nasal zones. Thus, by knowing the zone boundaries in the EMG signal, we could conduct a classification experiment based on the majority of nasal/non-nasal frames in a certain zone. Assuming a decision by majority, a zone is nasal if the number of nasal frames is equal to or higher than the number of non-nasal frames. Results using this technique are depicted in
<xref rid="pone.0127040.t006" ref-type="table">Table 6</xref>
. These results showed an absolute improvement of 11.0% when compared with what was achieved earlier using single frames of the signal. When looking at a specific part of each zone, the results were still better than the ones achieved previously. The most important information seemed to be located in the initial half of each zone, since an accuracy degradation trend was observed for all channels when the initial part of the zone was not considered. When considering only the nasal zones, the error rate dropped to 12.6% for EMG channel 4, and the same trend of better results in the initial part of the zone was also verified, as depicted in
<xref rid="pone.0127040.t007" ref-type="table">Table 7</xref>
. When taking into account only the non-nasal zones we observed the best results in EMG channel 3, using only the central part of the non-nasal zones (i.e. the 25%-75% interval) with 25.0% mean error rate.</p>
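<p>The majority-vote rule described above can be sketched as follows; the frame-level predictions and zone boundaries are assumed to be available as index arrays with illustrative names.</p>
<preformat>
import numpy as np

def classify_zones(frame_pred, zone_bounds):
    """Classify each zone by the majority of its frame-level predictions.
    frame_pred  : per-frame labels (1 = nasal, 0 = non-nasal)
    zone_bounds : list of (first_frame, last_frame) index pairs per zone
    A zone is labeled nasal when nasal frames are at least as frequent
    as non-nasal frames, as stated in the text."""
    labels = []
    for start, end in zone_bounds:
        votes = frame_pred[start:end + 1]
        nasal = int(np.sum(votes))
        labels.append(int(nasal >= len(votes) - nasal))
    return np.array(labels)
</preformat>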
<table-wrap id="pone.0127040.t006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t006</object-id>
<label>Table 6</label>
<caption>
<title>Mean error rates using a classification technique based on the majority of nasal/non-nasal frames for each zone.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t006g" xlink:href="pone.0127040.t006"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th colspan="4" align="center" rowspan="1">
<bold>Part of the zone considered</bold>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG Channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[0–100%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[0%-50%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[25%-75%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[50%-100%]</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1</italic>
</td>
<td align="left" rowspan="1" colspan="1">25.2%</td>
<td align="left" rowspan="1" colspan="1">28.0%</td>
<td align="left" rowspan="1" colspan="1">24.3%</td>
<td align="left" rowspan="1" colspan="1">
<bold>27.7%</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>21.5%</bold>
</td>
<td align="left" rowspan="1" colspan="1">25.2%</td>
<td align="left" rowspan="1" colspan="1">28.7%</td>
<td align="left" rowspan="1" colspan="1">32.2%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3</italic>
</td>
<td align="left" rowspan="1" colspan="1">25.7%</td>
<td align="left" rowspan="1" colspan="1">27.1%</td>
<td align="left" rowspan="1" colspan="1">28.7%</td>
<td align="left" rowspan="1" colspan="1">33.7%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>4</italic>
</td>
<td align="left" rowspan="1" colspan="1">24.3%</td>
<td align="left" rowspan="1" colspan="1">
<bold>22.9%</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>23.3%</bold>
</td>
<td align="left" rowspan="1" colspan="1">31.7%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>5</italic>
</td>
<td align="left" rowspan="1" colspan="1">32.2%</td>
<td align="left" rowspan="1" colspan="1">29.0%</td>
<td align="left" rowspan="1" colspan="1">30.7%</td>
<td align="left" rowspan="1" colspan="1">36.1%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Mean</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>25.8%</bold>
</td>
<td align="left" rowspan="1" colspan="1">26.5%</td>
<td align="left" rowspan="1" colspan="1">27.1%</td>
<td align="left" rowspan="1" colspan="1">32.3%</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t006fn001">
<p>Each column of the table shows which part of the zone is being considered, where 0% represents the zone start and 100% the zone end (e.g. in 50%-100% interval only the samples in the last half of each zone are being considered).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="pone.0127040.t007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t007</object-id>
<label>Table 7</label>
<caption>
<title>Mean error rates using a classification technique based on the majority of nasal/non-nasal frames for each nasal zone.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t007g" xlink:href="pone.0127040.t007"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th colspan="4" align="center" rowspan="1">
<bold>Part of the NASAL zone considered</bold>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG Channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[0–100%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[0%-50%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[25%-75%]</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>[50%-100%]</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">18.5%</td>
<td align="char" char="." rowspan="1" colspan="1">18.5%</td>
<td align="char" char="." rowspan="1" colspan="1">20.5%</td>
<td align="char" char="." rowspan="1" colspan="1">22.8%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">14.1%</td>
<td align="char" char="." rowspan="1" colspan="1">19.3%</td>
<td align="char" char="." rowspan="1" colspan="1">26.0%</td>
<td align="char" char="." rowspan="1" colspan="1">22.8%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">21.5%</td>
<td align="char" char="." rowspan="1" colspan="1">23.7%</td>
<td align="char" char="." rowspan="1" colspan="1">29.9%</td>
<td align="char" char="." rowspan="1" colspan="1">26.0%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>4</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">
<bold>12.6%</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">
<bold>15.6%</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">
<bold>13.4%</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">
<bold>16.5%</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>5</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">23.7%</td>
<td align="char" char="." rowspan="1" colspan="1">24.4%</td>
<td align="char" char="." rowspan="1" colspan="1">22.1%</td>
<td align="char" char="." rowspan="1" colspan="1">27.6%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Mean</italic>
</td>
<td align="char" char="." rowspan="1" colspan="1">
<bold>18.1%</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">20.3%</td>
<td align="char" char="." rowspan="1" colspan="1">22.4%</td>
<td align="char" char="." rowspan="1" colspan="1">23.2%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
<sec id="sec019">
<title>Reproducibility Assessment</title>
<p>The EMG signal varies across recording sessions, even for the same speaker. For that reason, we assessed the effect of such variability in our study. Using 4 additional sessions from speaker 1, recorded at different times, we have conducted the same analysis as presented in the previous sections. The results for all sessions were very similar, showing evidence that the existing variability among sessions had no major influence on the results.</p>
<p>Considering a distribution of 2448 total frames, where 47.0% are nasal and 53.0% are non-nasal, we observe a best error rate of 26.9% for EMG channel 3, as depicted in
<xref rid="pone.0127040.g015" ref-type="fig">Fig 15</xref>
. Sensitivity and specificity results for these additional sessions can be found in
<xref rid="pone.0127040.t008" ref-type="table">Table 8</xref>
: channel 3 presents the highest values with 76.8% and 69.0% respectively.</p>
<fig id="pone.0127040.g015" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.g015</object-id>
<label>Fig 15</label>
<caption>
<title>Classification results (mean value of the 10-fold for error rate, sensitivity and specificity) for all channels of speaker 1.</title>
<p>These results are based on four additional sessions from this speaker recorded a posteriori. Error bars show a 95% confidence interval.</p>
</caption>
<graphic xlink:href="pone.0127040.g015"></graphic>
</fig>
<table-wrap id="pone.0127040.t008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0127040.t008</object-id>
<label>Table 8</label>
<caption>
<title>Mean sensitivity and specificity measures (%) with a 95% confidence interval for each EMG channel of speaker 1.</title>
</caption>
<alternatives>
<graphic id="pone.0127040.t008g" xlink:href="pone.0127040.t008"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>EMG channel</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Specificity</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Sensitivity</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>1</italic>
</td>
<td align="left" rowspan="1" colspan="1">61.6±4.5</td>
<td align="left" rowspan="1" colspan="1">71.9±1.7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>2</italic>
</td>
<td align="left" rowspan="1" colspan="1">61.2±3.5</td>
<td align="left" rowspan="1" colspan="1">65.1±3.4</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>3</italic>
</td>
<td align="left" rowspan="1" colspan="1">69.0±2.0</td>
<td align="left" rowspan="1" colspan="1">76.8±1.5</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>4</italic>
</td>
<td align="left" rowspan="1" colspan="1">59.5±3.1</td>
<td align="left" rowspan="1" colspan="1">68.6±2.6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>5</italic>
</td>
<td align="left" rowspan="1" colspan="1">55.7±3.4</td>
<td align="left" rowspan="1" colspan="1">60.8±2.0</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t008fn001">
<p>These results are based on four additional sessions from this speaker recorded a posteriori.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Taking into consideration that this dataset presents a higher proportion of nasal frames, a comparison with the results reported in the previous sections shows a slightly higher error rate, particularly for channels 4 and 5. For EMG channel 3, the best channel in both conditions, the difference is very small (1.2% in absolute value).</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec020">
<title>Discussion</title>
<p>The global results of this study point to the fact that the selected approach can be used to reduce the error rate caused by nasality in languages such as EP, where this characteristic is particularly relevant. Looking at the results of our analysis, there is a noticeable trend pointing to the electrode pairs placed below the ear, between the mastoid process and the mandible, in the upper neck area, as the sensors with the least crosstalk and the most promising for detecting the myoelectric signal generated by the velum movement. In a first stage, when overlaying the aligned RT-MRI and EMG signals, the obtained matching is more evident for channels 2, 3 and 4, particularly for nasal vowels in medial and final word positions. However, when looking at the close vowel case and at the vowels in an isolated context, EMG channels 3 and 4 emerge as the most promising. In the isolated context, channel 1 also presents interesting results, showing that more accurate filtering might be necessary to obtain the desired signals from this channel. In the case of EMG channel 5, a signal matching the expected tongue movement can be noticed, making it unclear whether velar information is actually being captured. Still in the visual analysis, the fact that a more evident matching is obtained for the medial and final word positions suggests that it would be interesting to further analyze the differences between nasal vowel positions. The word-final context only requires an opening movement, while a word-initial position could conceivably, in some cases, require only a closing movement. At the least, one could say that any opening movement is under weak temporal constraints. The word-internal context requires a clear opening-closing movement under strict temporal constraints given by the linguistically appropriate duration between the flanking plosives. Thus, it is perhaps not surprising that the latter context gives clearer results. However, different contexts, ideally where no jaw movement is required, should be considered to discard or minimize possible muscle crosstalk.</p>
<p>In our data analysis, using a different approach, we also compared the information present in the nasal zones of the EMG signal with the velar information of the RT-MRI signal, using mutual information to measure the relation between the signals. The highest values were found for EMG channels 3, 4 and 5, in line with our visual analysis of the signal.</p>
<p>The overall results seem to indicate that, for these speakers, velum information is actually being captured by the EMG sensors; however, it is not clear which muscles are being measured. A more detailed discussion of the classification results and the reproducibility of this study is presented in the following subsections.</p>
<sec id="sec021">
<title>Classification</title>
<p>Following our data analysis, we investigated whether it was possible to develop a classifier that distinguishes between nasal and non-nasal frames of an EMG signal in a realistic, frame-independent scenario. The results for this scenario showed that it is possible to distinguish between the two frame classes, in line with the results from our previous analysis, particularly for EMG channels 2, 3 and 4. The specificity and sensitivity measures found for these channels are also slightly higher when compared with the remaining channels, showing that positive and negative cases were more accurately identified. When looking at the results of the classifier per speaker, an improvement can be noticed, indicating a better modelling of our classification task. In this case, EMG channel 3 presents the best results for all speakers. EMG channels 1 and 4 present substantial differences between speakers and recording sessions, showing that further exploration is required, particularly for channel 1, whose position near several bone structures may make it highly sensitive to sensor placement and to the anatomical structures in that area when tracking this set of muscles.</p>
<p>The statistical analysis also shows that significant differences can be found between the results of EMG channel 3 and the remaining EMG channels, in line with what was found above, where EMG channel 3 emerged as the channel with the best results.</p>
<p>In our study, we also attempted to combine multiple channels; however, this did not improve the obtained results. This can indicate one of the following: (1) the muscle signals captured by the EMG channels overlap and thus do not improve nasal/non-nasal class separation; (2) due to the superimposition of muscles in that area, adding a new channel with information from other muscles creates a less accurate model of the classes; (3) the muscles related to velum movement are not captured by the added channel(s).</p>
<p>In a scenario where each zone is classified based on the majority frame type (nasal or non-nasal), we find a noteworthy improvement, with error rates as low as 12.6% for all users. Although this is not a realistic scenario, in the sense that an interface using surface EMG to detect velum movement would not know
<italic>a priori</italic>
the nasal zone boundaries, these results suggest that using neighboring frames and introducing frame context might help to improve the accuracy of our methodology. It is also interesting to note the higher error rate in non-nasal frames and the fact that better results are obtained not by using the whole zone but only the central part of each zone.</p>
<p>An alternative approach to the developed classifier would be to detect velum movement events, since EMG activity is most likely to be found when the position of the velum needs to change. However, based on the results obtained in this study and the physiology of the area in question, it does not seem likely that a sensor position without crosstalk from other muscles can be found. As such, an event-based approach would need a way to avoid false positives originating from neighboring muscles.</p>
</sec>
<sec id="sec022">
<title>Reproducibility</title>
<p>One aspect that deserves attention concerns how inter- and intra-speaker variability might influence the gathered results. Regarding inter-speaker variability, the first issue to discuss is whether the number of speakers is adequate. In this regard, we argue that, for the purpose of this exploratory study, three speakers encompass enough variability for the following reasons: 1) they present some anatomical and physiological variation that, to some extent, influences how the sensors are placed, considering the landmarks and criteria we describe in section 2.4.1; 2) since nothing was imposed on this matter, the speakers have different/varying speech rates, which introduces variability in the EMG-MRI alignment; and 3) the corpus includes nasals in different word positions and contexts, produced along one-hour recording sessions, providing a large amount of nasality data per speaker. Considering these variability factors, and the similar error rates obtained across speakers, we believe that inter-speaker variability is reasonably addressed. Nevertheless, a larger number of speakers would be desirable, but it would have been a very risky option at this time, considering that there was no source in the literature to support the viability of this study and that RT-MRI acquisitions are very expensive. Note that we do not intend to generalize the interpretation of the results presented in this study. The well-known variability across speakers found in EMG signals [
<xref rid="pone.0127040.ref039" ref-type="bibr">39</xref>
,
<xref rid="pone.0127040.ref040" ref-type="bibr">40</xref>
], together with the number of speakers involved, precludes such a conclusion. The extent to which the EMG is able to capture velum information may vary among speakers. Nevertheless, the results gathered provide enough evidence that it is possible to do so, and contribute a set of EMG sensor positions. This allows moving to the next stage and pursuing the research with a larger number of speakers. In terms of classification results, we noticed a stable trend for EMG channel 3. Regarding the other channels, the differences in classification results may be explained by physiological differences between speakers. For example, EMG channel 1 was placed in an area near several structures (e.g. the inferior maxillary bone) that may interfere with the signal from deeper muscles. Thus, for these channels it is not possible to draw a stable conclusion.</p>
<p>On the subject of intra-speaker variability, additional sessions from a randomly chosen speaker, recorded <italic>a posteriori</italic>, support the previously obtained results, further evidencing that, for the speaker in question, the position selected for EMG channel 3 attains the best results. In that sense, intra-speaker variability, sometimes found with EMG sensors placed in the face and neck regions [
<xref rid="pone.0127040.ref040" ref-type="bibr">40</xref>
], did not have a strong impact on the results for this speaker. The fact that the EMG sensors were always placed by the same experienced person, using the same references and the same hardware, may have helped to lower the variability between recording sessions of the same speaker.</p>
</sec>
</sec>
<sec id="sec023">
<title>Conclusions and Future work</title>
<p>The work presented here used two distinct sources of information, surface EMG and RT-MRI, to address the challenge of nasality detection in EMG-based silent speech interfaces. The information extracted from the RT-MRI images allowed us to know when to expect nasality. Thus, by synchronizing the two signals, based on audio recorded simultaneously with each modality for the same speaker, we were able to explore the existence of useful information about velum movement in the EMG signal. The positioning of the surface EMG electrodes was based on probable locations for detecting the targeted muscles.</p>
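<p>As a rough illustration of how two acquisitions can be time-aligned through their simultaneously recorded audio, the sketch below estimates a constant lag by cross-correlating the audio amplitude envelopes. The actual alignment procedure used in the study may be more elaborate; all names here are hypothetical.</p>

import numpy as np

def estimate_lag_seconds(audio_a, audio_b, fs):
    # Cross-correlate the amplitude envelopes of two audio tracks recorded
    # alongside each modality; both are assumed mono and sampled at the same
    # rate fs. A positive value means audio_a lags behind audio_b.
    env_a = np.abs(np.asarray(audio_a, dtype=float))
    env_b = np.abs(np.asarray(audio_b, dtype=float))
    env_a -= env_a.mean()
    env_b -= env_b.mean()
    corr = np.correlate(env_a, env_b, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(env_b) - 1)
    return lag_samples / fs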
<p>Our results point to the possibility of developing an SSI based on surface EMG for EP that detects the muscles associated with the movement of the velum, and they provide a background for future studies in terms of sensor positioning. The results of this study show that, in realistic use conditions, error rates of 23.7% can be achieved with sensors positioned below the ear, between the mastoid process and the mandible, in the upper neck region, and that careful articulation and careful positioning of the sensors may influence nasal vowel detection results.</p>
<p>The presented outcomes indicate that the described approach can be a valuable contribution to the state of the art in EMG-based speech interfaces, and that it can be applied in parallel with other techniques. Additionally, although the methodology used in this study partially relies on RT-MRI information for scientific substantiation, a technology that requires a complex and expensive setup, the proposed solution for detecting nasality is based solely on a single, non-invasive surface EMG sensor. Thus, the development of an SSI based on EMG for EP, with language-adapted sensor positioning, now seems to be a possibility.</p>
<p>For future work, we would like to extend this study by including more speakers and nasal consonants, the latter not being possible here due to the lack of RT-MRI information for that case. We also intend to analyze other non-invasive approaches that may be able to detect nasality and be used in parallel with surface EMG, such as ultrasonic Doppler sensing [
<xref rid="pone.0127040.ref041" ref-type="bibr">41</xref>
]. Furthermore, a solution for accurate and replicable positioning of the surface EMG sensors needs to be considered. We would also like to further explore the differences in the resulting EMG patterns across the three nasal vowel positions (i.e., initial, internal, and final position) and to consider an event-based classification model that detects velum movement.</p>
</sec>
<sec sec-type="supplementary-material" id="sec024">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0127040.s001">
<label>S1 Data</label>
<caption>
<title>Zip file with the RT-MRI velum information, surface EMG data for the five channels and the synchronized signals of all speakers.</title>
<p>(ZIP)</p>
</caption>
<media xlink:href="pone.0127040.s001.zip">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank the experiment participants.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0127040.ref001">
<label>1</label>
<mixed-citation publication-type="book">
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Acero</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Hon</surname>
<given-names>H-W</given-names>
</name>
(
<year>2001</year>
)
<chapter-title>Spoken language processing</chapter-title>
<publisher-name>Prentice Hall</publisher-name>
<publisher-loc>Englewood Cliffs</publisher-loc>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Flynn</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Jones</surname>
<given-names>E</given-names>
</name>
(
<year>2008</year>
)
<article-title>Combined speech enhancement and auditory modelling for robust distributed speech recognition</article-title>
.
<source>Speech Commun</source>
<volume>50</volume>
:
<fpage>797</fpage>
<lpage>809</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Freng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Ramabhadran</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Hansen</surname>
<given-names>JHL</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>JD</given-names>
</name>
(
<year>2012</year>
)
<article-title>Trends in speech and language processing [in the spotlight]</article-title>
.
<source>Signal Process Mag IEEE</source>
<volume>29</volume>
:
<fpage>177</fpage>
<lpage>179</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Denby</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Schultz</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Honda</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Hueber</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Gilbert</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Brumberg</surname>
<given-names>JS</given-names>
</name>
(
<year>2010</year>
)
<article-title>Silent speech interfaces</article-title>
.
<source>Speech Commun</source>
<volume>52</volume>
:
<fpage>270</fpage>
<lpage>287</lpage>
. Available:
<ext-link ext-link-type="uri" xlink:href="http://linkinghub.elsevier.com/retrieve/pii/S0167639309001307">http://linkinghub.elsevier.com/retrieve/pii/S0167639309001307</ext-link>
. Accessed: 2013 Dec 4.</mixed-citation>
</ref>
<ref id="pone.0127040.ref005">
<label>5</label>
<mixed-citation publication-type="other">Wand M, Schultz T (2011) Session-independent EMG-based Speech Recognition. International Conference on Bio-Inspired Systems and Signal Processing (BIOSIGNALS 2011). pp. 295–300.</mixed-citation>
</ref>
<ref id="pone.0127040.ref006">
<label>6</label>
<mixed-citation publication-type="other">Heistermann T, Janke M, Wand M, Schultz T (2014) Spatial Artifact Detection for Multi-Channel EMG-Based Speech Recognition. Int Conf Bio-Inspired Syst Signal Process: 189–196.</mixed-citation>
</ref>
<ref id="pone.0127040.ref007">
<label>7</label>
<mixed-citation publication-type="other">Teixeira A (2000) Síntese Articulatória das Vogais Nasais do Português Europeu [Articulatory Synthesis of Nasal Vowels for European Portuguese] PhD Thesis, Universidade de Aveiro.</mixed-citation>
</ref>
<ref id="pone.0127040.ref008">
<label>8</label>
<mixed-citation publication-type="other">Almeida A (1976) The Portuguese nasal vowels: Phonetics and phonemics. In: Schmidt-Radefelt J, editor. Readings in Portuguese Linguistics. Amsterdam. pp. 348–396.</mixed-citation>
</ref>
<ref id="pone.0127040.ref009">
<label>9</label>
<mixed-citation publication-type="other">Freitas J, Teixeira A, Dias MS (2012) Towards a Silent Speech Interface for Portuguese: Surface Electromyography and the nasality challenge. International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS 2012). pp. 91–100.</mixed-citation>
</ref>
<ref id="pone.0127040.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lacerda</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Head</surname>
<given-names>B</given-names>
</name>
(
<year>1966</year>
)
<article-title>Análise de sons nasais e sons nasalizados do português</article-title>
.
<source>Rev do Laboratório Fonética Exp Coimbra</source>
<volume>6</volume>
:
<fpage>5</fpage>
<lpage>70</lpage>
.
<pub-id pub-id-type="pmid">12870069</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref011">
<label>11</label>
<mixed-citation publication-type="book">
<name>
<surname>Sampson</surname>
<given-names>R</given-names>
</name>
(
<year>1999</year>
)
<chapter-title>Nasal Vowel Evolution in Romance</chapter-title>
<publisher-loc>Oxford</publisher-loc>
:
<publisher-name>Oxford University Press</publisher-name>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Beddor</surname>
<given-names>PS</given-names>
</name>
(
<year>1993</year>
)
<article-title>The perception of nasal vowels</article-title>
.
<source>Nasals, nasalization, and the velum</source>
<volume>5</volume>
:
<fpage>171</fpage>
<lpage>196</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Fritzell</surname>
<given-names>B</given-names>
</name>
(
<year>1969</year>
)
<article-title>The velopharyngeal muscles in speech: An electromyographic and cineradiographic study</article-title>
.
<source>Acta Otolaryngolica</source>
<volume>50</volume>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref014">
<label>14</label>
<mixed-citation publication-type="book">
<name>
<surname>Hardcastle</surname>
<given-names>WJ</given-names>
</name>
(
<year>1976</year>
)
<chapter-title>Physiology of speech production: an introduction for speech scientists</chapter-title>
<publisher-name>Academic Press</publisher-name>
<publisher-loc>New York</publisher-loc>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref015">
<label>15</label>
<mixed-citation publication-type="book">
<name>
<surname>Seikel</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>King</surname>
<given-names>DW</given-names>
</name>
,
<name>
<surname>Drumright</surname>
<given-names>DG</given-names>
</name>
(
<year>2009</year>
)
<chapter-title>Anatomy and physiology for speech, language, and hearing</chapter-title>
<edition>4th ed.</edition>
<publisher-name>Delmar Learning</publisher-name>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref016">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kuehn</surname>
<given-names>DP</given-names>
</name>
,
<name>
<surname>Folkins</surname>
<given-names>JW</given-names>
</name>
,
<name>
<surname>Linville</surname>
<given-names>RN</given-names>
</name>
(
<year>1988</year>
)
<article-title>An electromyographic study of the musculus uvulae</article-title>
.
<source>Cleft Palate J</source>
<volume>25</volume>
:
<fpage>348</fpage>
<lpage>355</lpage>
.
<pub-id pub-id-type="pmid">3203466</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bell-Berti</surname>
<given-names>F</given-names>
</name>
(
<year>1976</year>
)
<article-title>An electromyographic study of velopharyngeal function in speech. J Speech</article-title>
,
<source>Lang Hear Res</source>
<volume>19</volume>
:
<fpage>225</fpage>
<lpage>240</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lubker</surname>
<given-names>JF</given-names>
</name>
(
<year>1968</year>
)
<article-title>An electromyographic-cinefluorographic investigation of velar function during normal speech production</article-title>
.
<source>Cleft Palate J</source>
<volume>5</volume>
:
<fpage>1</fpage>
<lpage>18</lpage>
.
<pub-id pub-id-type="pmid">5235694</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref019">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kuehn</surname>
<given-names>DP</given-names>
</name>
,
<name>
<surname>Folkins</surname>
<given-names>JW</given-names>
</name>
,
<name>
<surname>Cutting</surname>
<given-names>CB</given-names>
</name>
(
<year>1982</year>
)
<article-title>Relationships between muscle activity and velar position</article-title>
.
<source>Cleft Palate J</source>
<volume>19</volume>
:
<fpage>25</fpage>
<lpage>35</lpage>
.
<pub-id pub-id-type="pmid">6948629</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref020">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>McGill</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Juker</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Kropf</surname>
<given-names>P</given-names>
</name>
(
<year>1996</year>
)
<article-title>Appropriately placed surface EMG electrodes reflect deep muscle activity (psoas, quadratus lumborum, abdominal wall) in the lumbar spine</article-title>
.
<source>J Biomech</source>
<volume>29</volume>
:
<fpage>1503</fpage>
<lpage>1507</lpage>
.
<pub-id pub-id-type="pmid">8894932</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref021">
<label>21</label>
<mixed-citation publication-type="other">Teixeira A, Martins P, Oliveira C, Ferreira C, Silva A, Shosted R (2012) Real-time MRI for Portuguese: Database, methods and applications. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7243 LNAI. pp. 306–317.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-642-28885-2_35">10.1007/978-3-642-28885-2_35</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref022">
<label>22</label>
<mixed-citation publication-type="other">Silva S, Martins P, Oliveira C, Silva A, Teixeira A (2012) Segmentation and Analysis of the Oral and Nasal Cavities from MR Time Sequences, Image Analysis and Recognition. Proc. ICIAR, LNCS. Springer.</mixed-citation>
</ref>
<ref id="pone.0127040.ref023">
<label>23</label>
<mixed-citation publication-type="other">Plux Wireless Biosignals (n.d.). Available: http//
<ext-link ext-link-type="uri" xlink:href="http://www.plux.info/">www.plux.info/</ext-link>
. Accessed: 2014 Oct 30.</mixed-citation>
</ref>
<ref id="pone.0127040.ref024">
<label>24</label>
<mixed-citation publication-type="other">Ellis D (2003) Dynamic time warp (DTW) in Matlab.</mixed-citation>
</ref>
<ref id="pone.0127040.ref025">
<label>25</label>
<mixed-citation publication-type="other">Tiede MK, Masaki S, Vatikiotis-Bateson E (2000) Contrasts in speech articulation observed in sitting and supine conditions. Proceedings of the 5th Seminar on Speech Production, Kloster Seeon, Bavaria. pp. 25–28.</mixed-citation>
</ref>
<ref id="pone.0127040.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kitamura</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Takemoto</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Honda</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Shimada</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Fujimoto</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Syakudo</surname>
<given-names>Y</given-names>
</name>
,
<etal>et al</etal>
(
<year>2005</year>
)
<article-title>Difference in vocal tract shape between upright and supine postures: Observations by an open-type MRI scanner</article-title>
.
<source>Acoust Sci Technol</source>
<volume>26</volume>
:
<fpage>465</fpage>
<lpage>468</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Stone</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Stock</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Bunin</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Kumar</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Epstein</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Kambhamettu</surname>
<given-names>C</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Comparison of speech production in upright and supine position</article-title>
.
<source>J Acoust Soc Am</source>
<volume>122</volume>
:
<fpage>532</fpage>
<lpage>541</lpage>
.
<pub-id pub-id-type="pmid">17614510</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref028">
<label>28</label>
<mixed-citation publication-type="other">Wrench AA, Cleland J, Scobbie JM (2011) An ultrasound protocol for comparing tongue contours: upright vs. supine. Proceedings of 17th ICPhS, Hong Kong. pp. 2161–2164.</mixed-citation>
</ref>
<ref id="pone.0127040.ref029">
<label>29</label>
<mixed-citation publication-type="book">
<name>
<surname>Engwall</surname>
<given-names>O</given-names>
</name>
(
<year>2006</year>
)
<chapter-title>Assessing MRI measurements: Effects of sustenation, gravitation and coarticulation</chapter-title>
In:
<name>
<surname>Harrington</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Tabain</surname>
<given-names>M</given-names>
</name>
, editors.
<source>Speech production: Models, Phonetic Processes and Techniques</source>
.
<publisher-loc>New York</publisher-loc>
:
<publisher-name>Psychology Press</publisher-name>
pp.
<fpage>301</fpage>
<lpage>314</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Perry</surname>
<given-names>JL</given-names>
</name>
(
<year>2011</year>
)
<article-title>Variations in velopharyngeal structures between upright and supine positions using upright magnetic resonance imaging</article-title>
.
<source>Cleft Palate-Craniofacial J</source>
<volume>48</volume>
:
<fpage>123</fpage>
<lpage>133</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1597/09-256">10.1597/09-256</ext-link>
</comment>
<pub-id pub-id-type="pmid">20500077</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Moon</surname>
<given-names>JB</given-names>
</name>
,
<name>
<surname>Canady</surname>
<given-names>JW</given-names>
</name>
(
<year>1995</year>
)
<article-title>Effects of gravity on velopharyngeal muscle activity during speech</article-title>
.
<source>Cleft palate-craniofacial J</source>
<volume>32</volume>
:
<fpage>371</fpage>
<lpage>375</lpage>
.
<pub-id pub-id-type="pmid">7578200</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref032">
<label>32</label>
<mixed-citation publication-type="other">Cover TM, Thomas JA (2005) Elements of Information Theory. 1–748 p.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1002/047174882X">10.1002/047174882X</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pereda</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Quiroga</surname>
<given-names>RQ</given-names>
</name>
,
<name>
<surname>Bhattacharya</surname>
<given-names>J</given-names>
</name>
(
<year>2005</year>
)
<article-title>Nonlinear multivariate analysis of neurophysiological signals</article-title>
.
<source>Prog Neurobiol</source>
<volume>77</volume>
:
<fpage>1</fpage>
<lpage>37</lpage>
.
<pub-id pub-id-type="pmid">16289760</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref034">
<label>34</label>
<mixed-citation publication-type="book">
<name>
<surname>Yao</surname>
<given-names>YY</given-names>
</name>
(
<year>2003</year>
)
<chapter-title>Information-theoretic measures for knowledge discovery and data mining</chapter-title>
<source>Entropy Measures, Maximum Entropy Principle and Emerging Applications</source>
.
<publisher-name>Springer</publisher-name>
pp.
<fpage>115</fpage>
<lpage>136</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref035">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hudgins</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Parker</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Scott</surname>
<given-names>RN</given-names>
</name>
(
<year>1993</year>
)
<article-title>A new strategy for multifunction myoelectric control</article-title>
.
<source>IEEE Trans Biomed Eng</source>
<volume>40</volume>
:
<fpage>82</fpage>
<lpage>94</lpage>
.
<pub-id pub-id-type="pmid">8468080</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref036">
<label>36</label>
<mixed-citation publication-type="other">Everitt BS, Hothorn T (2009) A Handbook of Statistical Analyses Using R. 360 p. Available:
<ext-link ext-link-type="uri" xlink:href="http://books.google.com/books?hl=en&lr=&id=5-XI_e-9LvYC&pgis=1">http://books.google.com/books?hl=en&lr=&id=5-XI_e-9LvYC&pgis=1</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref037">
<label>37</label>
<mixed-citation publication-type="other">R Development Core Team R (2011) R: A Language and Environment for Statistical Computing. R Found Stat Comput 1: 409. Available:
<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org">http://www.r-project.org</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Burges</surname>
<given-names>CJC</given-names>
</name>
(
<year>1998</year>
)
<article-title>A tutorial on support vector machines for pattern recognition</article-title>
.
<source>Data Min Knowl Discov</source>
<volume>2</volume>
:
<fpage>121</fpage>
<lpage>167</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0127040.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Winter</surname>
<given-names>DA</given-names>
</name>
,
<name>
<surname>Yack</surname>
<given-names>HJ</given-names>
</name>
(
<year>1987</year>
)
<article-title>EMG profiles during normal human walking: stride-to-stride and inter-subject variability</article-title>
.
<source>Electroencephalogr Clin Neurophysiol</source>
<volume>67</volume>
:
<fpage>402</fpage>
<lpage>411</lpage>
. Available:
<ext-link ext-link-type="uri" xlink:href="http://www.sciencedirect.com/science/article/pii/0013469487900034">http://www.sciencedirect.com/science/article/pii/0013469487900034</ext-link>
. Accessed: 2014 Dec 26.
<pub-id pub-id-type="pmid">2444408</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0127040.ref040">
<label>40</label>
<mixed-citation publication-type="other">Maier-Hein L, Metze F, Schultz T, Waibel A (2005) Session independent non-audible speech recognition using surface electromyography. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2005). pp. 331–336. Available: ://000235936000064.</mixed-citation>
</ref>
<ref id="pone.0127040.ref041">
<label>41</label>
<mixed-citation publication-type="book">
<name>
<surname>Freitas</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Teixeira</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Vaz</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Dias</surname>
<given-names>MS</given-names>
</name>
(
<year>2012</year>
)
<chapter-title>Automatic Speech Recognition Based on Ultrasonic Doppler Sensing for European Portuguese</chapter-title>
<source>Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science</source>
.
<publisher-name>Springer</publisher-name>
<publisher-loc>Berlin Heidelberg</publisher-loc>
, Vol.
<volume>328</volume>
pp.
<fpage>227</fpage>
<lpage>236</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-642-35292-8_24">10.1007/978-3-642-35292-8_24</ext-link>
</comment>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000032 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000032 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4466523
   |texte=   Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26069968" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024