Exploration server on music in Saarland (Sarre)


Intuitive visualizations of pitch and loudness in speech

Internal identifier: 000050 (Pmc/Corpus); previous: 000049; next: 000051

Authors: Rebecca S. Schaefer; Lilian J. Beijer; Wiel Seuskens; Toni C. M. Rietveld; Makiko Sadakata

Source:

RBID : PMC:4828474

Abstract

Visualizing acoustic features of speech has proven helpful in speech therapy; however, it is as yet unclear how to create intuitive and fitting visualizations. To better understand the mappings from speech sound aspects to visual space, a large web-based experiment (n = 249) was performed to evaluate spatial parameters that may optimally represent pitch and loudness of speech. To this end, five novel animated visualizations were developed and presented in pairwise comparisons, together with a static visualization. Pitch and loudness of speech were each mapped onto either the vertical (y-axis) or the size (z-axis) dimension, or combined (with size indicating loudness and vertical position indicating pitch height) and visualized as an animation along the horizontal dimension (x-axis) over time. The results indicated that firstly, there is a general preference towards the use of the y-axis for both pitch and loudness, with pitch ranking higher than loudness in terms of fit. Secondly, the data suggest that representing both pitch and loudness combined in a single visualization is preferred over visualization in only one dimension. Finally, the z-axis, although not preferred, was evaluated as corresponding better to loudness than to pitch. This relation between sound and visual space has not been reported previously for speech sounds, and elaborates earlier findings on musical material. In addition to elucidating more general mappings between auditory and visual modalities, the findings provide us with a method of visualizing speech that may be helpful in clinical applications such as computerized speech therapy, or other feedback-based learning paradigms.


Url:
DOI: 10.3758/s13423-015-0934-0
PubMed: 26370217
PubMed Central: 4828474

Links to Exploration step

PMC:4828474

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Intuitive visualizations of pitch and loudness in speech</title>
<author>
<name sortKey="Schaefer, Rebecca S" sort="Schaefer, Rebecca S" uniqKey="Schaefer R" first="Rebecca S." last="Schaefer">Rebecca S. Schaefer</name>
<affiliation>
<nlm:aff id="Aff1">Health, Medical and Neuropsychology Unit, Institute of Psychology, Leiden University, Leiden, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">Leiden Institute for Brain and Cognition (LIBC), Leiden University, Leiden, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Beijer, Lilian J" sort="Beijer, Lilian J" uniqKey="Beijer L" first="Lilian J." last="Beijer">Lilian J. Beijer</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seuskens, Wiel" sort="Seuskens, Wiel" uniqKey="Seuskens W" first="Wiel" last="Seuskens">Wiel Seuskens</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Donders Institute for Brain, Cognition and Behaviour, Centre for Cognition, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rietveld, Toni C M" sort="Rietveld, Toni C M" uniqKey="Rietveld T" first="Toni C. M." last="Rietveld">Toni C. M. Rietveld</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Department of Linguistics, Radboud University Nijmegen, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sadakata, Makiko" sort="Sadakata, Makiko" uniqKey="Sadakata M" first="Makiko" last="Sadakata">Makiko Sadakata</name>
<affiliation>
<nlm:aff id="Aff4">Donders Institute for Brain, Cognition and Behaviour, Centre for Cognition, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff6">The Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26370217</idno>
<idno type="pmc">4828474</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4828474</idno>
<idno type="RBID">PMC:4828474</idno>
<idno type="doi">10.3758/s13423-015-0934-0</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000050</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000050</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Intuitive visualizations of pitch and loudness in speech</title>
<author>
<name sortKey="Schaefer, Rebecca S" sort="Schaefer, Rebecca S" uniqKey="Schaefer R" first="Rebecca S." last="Schaefer">Rebecca S. Schaefer</name>
<affiliation>
<nlm:aff id="Aff1">Health, Medical and Neuropsychology Unit, Institute of Psychology, Leiden University, Leiden, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">Leiden Institute for Brain and Cognition (LIBC), Leiden University, Leiden, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Beijer, Lilian J" sort="Beijer, Lilian J" uniqKey="Beijer L" first="Lilian J." last="Beijer">Lilian J. Beijer</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seuskens, Wiel" sort="Seuskens, Wiel" uniqKey="Seuskens W" first="Wiel" last="Seuskens">Wiel Seuskens</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">Donders Institute for Brain, Cognition and Behaviour, Centre for Cognition, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rietveld, Toni C M" sort="Rietveld, Toni C M" uniqKey="Rietveld T" first="Toni C. M." last="Rietveld">Toni C. M. Rietveld</name>
<affiliation>
<nlm:aff id="Aff3">Sint Maartenskliniek Research, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Department of Linguistics, Radboud University Nijmegen, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sadakata, Makiko" sort="Sadakata, Makiko" uniqKey="Sadakata M" first="Makiko" last="Sadakata">Makiko Sadakata</name>
<affiliation>
<nlm:aff id="Aff4">Donders Institute for Brain, Cognition and Behaviour, Centre for Cognition, Nijmegen, The Netherlands</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff6">The Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Psychonomic Bulletin & Review</title>
<idno type="ISSN">1069-9384</idno>
<idno type="eISSN">1531-5320</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Visualizing acoustic features of speech has proven helpful in speech therapy; however, it is as yet unclear how to create intuitive and fitting visualizations. To better understand the mappings from speech sound aspects to visual space, a large web-based experiment (
<italic>n</italic>
 = 249) was performed to evaluate spatial parameters that may optimally represent pitch and loudness of speech. To this end, five novel animated visualizations were developed and presented in pairwise comparisons, together with a static visualization. Pitch and loudness of speech were each mapped onto either the vertical (
<italic>y</italic>
-axis) or the size (
<italic>z</italic>
-axis) dimension, or combined (with size indicating loudness and vertical position indicating pitch height) and visualized as an animation along the horizontal dimension (
<italic>x</italic>
-axis) over time. The results indicated that firstly, there is a general preference towards the use of the
<italic>y</italic>
-axis for both pitch and loudness, with pitch ranking higher than loudness in terms of fit. Secondly, the data suggest that representing both pitch and loudness combined in a single visualization is preferred over visualization in only one dimension. Finally, the
<italic>z</italic>
-axis, although not preferred, was evaluated as corresponding better to loudness than to pitch. This relation between sound and visual space has not been reported previously for speech sounds, and elaborates earlier findings on musical material. In addition to elucidating more general mappings between auditory and visual modalities, the findings provide us with a method of visualizing speech that may be helpful in clinical applications such as computerized speech therapy, or other feedback-based learning paradigms.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Athanasopoulos, G" uniqKey="Athanasopoulos G">G Athanasopoulos</name>
</author>
<author>
<name sortKey="Moran, N" uniqKey="Moran N">N Moran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beijer, Lj" uniqKey="Beijer L">LJ Beijer</name>
</author>
<author>
<name sortKey="Rietveld, Acm" uniqKey="Rietveld A">ACM Rietveld</name>
</author>
<author>
<name sortKey="Hoskam, V" uniqKey="Hoskam V">V Hoskam</name>
</author>
<author>
<name sortKey="Geurts, Ach" uniqKey="Geurts A">ACH Geurts</name>
</author>
<author>
<name sortKey="De Swart, Bjm" uniqKey="De Swart B">BJM de Swart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beijer, Lj" uniqKey="Beijer L">LJ Beijer</name>
</author>
<author>
<name sortKey="Rietveld, Acm" uniqKey="Rietveld A">ACM Rietveld</name>
</author>
<author>
<name sortKey="Van Beers, Mma" uniqKey="Van Beers M">MMA van Beers</name>
</author>
<author>
<name sortKey="Slangen, Rml" uniqKey="Slangen R">RML Slangen</name>
</author>
<author>
<name sortKey="Van Den Heuvel, H" uniqKey="Van Den Heuvel H">H van den Heuvel</name>
</author>
<author>
<name sortKey="De Swart, Bjm" uniqKey="De Swart B">BJM de Swart</name>
</author>
<author>
<name sortKey="Geurts, Ach" uniqKey="Geurts A">ACH Geurts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beijer, Lj" uniqKey="Beijer L">LJ Beijer</name>
</author>
<author>
<name sortKey="Rietveld, Acm" uniqKey="Rietveld A">ACM Rietveld</name>
</author>
<author>
<name sortKey="Ruiter, Mb" uniqKey="Ruiter M">MB Ruiter</name>
</author>
<author>
<name sortKey="Geurts, Ach" uniqKey="Geurts A">ACH Geurts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beijer, Lj" uniqKey="Beijer L">LJ Beijer</name>
</author>
<author>
<name sortKey="Rietveld, Acm" uniqKey="Rietveld A">ACM Rietveld</name>
</author>
<author>
<name sortKey="Van Stiphout, Ajl" uniqKey="Van Stiphout A">AJL van Stiphout</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brandmeyer, A" uniqKey="Brandmeyer A">A Brandmeyer</name>
</author>
<author>
<name sortKey="Timmers, R" uniqKey="Timmers R">R Timmers</name>
</author>
<author>
<name sortKey="Sadakata, M" uniqKey="Sadakata M">M Sadakata</name>
</author>
<author>
<name sortKey="Desain, P" uniqKey="Desain P">P Desain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burger, B" uniqKey="Burger B">B Burger</name>
</author>
<author>
<name sortKey="Thompson, Mr" uniqKey="Thompson M">MR Thompson</name>
</author>
<author>
<name sortKey="Luck, G" uniqKey="Luck G">G Luck</name>
</author>
<author>
<name sortKey="Saarikallio, S" uniqKey="Saarikallio S">S Saarikallio</name>
</author>
<author>
<name sortKey="Toiviainen, P" uniqKey="Toiviainen P">P Toiviainen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Demenko, G" uniqKey="Demenko G">G Demenko</name>
</author>
<author>
<name sortKey="Wagner, A" uniqKey="Wagner A">A Wagner</name>
</author>
<author>
<name sortKey="Cylwik, N" uniqKey="Cylwik N">N Cylwik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dixon, S" uniqKey="Dixon S">S Dixon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dolscheid, S" uniqKey="Dolscheid S">S Dolscheid</name>
</author>
<author>
<name sortKey="Shayan, S" uniqKey="Shayan S">S Shayan</name>
</author>
<author>
<name sortKey="Majid, A" uniqKey="Majid A">A Majid</name>
</author>
<author>
<name sortKey="Casasanto, D" uniqKey="Casasanto D">D Casasanto</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eitan, Z" uniqKey="Eitan Z">Z Eitan</name>
</author>
<author>
<name sortKey="Granot, Ry" uniqKey="Granot R">RY Granot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eitan, Z" uniqKey="Eitan Z">Z Eitan</name>
</author>
<author>
<name sortKey="Timmers, R" uniqKey="Timmers R">R Timmers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gallace, A" uniqKey="Gallace A">A Gallace</name>
</author>
<author>
<name sortKey="Spence, C" uniqKey="Spence C">C Spence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ho, Ak" uniqKey="Ho A">AK Ho</name>
</author>
<author>
<name sortKey="Bradshaw, Jl" uniqKey="Bradshaw J">JL Bradshaw</name>
</author>
<author>
<name sortKey="Iansek, R" uniqKey="Iansek R">R Iansek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoppe, D" uniqKey="Hoppe D">D Hoppe</name>
</author>
<author>
<name sortKey="Sadakata, M" uniqKey="Sadakata M">M Sadakata</name>
</author>
<author>
<name sortKey="Desain, P" uniqKey="Desain P">P Desain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ishihara, M" uniqKey="Ishihara M">M Ishihara</name>
</author>
<author>
<name sortKey="Keller, Pe" uniqKey="Keller P">PE Keller</name>
</author>
<author>
<name sortKey="Rossetti, Y" uniqKey="Rossetti Y">Y Rossetti</name>
</author>
<author>
<name sortKey="Prinz, W" uniqKey="Prinz W">W Prinz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y Kim</name>
</author>
<author>
<name sortKey="Kent, Rd" uniqKey="Kent R">RD Kent</name>
</author>
<author>
<name sortKey="Weismer, G" uniqKey="Weismer G">G Weismer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kussner, Mb" uniqKey="Kussner M">MB Küssner</name>
</author>
<author>
<name sortKey="Leech Wilkinson, D" uniqKey="Leech Wilkinson D">D Leech-Wilkinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nijs, L" uniqKey="Nijs L">L Nijs</name>
</author>
<author>
<name sortKey="Leman, M" uniqKey="Leman M">M Leman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plomp, R" uniqKey="Plomp R">R Plomp</name>
</author>
<author>
<name sortKey="Mimpen, Am" uniqKey="Mimpen A">AM Mimpen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rossiter, D" uniqKey="Rossiter D">D Rossiter</name>
</author>
<author>
<name sortKey="Howard, Dm" uniqKey="Howard D">DM Howard</name>
</author>
<author>
<name sortKey="Decosta, M" uniqKey="Decosta M">M DeCosta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rusconi, E" uniqKey="Rusconi E">E Rusconi</name>
</author>
<author>
<name sortKey="Kwan, B" uniqKey="Kwan B">B Kwan</name>
</author>
<author>
<name sortKey="Giordano, Bl" uniqKey="Giordano B">BL Giordano</name>
</author>
<author>
<name sortKey="Umilta, C" uniqKey="Umilta C">C Umiltá</name>
</author>
<author>
<name sortKey="Butterworth, B" uniqKey="Butterworth B">B Butterworth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sadakata, M" uniqKey="Sadakata M">M Sadakata</name>
</author>
<author>
<name sortKey="Hoppe, D" uniqKey="Hoppe D">D Hoppe</name>
</author>
<author>
<name sortKey="Brandmeyer, A" uniqKey="Brandmeyer A">A Brandmeyer</name>
</author>
<author>
<name sortKey="Timmers, R" uniqKey="Timmers R">R Timmers</name>
</author>
<author>
<name sortKey="Desain, P" uniqKey="Desain P">P Desain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scheffe, H" uniqKey="Scheffe H">H Scheffé</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmidt, Ra" uniqKey="Schmidt R">RA Schmidt</name>
</author>
<author>
<name sortKey="Lee, Td" uniqKey="Lee T">TD Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spence, C" uniqKey="Spence C">C Spence</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Timmers, R" uniqKey="Timmers R">R Timmers</name>
</author>
<author>
<name sortKey="Sadakata, M" uniqKey="Sadakata M">M Sadakata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Troche, J" uniqKey="Troche J">J Troche</name>
</author>
<author>
<name sortKey="Troche, Ms" uniqKey="Troche M">MS Troche</name>
</author>
<author>
<name sortKey="Berkowitz, R" uniqKey="Berkowitz R">R Berkowitz</name>
</author>
<author>
<name sortKey="Grossman, M" uniqKey="Grossman M">M Grossman</name>
</author>
<author>
<name sortKey="Reilly, J" uniqKey="Reilly J">J Reilly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walker, P" uniqKey="Walker P">P Walker</name>
</author>
<author>
<name sortKey="Bremner, Jg" uniqKey="Bremner J">JG Bremner</name>
</author>
<author>
<name sortKey="Mason, U" uniqKey="Mason U">U Mason</name>
</author>
<author>
<name sortKey="Spring, J" uniqKey="Spring J">J Spring</name>
</author>
<author>
<name sortKey="Mattock, K" uniqKey="Mattock K">K Mattock</name>
</author>
<author>
<name sortKey="Slater, A" uniqKey="Slater A">A Slater</name>
</author>
<author>
<name sortKey="Johnson, Sp" uniqKey="Johnson S">SP Johnson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walker, R" uniqKey="Walker R">R Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Watanabe, A" uniqKey="Watanabe A">A Watanabe</name>
</author>
<author>
<name sortKey="Tomishige, S" uniqKey="Tomishige S">S Tomishige</name>
</author>
<author>
<name sortKey="Nakatake, M" uniqKey="Nakatake M">M Nakatake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, Ph" uniqKey="Wilson P">PH Wilson</name>
</author>
<author>
<name sortKey="Lee, K" uniqKey="Lee K">K Lee</name>
</author>
<author>
<name sortKey="Callaghan, J" uniqKey="Callaghan J">J Callaghan</name>
</author>
<author>
<name sortKey="Thorpe, Cw" uniqKey="Thorpe C">CW Thorpe</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Psychon Bull Rev</journal-id>
<journal-id journal-id-type="iso-abbrev">Psychon Bull Rev</journal-id>
<journal-title-group>
<journal-title>Psychonomic Bulletin & Review</journal-title>
</journal-title-group>
<issn pub-type="ppub">1069-9384</issn>
<issn pub-type="epub">1531-5320</issn>
<publisher>
<publisher-name>Springer US</publisher-name>
<publisher-loc>New York</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26370217</article-id>
<article-id pub-id-type="pmc">4828474</article-id>
<article-id pub-id-type="publisher-id">934</article-id>
<article-id pub-id-type="doi">10.3758/s13423-015-0934-0</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Brief Report</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Intuitive visualizations of pitch and loudness in speech</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Schaefer</surname>
<given-names>Rebecca S.</given-names>
</name>
<address>
<email>r.s.schaefer@fsw.leidenuniv.nl</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Beijer</surname>
<given-names>Lilian J.</given-names>
</name>
<xref ref-type="aff" rid="Aff3"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Seuskens</surname>
<given-names>Wiel</given-names>
</name>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rietveld</surname>
<given-names>Toni C. M.</given-names>
</name>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sadakata</surname>
<given-names>Makiko</given-names>
</name>
<xref ref-type="aff" rid="Aff4"></xref>
<xref ref-type="aff" rid="Aff6"></xref>
</contrib>
<aff id="Aff1">
<label></label>
Health, Medical and Neuropsychology Unit, Institute of Psychology, Leiden University, Leiden, The Netherlands</aff>
<aff id="Aff2">
<label></label>
Leiden Institute for Brain and Cognition (LIBC), Leiden University, Leiden, The Netherlands</aff>
<aff id="Aff3">
<label></label>
Sint Maartenskliniek Research, Nijmegen, The Netherlands</aff>
<aff id="Aff4">
<label></label>
Donders Institute for Brain, Cognition and Behaviour, Centre for Cognition, Nijmegen, The Netherlands</aff>
<aff id="Aff5">
<label></label>
Department of Linguistics, Radboud University Nijmegen, Nijmegen, The Netherlands</aff>
<aff id="Aff6">
<label></label>
The Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>14</day>
<month>9</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>14</day>
<month>9</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="ppub">
<year>2016</year>
</pub-date>
<volume>23</volume>
<fpage>548</fpage>
<lpage>555</lpage>
<permissions>
<copyright-statement>© The Author(s) 2015</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<p>Visualizing acoustic features of speech has proven helpful in speech therapy; however, it is as yet unclear how to create intuitive and fitting visualizations. To better understand the mappings from speech sound aspects to visual space, a large web-based experiment (
<italic>n</italic>
 = 249) was performed to evaluate spatial parameters that may optimally represent pitch and loudness of speech. To this end, five novel animated visualizations were developed and presented in pairwise comparisons, together with a static visualization. Pitch and loudness of speech were each mapped onto either the vertical (
<italic>y</italic>
-axis) or the size (
<italic>z</italic>
-axis) dimension, or combined (with size indicating loudness and vertical position indicating pitch height) and visualized as an animation along the horizontal dimension (
<italic>x</italic>
-axis) over time. The results indicated that firstly, there is a general preference towards the use of the
<italic>y</italic>
-axis for both pitch and loudness, with pitch ranking higher than loudness in terms of fit. Secondly, the data suggest that representing both pitch and loudness combined in a single visualization is preferred over visualization in only one dimension. Finally, the
<italic>z</italic>
-axis, although not preferred, was evaluated as corresponding better to loudness than to pitch. This relation between sound and visual space has not been reported previously for speech sounds, and elaborates earlier findings on musical material. In addition to elucidating more general mappings between auditory and visual modalities, the findings provide us with a method of visualizing speech that may be helpful in clinical applications such as computerized speech therapy, or other feedback-based learning paradigms.</p>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Audio-visual processing</kwd>
<kwd>Visualizing sound</kwd>
<kwd>Speech therapy</kwd>
<kwd>Feedback learning</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© Psychonomic Society, Inc. 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1" sec-type="introduction">
<title>Introduction</title>
<p>Visualizing sounds is useful in a range of learning situations in which a visual representation can give information about the sound produced, e.g., speech (Demenko, Wagner & Cylwik,
<xref ref-type="bibr" rid="CR10">2010</xref>
; Watanabe, Tomishige & Nakatake,
<xref ref-type="bibr" rid="CR38">2000</xref>
) or music (Dixon,
<xref ref-type="bibr" rid="CR11">2007</xref>
; Hoppe, Sadakata, & Desain,
<xref ref-type="bibr" rid="CR17">2006</xref>
; McLeod,
<xref ref-type="bibr" rid="CR23">2008</xref>
; Sadakata, Hoppe, Brandmeyer, Timmers, & Desain,
<xref ref-type="bibr" rid="CR28">2008</xref>
; Stowell & Plumbley,
<xref ref-type="bibr" rid="CR32">2007</xref>
). It is also increasingly common for software dealing with audio signals to support acoustic analyses by providing visualization of various parameters, such as pitch, loudness and formants (Timmers & Sadakata,
<xref ref-type="bibr" rid="CR34">2014</xref>
). In this way, additional information is included in the feedback learning process through visual presentation. The specific visualization can focus the user’s attention towards a specific aspect of the sound, informing the user of specific aspects of performance that could be improved and potentially offering a method to increase perceptual sensitivity. Learning efficacy depends crucially on the type of feedback, the complexity of the task, and the individuals’ skill level (Wilson, Lee, Callaghan & Thorpe,
<xref ref-type="bibr" rid="CR39">2008</xref>
; Rossiter & Howard,
<xref ref-type="bibr" rid="CR26">1996</xref>
; Brandmeyer, Timmers, Sadakata, & Desain,
<xref ref-type="bibr" rid="CR8">2011</xref>
). This interaction seems to hold more generally for motor skill acquisition (Schmidt & Lee,
<xref ref-type="bibr" rid="CR30">2010</xref>
) and this makes it challenging to define a general rule to optimize feedback features. Therefore, fine-tuning of feedback features for the target task and population is necessary to maximize learning.</p>
<p>Investigations into e-learning based speech therapy (EST, Beijer et al.,
<xref ref-type="bibr" rid="CR3">2010b</xref>
; Beijer, Rietveld, Hoskam, Geurts, & de Swart,
<xref ref-type="bibr" rid="CR2">2010a</xref>
) have shown that visualizing pitch height and loudness of speech helps patients in receiving meaningful computerized feedback based on personalized training goals, thus allowing efficient home-based practice. The visual feedback for this method was initially created as two separate graphs of pitch and loudness, plotted on the
<italic>y</italic>
-axis with time on the
<italic>x</italic>
-axis. Pitch and loudness are the main aspects of speech that need to be practiced in patients with Parkinson’s disease (PD), and the technique of pitch limiting voice treatment (PLVT), which aims to increase loudness while at the same time limiting an increase in vocal pitch to prevent a strained or pressed voice (de Swart, Willemse, Maassen & Horstink,
<xref ref-type="bibr" rid="CR33">2003</xref>
), is often employed. Findings from a case study (Beijer et al.,
<xref ref-type="bibr" rid="CR2">2010a</xref>
) with a patient with PD indicated that visualization needed to be improved in terms of its interpretability (see Fig. 
<xref rid="Fig1" ref-type="fig">1f</xref>
). The use of intuitive, integrated and informative visualization of speech is highly relevant to the effectiveness of the intervention; to assist patients in an independent web-based speech training process, the visualizations should be easy to understand, and make apparent sense for a particular sound. To this end, optimally intuitive mappings between visual and sound dimensions need to be established.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Stimulus visualizations. Time is always represented on the
<italic>x</italic>
-axis, and pitch and loudness are represented either on the
<italic>y</italic>
- or
<italic>z</italic>
-axes (
<bold>a</bold>
<bold>d</bold>
) or both (
<bold>e</bold>
). The original static feedback used in the e-learning based speech therapy (EST) system is also shown (
<bold>f</bold>
)</p>
</caption>
<graphic xlink:href="13423_2015_934_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p>In the current work, several visualizations of speech were evaluated in terms of fit between auditory and visual dimensions. In the literature on crossmodal correspondence, a number of consistent mappings have been reported. The term crossmodal correspondence refers to congruency between dimensions in different sensory modalities, ranging from low-level amodal properties to more abstract, high-level cognitive correspondences based on stimulus meaning (Spence,
<xref ref-type="bibr" rid="CR31">2011</xref>
). With the majority of research focusing on the mappings between visual and auditory aspects, the use of static figures and relatively short sounds is most common, and has revealed associations of pitch height with vertical position, brightness or lightness of the stimulus and more angular, or smaller shapes, and of loudness with brightness (for more detail, see review by Spence,
<xref ref-type="bibr" rid="CR31">2011</xref>
). However, these studies have generally used categorical instances of visual or auditory dimensions (i.e., high and low pitches matched with small or large shapes) rather than continuous measures and sound sequences. This is not the case for studies looking at correspondences between visual features and musical fragments, which are more like speech in the sense that a longer, continuous sound is represented visually. Previous literature on visualizing musical aspects has also shown interesting cross-modal associations. For example, the so-called spatial-musical association of response codes (SMARC) effect describes the tendency of (musically trained) listeners to associate high-pitched tones with the upper-right corner and low-pitched tones with the lower-left corner of a two-dimensional (2D) plane (Rusconi, Kwan, Giordano, Umiltá, & Butterworth,
<xref ref-type="bibr" rid="CR27">2006</xref>
). In line with this finding, among various visual features height is one of the most prominent dimensions to be associated with musical pitch (Eitan & Timmers,
<xref ref-type="bibr" rid="CR14">2010</xref>
; Küssner & Leech-Wilkinson,
<xref ref-type="bibr" rid="CR20">2014</xref>
; Lipscomb & Kim,
<xref ref-type="bibr" rid="CR22">2004</xref>
; Walker, Bremner, Mason, Spring, Mattock, Slater, & Johnson,
<xref ref-type="bibr" rid="CR36">2009</xref>
). Similarly, loudness of musical sound has been associated with the size of visual objects (Küssner & Leech-Wilkinson,
<xref ref-type="bibr" rid="CR20">2014</xref>
; Lipscomb & Kim,
<xref ref-type="bibr" rid="CR22">2004</xref>
; Nijs & Leman,
<xref ref-type="bibr" rid="CR24">2014</xref>
). These associations make intuitive sense, as we often refer to pitch as being high or low, and loudness and size could both indicate distance from an auditory object; however, to our knowledge they have not been evaluated for speech sounds.</p>
<p>In order to evaluate these findings for speech material, we devised multiple visualizations for spoken sentences, and asked participants to rate how well they thought the visualization fit with the sound (or if they preferred that visualization as the better match). Pitch was visualized either as the positioning of circles in the vertical dimension (
<italic>y</italic>
-axis), or as the size of a circle (or
<italic>z</italic>
-axis, taking size as the third dimension); the same was done for loudness. Time was always represented on the
<italic>x</italic>
-axis (left to right), thus creating an animation as the circles appear with the speech sounds. A stable mapping for time as left to right has been established for duration (Walker,
<xref ref-type="bibr" rid="CR37">1987</xref>
), early or late clicks [spatial temporal association of response code (STEARC) effect; Ishihara, Keller, Rossetti & Prinz,
<xref ref-type="bibr" rid="CR18">2008</xref>
], as well as for musical material (Athanasopoulos & Moran,
<xref ref-type="bibr" rid="CR1">2013</xref>
). In the interest of experiment time, the only combined visualization that was used represented pitch on the
<italic>y</italic>
-axis (vertical space) and loudness in circle size. Finally, the original visualization used in EST was also included, to provide a comparison with the current system (all visualizations are shown in Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
).</p>
<p>The outcome measure, namely the perceived goodness of fit of the visualization with the sound, can be used to address a number of questions. Of these, the most relevant to speech therapy applications is which of the two important acoustic features for the PLVT method, namely pitch and loudness, fits better with the
<italic>y</italic>
- and
<italic>z</italic>
-axes (vertical position and size of visual objects). We hypothesized that the previously reported findings for music would be replicated for speech material, leading to pitch being associated most with the
<italic>y</italic>
-axis and loudness with the
<italic>z</italic>
-axis (cf. Eitan & Timmers,
<xref ref-type="bibr" rid="CR14">2010</xref>
; Küssner & Leech-Wilkinson,
<xref ref-type="bibr" rid="CR20">2014</xref>
; Lipscomb & Kim,
<xref ref-type="bibr" rid="CR22">2004</xref>
; Walker et al.
<xref ref-type="bibr" rid="CR36">2009</xref>
). A second question of interest is whether visualizing only one of the two features is enough to create a fitting impression, or if a combined visualization of the two would be evaluated as better. We predicted that visualizing pitch and loudness in combination would be the most fitting, as this is the most complete representation of the sound. Finally, by comparing the newly developed visualizations to the original EST visualization, we can assess the effect of presenting an animated visualization instead of a static image, as well as that of a single representation rather than two graphs.</p>
</sec>
<sec id="Sec2" sec-type="materials|methods">
<title>Method</title>
<sec id="Sec3">
<title>Participants</title>
<p>Participants were recruited online through global social and professional networks, mailing lists and web portals, and were given the option to select either English or Dutch as the instruction language. A total of 271 participants completed the web experiment on their own personal computer. Participants who reported neurological disorders or any uncorrected impairments in hearing or vision were excluded, leaving 249 participants (77 male and 172 female). Their age groups and choice of experiment instruction language are shown in Table
<xref rid="Tab1" ref-type="table">1</xref>
. Almost one-half (41.8 %) of the participants who opted for English instructions indicated that English was not their mother tongue.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Participants: age distribution and experiment language choice</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td>Age group (years)</td>
<td>0-20</td>
<td>21-40</td>
<td>41-60</td>
<td>61-80</td>
</tr>
<tr>
<td>Dutch version</td>
<td>1</td>
<td>67</td>
<td>54</td>
<td>25</td>
</tr>
<tr>
<td>English version</td>
<td>1</td>
<td>70</td>
<td>23</td>
<td>8</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec id="Sec4">
<title>Stimuli</title>
<p>Six sentences were selected from a standard set of short Dutch sentences consisting of a noun phrase + verb phrase used in speech audiometry (Plomp & Mimpen,
<xref ref-type="bibr" rid="CR25">1979</xref>
), for example ‘In de
<italic>len</italic>
te lopen de
<italic>paar</italic>
den
<italic>heer</italic>
lijk in de wei’ (‘In spring, the horses run freely in the field’), with syllables in italics indicating pitch accents (associated with specific pitch movements) and potentially louder speech segments.</p>
<p>All stimulus sentences were spoken by a male speaker. The audio recordings had a sampling rate of 44.1 kHz and were high-pass filtered at 50 Hz. The visualizations were created by first estimating the pitch and loudness in the sound signal using the speech processing software ‘Praat’ (Boersma & Weenink,
<xref ref-type="bibr" rid="CR6">2005</xref>
), and removing all the zero values (which represent short silences). Then, outliers in pitch contours were removed using a 3rd-order median filter. The ranges of pitch and loudness were normalized to fit the range of the visualization space (300 x 550 pixels), and the starting point on the
<italic>y</italic>
-axis was plotted at 10 % of the vertical space. The
<italic>z</italic>
-axis (or size) increases were implemented linearly.</p>
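As an illustration of the processing steps just described, the following sketch (Python with NumPy/SciPy; not the authors' implementation) shows how pitch and loudness contours, once extracted with a tool such as Praat, could be cleaned and rescaled to the visualization space. The input array names, the canvas orientation (550 px wide by 300 px high), and the application of the same silence mask to both contours are assumptions made for illustration only.

```python
import numpy as np
from scipy.signal import medfilt

# Assumed canvas orientation: 550 px wide (time, x-axis) by 300 px high (y-axis).
CANVAS_W, CANVAS_H = 550, 300
Y_BASELINE = 0.10 * CANVAS_H          # starting point at 10 % of the vertical space


def to_display_coords(pitch_hz, loudness_db):
    """Clean and rescale pitch/loudness contours (e.g. estimated with Praat).

    Mirrors the description above: drop zero-valued frames (short silences),
    remove pitch outliers with a 3rd-order median filter, and normalize both
    contours to the visualization space.
    """
    voiced = pitch_hz > 0                       # zero values represent silences
    pitch = medfilt(pitch_hz[voiced], kernel_size=3)
    loudness = loudness_db[voiced]

    def rescale(contour):
        span = np.ptp(contour) or 1.0           # guard against a flat contour
        return Y_BASELINE + (contour - contour.min()) / span * (CANVAS_H - Y_BASELINE)

    x = np.linspace(0, CANVAS_W, pitch.size)    # time runs left to right
    # When mapped to size (z-axis), the same normalized values would scale
    # the circle diameter linearly.
    return x, rescale(pitch), rescale(loudness)
```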
<p>Three experiment versions were created, containing two of the six selected sentences each, thus creating three different stimulus sets. Six visualizations were created for every sentence, varying the mapping to the vertical direction (
<italic>y</italic>
-axis) and size (
<italic>z</italic>
-axis) of circles appearing with the sound, shown in Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
, with the horizontal direction fixed to represent time. Two visualizations used only the
<italic>y</italic>
-axis of the figure to visualize either pitch or loudness, referred to as PitchY and LoudnessY (Fig. 
<xref rid="Fig1" ref-type="fig">1a,b</xref>
). Figure
<xref rid="Fig1" ref-type="fig">1c and d</xref>
show the same principle applied to the
<italic>z</italic>
-axis, referred to as PitchZ and LoudnessZ. Figure
<xref rid="Fig1" ref-type="fig">1e</xref>
shows a combined visualization in which pitch represented the
<italic>y</italic>
-axis and loudness the z-axis, referred to as YZ-Combined, and Fig. 
<xref rid="Fig1" ref-type="fig">1f</xref>
shows the original static feedback used in the EST system. Example animations are available as supplementary material.</p>
<p>All six visualizations were compared in all possible pairs and orders, yielding 15 combinations per sentence. Visualizing two sentences resulted in 30 comparisons per experiment version, taking 10–15 min in total. Figure
<xref rid="Fig2" ref-type="fig">2</xref>
shows a screenshot of the web interface, where relative preference for the visualization displayed on either the left or right side of the screen could be indicated on a seven-point scale. The presentation of the comparisons was randomized. The order of presenting two visualizations (left/right) was also randomized, and the color of the shapes was varied randomly (but kept stable within comparisons).
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Example experiment screenshot showing LoudnessY and YZ-Combined visualizations, with clickable rating options below the two panels</p>
</caption>
<graphic xlink:href="13423_2015_934_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
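To make the trial construction concrete, here is a minimal sketch (Python, with hypothetical names and an illustrative color palette; not the actual experiment code) of how the 15 pairwise comparisons per sentence and their randomization could be generated.

```python
import itertools
import random

VISUALIZATIONS = ["PitchY", "LoudnessY", "PitchZ", "LoudnessZ",
                  "YZ-Combined", "EST"]
COLORS = ["blue", "green", "orange", "purple"]      # hypothetical palette


def build_trials(sentences):
    """All unordered pairs of the six visualizations: C(6, 2) = 15 per sentence.

    Trial order, left/right placement, and shape color are randomized; the
    color is kept identical within a comparison, as in the experiment.
    """
    trials = []
    for sentence in sentences:
        for pair in itertools.combinations(VISUALIZATIONS, 2):
            left, right = random.sample(pair, 2)    # random side assignment
            trials.append({"sentence": sentence, "left": left,
                           "right": right, "color": random.choice(COLORS)})
    random.shuffle(trials)                          # random presentation order
    return trials                                   # 2 sentences -> 30 comparisons
```

With two sentences per experiment version, this yields the 30 comparisons described above.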
</sec>
<sec id="Sec5">
<title>Procedure</title>
<p>Participants completed the web experiment on their own personal computer, after first checking that the sound volume was sufficient. The use of headphones or earphones was strongly encouraged. After selecting Dutch or English as the preferred experiment language (note that all stimulus sentences were in Dutch), participants answered basic demographic questions and reported any perceptual or neurological problems. Participants were randomized over the three experiment versions based on their experiment language, gender and age category, creating optimally matched groups for each version.</p>
<p>Thirty pairs of visualizations (15 for each sentence) were shown in turn, allowing the participant to view two animations per comparison, with the final shapes remaining on screen while participants responded at their own pace. Participants were asked to rate which one of the two visualizations best matched the sound on a seven-point scale, indicating preference for either the left or the right option (see Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
for a screenshot of the experiment). Thus, a preference score is produced, on a scale ranging from −3 (stimulus A is greatly preferred to stimulus B) via 0 to +3 (stimulus B is greatly preferred to stimulus A), yielding distances between the compared visualizations.</p>
<p>After these 30 comparisons had been completed, another short list of questions was presented, asking participants about their experience during the experiment as well as additional questions about their mother tongue and understanding of the sentences. The experiment took about 15 min in total.</p>
</sec>
<sec id="Sec6">
<title>Analysis</title>
<p>The scores of each comparison were analyzed using Scheffé’s paired comparisons test (Scheffé,
<xref ref-type="bibr" rid="CR29">1952</xref>
). This method was developed to produce a ranking of stimuli on an interval scale. From the variances and degrees of freedom in the data, a ‘yardstick’ is derived that specifies the minimum distance in the ranking denoting a statistically significant difference between objects (in this case visualizations) at a chosen significance level. This method has been used previously in the study of intonation, in the evaluation of synthetic speech, and in the assessment of intelligibility of dysarthric speakers (Beijer, Rietveld, Ruiter & Geurts,
<xref ref-type="bibr" rid="CR4">2014</xref>
). This analysis was performed for the entire participant group as well as for subsets of the group, yielding rankings for subgroups varying in experiment language, age groups, stimulus-set and gender. Statistical significance was tested at α = 0.01.</p>
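As a rough illustration of this analysis (not the authors' code), the sketch below estimates a simplified scale value per visualization from the −3..+3 preference ratings and flags pairs whose distance exceeds a given yardstick. The derivation of the yardstick itself from the error variance and degrees of freedom (Scheffé, 1952) is omitted and supplied here as a constant.

```python
def scale_values(scores, labels):
    """Simplified interval-scale estimates from paired-comparison ratings.

    `scores` maps an ordered pair (left, right) to a list of ratings on the
    -3..+3 scale, where positive ratings favor the right-hand visualization.
    """
    totals = {lab: 0.0 for lab in labels}
    counts = {lab: 0 for lab in labels}
    for (left, right), ratings in scores.items():
        for r in ratings:
            totals[right] += r        # positive rating counts for the right item
            totals[left] -= r         # ...and against the left item
            counts[left] += 1
            counts[right] += 1
    return {lab: totals[lab] / counts[lab] for lab in labels}


def significant_pairs(values, yardstick):
    """Pairs whose scale-value difference exceeds the yardstick (e.g. Y.01)."""
    ranked = sorted(values, key=values.get, reverse=True)
    return [(a, b, values[a] - values[b])
            for i, a in enumerate(ranked) for b in ranked[i + 1:]
            if values[a] - values[b] > yardstick]
```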
</sec>
</sec>
<sec id="Sec7" sec-type="results">
<title>Results</title>
<p>The ranking and distances of the whole group are shown in Fig. 
<xref rid="Fig3" ref-type="fig">3</xref>
. The preference ranking showed that the YZ-Combined visualization was by far the most preferred, and the static graphs used in the original version of EST the least preferred. The two visualizations using the
<italic>y</italic>
-axis were next highest in the ranking (first pitch, then loudness), followed by the
<italic>z</italic>
-axis (first loudness, then pitch). Table
<xref rid="Tab2" ref-type="table">2</xref>
summarizes the preference scores for all comparisons. For this dataset, the yardstick
<italic>Y</italic>
.
<sub>01</sub>
indicating a significant difference between the ratings of compared pairs was calculated to be 0.0171. All difference values exceeded 0.0171, indicating that the ratings of all comparisons differed significantly (
<italic>P</italic>
 < .01). For the various participant subgroups, separated by experiment version (stimulus sentence set) or by demographic characteristics (gender, age and language), the distances varied slightly but the preference orders were identical, with significant differences between all pairs; these subgroup results are therefore not reported further.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>The full dataset ranking and distances according to Scheffé’s test of paired comparisons, with higher values representing increased preference. Visualizations are ranked from best to worst match as YZ-Combined, PitchY, LoudnessY, LoudnessZ, PitchZ, static graphs (EST)</p>
</caption>
<graphic xlink:href="13423_2015_934_Fig3_HTML" id="MO3"></graphic>
</fig>
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Summary of estimated differences between preference scores for all visualization comparisons (abbreviations described under ‘Stimuli’)</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th>PitchZ</th>
<th>LoudnessY</th>
<th>LoudnessZ</th>
<th>YZ-Combined</th>
<th>EST</th>
</tr>
</thead>
<tbody>
<tr>
<td>PitchY</td>
<td>0.98</td>
<td>0.155</td>
<td>0.885</td>
<td>0.185</td>
<td char="." align="char">1.697</td>
</tr>
<tr>
<td>PitchZ</td>
<td></td>
<td>0.825</td>
<td>0.095</td>
<td>1.165</td>
<td char="." align="char">0.718</td>
</tr>
<tr>
<td>LoudnessY</td>
<td></td>
<td></td>
<td>0.730</td>
<td>0.340</td>
<td char="." align="char">1.543</td>
</tr>
<tr>
<td>LoudnessZ</td>
<td></td>
<td></td>
<td></td>
<td>1.071</td>
<td char="." align="char">0.812</td>
</tr>
<tr>
<td>YZ-Combined</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td char="." align="char">1.883</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec id="Sec8" sec-type="discussion">
<title>Discussion</title>
<p>The crossmodal mappings of auditory and visual parameters were evaluated for speech sounds, providing a first large-scale investigation of the relation between speech sounds and visual space, with implications for therapeutic paradigms meant to support speech therapy with visual feedback. Our results matched earlier findings of associations of visual dimensions with continuous (musical) sounds, on a large scale and with a wider age range. In terms of the visual dimensions that were used (
<italic>y</italic>
- and
<italic>z</italic>
-axes), the
<italic>y</italic>
-axis was rated as better representing both pitch and loudness, with pitch rated as more fitting than loudness. However, on the (less preferred)
<italic>z</italic>
-axis, loudness was judged as better fitting than pitch, which is also in line with the correspondence reported for static shapes where larger size is mapped to lower rather than higher pitch (cf. Gallace & Spence,
<xref ref-type="bibr" rid="CR15">2006</xref>
). The combined visualization was considered the best, and the static two-graph visualization was considered the worst fit. Although these findings were attained through a web-based experiment with less rigid experimental control, the relatively large test group (
<italic>n</italic>
 = 249) and the replication of the preference rankings across genders, age groups and stimulus materials support the robustness of this result. Furthermore, the lack of discrepancy between the rankings obtained in the English and Dutch versions of the experiment indicates that prosodic information could be judged independently of the semantic information in the sentences used, which was not available to non-Dutch-speaking participants. This supports the generally reported notion that some crossmodal correspondences, especially the more low-level perceptual mappings, are found across different cultures and are considered to be universal (cf. Walker,
<xref ref-type="bibr" rid="CR37">1987</xref>
; Spence,
<xref ref-type="bibr" rid="CR31">2011</xref>
, but see also Athanasopoulos & Moran,
<xref ref-type="bibr" rid="CR1">2013</xref>
, and Küssner, Tidhar, Prior & Leech-Wilkinson (
<xref ref-type="bibr" rid="CR21">2014</xref>
) for influences of culture and training on reliability of crossmodal mappings).</p>
<p>These results essentially support but also refine our hypotheses, namely that in representing single dimensions (pitch height or loudness), the
<italic>y</italic>
-axis or vertical space is most associated with pitch, whereas size (or the
<italic>z</italic>
-axis) indeed fits best with loudness. However, for loudness the reverse does not hold, in that the general preference for the
<italic>y</italic>
-axis overrides the association of loudness with the
<italic>z</italic>
-axis, and thus LoudnessY is preferred over LoudnessZ, which was not predicted. This may be related to vertical space being used more commonly than size when visualizing a time course or changing signal, even though the classic visualization of sound waveforms (although not a veridical representation of perceptual loudness) visualizes loudness much more obviously than pitch. The finding that the combined visualization was most preferred also supports our hypotheses, although in the absence of the reversed combined visualization (with loudness on the
<italic>y</italic>
-axis and pitch on the
<italic>z</italic>
-axis), this preference cannot be interpreted as support for a specific mapping. In terms of the low preference for the static images, it must be noted that, although the aim was to provide a baseline score for the comparison between visualizations, it is likely that the simple fact that the other visualizations were animated precluded equal comparisons with the separate static graphs. However, in the interest of evaluating the EST speech therapy system as it has been developed, we opted to keep the original visualizations as the comparison. Nevertheless, it is clear that single animated visualizations are much preferred over separate static images. This preference indicates that visualizing an evolving time structure, as is inherent to longer sound fragments, increases the congruence between the visual and auditory stimuli.
<p>Crossmodal congruencies have been described as having several possible origins, ranging from structural correspondences, thought to be based on commonalities in neural processing, to statistical correspondences, based on consistent co-occurrence of stimulus attributes in the environment, to semantically mediated correspondences, based on common linguistic terms (see Spence,
<xref ref-type="bibr" rid="CR31">2011</xref>
, for further discussion). In the context of the current findings, there is no possibility of distinguishing adequately between these phenomena, as arguments can be made for multiple mechanisms. Behavioral findings have suggested that the cross-modal mapping between pitch and vertical orientation is innate (Walker et al.,
<xref ref-type="bibr" rid="CR36">2009</xref>
). However, a low-level perceptual correspondence may well be further strengthened by the use of the words ‘high’ and ‘low’ for pitch; both have semantic and spatial implications; see, for example, Dolscheid, Shayan, Majid & Casasanto (
<xref ref-type="bibr" rid="CR12">2013</xref>
) for an elegant demonstration of how changing linguistic space-pitch metaphors can impact representations of pitch. The correspondence of large objects and loud sounds could be explained by frequent prior experience of this mapping, as well as by the inference that physically bigger (or closer) beings or objects often make louder sounds than smaller (or more distant) beings or objects. The preference for the
<italic>y</italic>
-axis over the
<italic>z</italic>
-axis for feature representation may also indicate that information is transmitted more easily in this dimension, which in this case might be related to the scale used in each dimension; in the current setting, the
<italic>y</italic>
-axis necessarily had a greater range of display.</p>
<p>There are some limitations to generalizing these findings to practice, for example to speech therapy for PD patients as reported in the original EST study by Beijer et al. (
<xref ref-type="bibr" rid="CR2">2010a</xref>
), or other paradigms that make use of visual feedback of sound. Additional research will need to show that the current findings also hold for the targeted user groups, who may have specific attentional or perceptual deficits. For instance, previous research investigating pure tone discrimination in PD patients showed a reduced ability to notice change in frequency (or pitch) and intensity (or loudness) as compared to healthy older adults (Troche, Troche, Berkowitz, Grossman & Reilly,
<xref ref-type="bibr" rid="CR35">2012</xref>
). Results of a study into auditory speech discrimination by means of paired comparisons also showed problems in detecting different frequency levels, but not in different intensity levels (Beijer, Rietveld & van Stiphout,
<xref ref-type="bibr" rid="CR5">2011</xref>
). PD patients may also be impaired in judging loudness in their own speech (Ho, Bradshaw and Iansek,
<xref ref-type="bibr" rid="CR16">2000</xref>
). These perceptual deficits are relevant to the design of possible visualizations for speech therapy for PD, as visual feedback can be used to highlight acoustic features that provide important learning information but are not easily perceived. Of course, other user groups or specific applications of the feedback learning paradigm may necessitate a different set of visualization criteria and paradigm features. Additionally, it should be noted that preferred visualizations are not necessarily more useful for extracting information about the sound. For example, Brandmeyer et al. (
<xref ref-type="bibr" rid="CR8">2011</xref>
) found that, although the majority of participants preferred an analytic, informative visualization of music performance, the most useful visualization in terms of learning performance was holistic, without explicit information. Now that we have established the most intuitive way to represent pitch and loudness of speech in an animation, the next step is to validate the transfer of information about the aspects of speech that need to be changed, namely pitch and loudness. In this way, visualizations can be developed that are not only intuitive but also maximally informative.</p>
<p>The current study contributes a large-scale investigation of the preferred mappings of speech sounds to visual dimensions, which turn out to be generalizable over age groups and gender, and independent of semantic understanding or specific sentences. The results extend previous reports for the musical domain (Lipscomb & Kim,
<xref ref-type="bibr" rid="CR22">2004</xref>
; Küssner & Leech-Wilkinson,
<xref ref-type="bibr" rid="CR20">2014</xref>
), and complement work on more general associations of auditory and spatial mappings in movement (Eitan & Granot,
<xref ref-type="bibr" rid="CR13">2006</xref>
; Burger, Thompson, Luck, Saarikallio & Toiviainen,
<xref ref-type="bibr" rid="CR9">2013</xref>
). However, as Eitan & Timmers (
<xref ref-type="bibr" rid="CR14">2010</xref>
) note, it is possible that other visual and metaphorical features interact with how one represents cross-modal mappings, and this should be investigated further.</p>
<p>Although additional work will be necessary to ascertain the ideal parameters for the various clinical and non-clinical populations for whom this method may be relevant, animated, combined visualizations are robustly found to be most fitting for speech sounds in healthy participants. Clearly, the application is not limited to speech therapy for PD, but may be extended to other rehabilitation or learning goals. Within speech therapy contexts, this method may be extended to groups experiencing dysarthric speech resulting from varying underlying neurological problems (cf. Kim et al.,
<xref ref-type="bibr" rid="CR19">2011</xref>
), but the method may also benefit individuals with hearing disorders. Non-clinical feedback-based learning paradigms that involve speech, such as learning formants or tonal aspects in second language learning and music pedagogy applications that make use of corresponding mappings (e.g., de Bot,
<xref ref-type="bibr" rid="CR7">1983</xref>
; Hoppe, Sadakata & Desain,
<xref ref-type="bibr" rid="CR17">2006</xref>
; Nijs & Leman,
<xref ref-type="bibr" rid="CR24">2014</xref>
), could also benefit from these findings. Furthermore, investigating other possible mappings of sound aspects (spectral content, aspiration, attack and decay times, and so on) onto visual aspects (shape, color, brightness, and more) offers many further possibilities to fully utilize the auditory and visual domains in feedback-based learning applications.</p>
</sec>
</body>
<back>
<ack>
<p>The authors gratefully acknowledge the support of Stichting IT Projecten Nijmegen. We want to thank Ruud Barth, Saskia Koldijk and Jered Vroon for their initial design of the visualization.</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Athanasopoulos</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Moran</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Cross-cultural representations of musical shapes</article-title>
<source>Empirical Musicology Review</source>
<year>2013</year>
<volume>8</volume>
<issue>3–4</issue>
<fpage>185</fpage>
<lpage>199</lpage>
</element-citation>
</ref>
<ref id="CR2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beijer</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Rietveld</surname>
<given-names>ACM</given-names>
</name>
<name>
<surname>Hoskam</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Geurts</surname>
<given-names>ACH</given-names>
</name>
<name>
<surname>de Swart</surname>
<given-names>BJM</given-names>
</name>
</person-group>
<article-title>Evaluating the feasibility and the potential efficacy of e-learning-based speech therapy (EST) as a web application for speech training in dysarthric patients with Parkinson’s disease: a case study</article-title>
<source>Telemedicine Journal and E-Health</source>
<year>2010</year>
<volume>16</volume>
<issue>6</issue>
<fpage>732</fpage>
<lpage>738</lpage>
<pub-id pub-id-type="doi">10.1089/tmj.2009.0183</pub-id>
<pub-id pub-id-type="pmid">20618088</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beijer</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Rietveld</surname>
<given-names>ACM</given-names>
</name>
<name>
<surname>van Beers</surname>
<given-names>MMA</given-names>
</name>
<name>
<surname>Slangen</surname>
<given-names>RML</given-names>
</name>
<name>
<surname>van den Heuvel</surname>
<given-names>H</given-names>
</name>
<name>
<surname>de Swart</surname>
<given-names>BJM</given-names>
</name>
<name>
<surname>Geurts</surname>
<given-names>ACH</given-names>
</name>
</person-group>
<article-title>E-learning-based speech therapy: a web application for speech training</article-title>
<source>Telemedicine Journal and E-Health</source>
<year>2010</year>
<volume>16</volume>
<issue>2</issue>
<fpage>177</fpage>
<lpage>180</lpage>
<pub-id pub-id-type="doi">10.1089/tmj.2009.0104</pub-id>
<pub-id pub-id-type="pmid">20184455</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beijer</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Rietveld</surname>
<given-names>ACM</given-names>
</name>
<name>
<surname>Ruiter</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Geurts</surname>
<given-names>ACH</given-names>
</name>
</person-group>
<article-title>Preparing an E-learning-based Speech Therapy (EST) efficacy study: identifying suitable outcome measures to detect within subject changes of speech intelligibility in dysarthric speakers</article-title>
<source>Clinical Linguistics & Phonetics</source>
<year>2014</year>
<volume>28</volume>
<issue>12</issue>
<fpage>927</fpage>
<lpage>950</lpage>
<pub-id pub-id-type="doi">10.3109/02699206.2014.936627</pub-id>
<pub-id pub-id-type="pmid">25025268</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beijer</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Rietveld</surname>
<given-names>ACM</given-names>
</name>
<name>
<surname>van Stiphout</surname>
<given-names>AJL</given-names>
</name>
</person-group>
<article-title>Auditory discrimination as a condition for E-learning based Speech Therapy: a proposal for an Auditory Discrimination Test (ADT) for adult dysarthric speakers</article-title>
<source>Journal of Communication Disorders</source>
<year>2011</year>
<volume>44</volume>
<fpage>701</fpage>
<lpage>718</lpage>
<pub-id pub-id-type="doi">10.1016/j.jcomdis.2011.05.002</pub-id>
<pub-id pub-id-type="pmid">21719027</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<mixed-citation publication-type="other">Boersma, P., & Weenink, D. (2005). Praat: doing phonetics by computer (Version 4.3.01) [Computer program]. Accessed online:
<ext-link ext-link-type="uri" xlink:href="http://www.praat.org/">http://www.praat.org/</ext-link>
</mixed-citation>
</ref>
<ref id="CR7">
<mixed-citation publication-type="other">de Bot, K. (1983). Visual feedback of intonation I: Effectiveness and induced practice behavior.
<italic>Language and Speech, 26,</italic>
331–350.</mixed-citation>
</ref>
<ref id="CR8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brandmeyer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Timmers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sadakata</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Desain</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Learning expressive percussion performance under different visual feedback conditions</article-title>
<source>Psychological Research</source>
<year>2011</year>
<volume>75</volume>
<issue>2</issue>
<fpage>107</fpage>
<lpage>121</lpage>
<pub-id pub-id-type="doi">10.1007/s00426-010-0291-6</pub-id>
<pub-id pub-id-type="pmid">20574662</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burger</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Luck</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Saarikallio</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Toiviainen</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Influences of rhythm- and timbre-related musical features on characteristics of music-induced movement</article-title>
<source>Frontiers in Psychology</source>
<year>2013</year>
<volume>4</volume>
<fpage>183</fpage>
<pub-id pub-id-type="doi">10.3389/fpsyg.2013.00183</pub-id>
<pub-id pub-id-type="pmid">23641220</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Demenko</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cylwik</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>The use of speech technology in foreign language pronunciations training</article-title>
<source>Archives of Acoustics</source>
<year>2010</year>
<volume>35</volume>
<issue>3</issue>
<fpage>309</fpage>
<lpage>329</lpage>
<pub-id pub-id-type="doi">10.2478/v10168-010-0027-z</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dixon</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Evaluation of the audio beat tracking system BeatRoot</article-title>
<source>Journal of New Music Research</source>
<year>2007</year>
<volume>36</volume>
<fpage>39</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="doi">10.1080/09298210701653310</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dolscheid</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shayan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Majid</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Casasanto</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The thickness of musical pitch: psychophysical evidence for linguistic relativity</article-title>
<source>Psychological Science</source>
<year>2013</year>
<volume>24</volume>
<issue>5</issue>
<fpage>613</fpage>
<lpage>621</lpage>
<pub-id pub-id-type="doi">10.1177/0956797612457374</pub-id>
<pub-id pub-id-type="pmid">23538914</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eitan</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Granot</surname>
<given-names>RY</given-names>
</name>
</person-group>
<article-title>How music moves: musical parameters and listeners’ images of motion</article-title>
<source>Music Perception</source>
<year>2006</year>
<volume>23</volume>
<fpage>221</fpage>
<lpage>247</lpage>
<pub-id pub-id-type="doi">10.1525/mp.2006.23.3.221</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eitan</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Timmers</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Beethoven’s last piano sonata and those who follow crocodiles: cross-domain mappings of auditory pitch in a musical context</article-title>
<source>Cognition</source>
<year>2010</year>
<volume>114</volume>
<fpage>405</fpage>
<lpage>422</lpage>
<pub-id pub-id-type="doi">10.1016/j.cognition.2009.10.013</pub-id>
<pub-id pub-id-type="pmid">20036356</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gallace</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Spence</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Multisensory synesthetic interactions in the speeded classification of visual size</article-title>
<source>Perception & Psychophysics</source>
<year>2006</year>
<volume>68</volume>
<fpage>1191</fpage>
<lpage>1203</lpage>
<pub-id pub-id-type="doi">10.3758/BF03193720</pub-id>
<pub-id pub-id-type="pmid">17355042</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ho</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Bradshaw</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Iansek</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Volume perception in Parkinsonian speech</article-title>
<source>Movement Disorders</source>
<year>2000</year>
<volume>15</volume>
<issue>6</issue>
<fpage>1125</fpage>
<lpage>1131</lpage>
<pub-id pub-id-type="doi">10.1002/1531-8257(200011)15:6<1125::AID-MDS1010>3.0.CO;2-R</pub-id>
<pub-id pub-id-type="pmid">11104195</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoppe</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sadakata</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Desain</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Development of real-time visual feedback assistance in singing training: a review</article-title>
<source>Journal of Computer Assisted Learning</source>
<year>2006</year>
<volume>22</volume>
<issue>4</issue>
<fpage>308</fpage>
<lpage>316</lpage>
<pub-id pub-id-type="doi">10.1111/j.1365-2729.2006.00178.x</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ishihara</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Keller</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Rossetti</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Prinz</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Horizontal spatial representations of time: evidence for the STEARC effect</article-title>
<source>Cortex</source>
<year>2008</year>
<volume>44</volume>
<fpage>454</fpage>
<lpage>461</lpage>
<pub-id pub-id-type="doi">10.1016/j.cortex.2007.08.010</pub-id>
<pub-id pub-id-type="pmid">18387578</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Weismer</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>An acoustic study of the relationships among neurologic disease, dysarthria type and severity of dysarthria</article-title>
<source>Journal of Speech, Language and Hearing Research</source>
<year>2011</year>
<volume>54</volume>
<fpage>417</fpage>
<lpage>429</lpage>
<pub-id pub-id-type="doi">10.1044/1092-4388(2010/10-0020)</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Küssner</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Leech-Wilkinson</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Investigating the influence of musical training on cross-modal correspondences and sensorimotor skills in a real-time drawing paradigm</article-title>
<source>Psychology of Music</source>
<year>2014</year>
<volume>42</volume>
<issue>3</issue>
<fpage>448</fpage>
<lpage>469</lpage>
<pub-id pub-id-type="doi">10.1177/0305735613482022</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<mixed-citation publication-type="other">Küssner, M. B., Tidhar, D., Prior, H. M. & Leech-Wilkinson, D. (2014). Musicians are more consistent: gestural cross-modal mappings of pitch, loudness and tempo in real-time.
<italic>Frontiers in Psychology, 5</italic>
, Art. 789. doi: 10.3389/fpsyg.2014.00789</mixed-citation>
</ref>
<ref id="CR22">
<mixed-citation publication-type="other">Lipscomb, S. D., & Kim, E. M. (2004). Perceived match between visual parameters and auditory correlates: an experimental multimedia investigation. In: Lipscomb, S., Ashley, R., Gjerdingen, R. & Webster, P. (Eds.),
<italic>Proceedings of the 8th International Conference on Music Perception and Cognition (ICMPC8),</italic>
pp. 72–75. Evanston, IL, 3–8 August, 2004. Adelaide: Causal Productions.</mixed-citation>
</ref>
<ref id="CR23">
<mixed-citation publication-type="other">McLeod, P. (2008). Fast, accurate pitch detection tools for music analysis. Unpublished PhD thesis. Department of Computer Science, University of Otago, 2008. Accessed 7 March 2012 at
<ext-link ext-link-type="uri" xlink:href="http://miracle.otago.ac.nz/tartini/papers.html">http://miracle.otago.ac.nz/tartini/papers.html</ext-link>
</mixed-citation>
</ref>
<ref id="CR24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nijs</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Leman</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Interactive technologies in the instrumental music classroom: a longitudinal study with the Music Paint Machine</article-title>
<source>Computers & Education</source>
<year>2014</year>
<volume>73</volume>
<fpage>40</fpage>
<lpage>59</lpage>
<pub-id pub-id-type="doi">10.1016/j.compedu.2013.11.008</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Plomp</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mimpen</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Improving the reliability of testing the speech reception threshold for sentences</article-title>
<source>Audiology</source>
<year>1979</year>
<volume>18</volume>
<fpage>43</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="doi">10.3109/00206097909072618</pub-id>
<pub-id pub-id-type="pmid">760724</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rossiter</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>DeCosta</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Voice development under training with and without the influence of real-time visually presented biofeedback</article-title>
<source>Journal of the Acoustical Society of America</source>
<year>1996</year>
<volume>99</volume>
<fpage>3253</fpage>
<lpage>3256</lpage>
<pub-id pub-id-type="doi">10.1121/1.414872</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rusconi</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kwan</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Giordano</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Umiltá</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Butterworth</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Spatial representation of pitch height: the SMARC effect</article-title>
<source>Cognition</source>
<year>2006</year>
<volume>99</volume>
<fpage>113</fpage>
<lpage>129</lpage>
<pub-id pub-id-type="doi">10.1016/j.cognition.2005.01.004</pub-id>
<pub-id pub-id-type="pmid">15925355</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sadakata</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hoppe</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Brandmeyer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Timmers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Desain</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Real-time visual feedback for learning to perform short rhythms with expressive variations in timing and loudness</article-title>
<source>Journal of New Music Research</source>
<year>2008</year>
<volume>37</volume>
<issue>3</issue>
<fpage>207</fpage>
<lpage>220</lpage>
<pub-id pub-id-type="doi">10.1080/09298210802322401</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scheffé</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>An analysis of variance for paired comparisons</article-title>
<source>Journal of the American Statistical Association</source>
<year>1952</year>
<volume>47</volume>
<issue>259</issue>
<fpage>381</fpage>
<lpage>400</lpage>
</element-citation>
</ref>
<ref id="CR30">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Schmidt</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>TD</given-names>
</name>
</person-group>
<source>Motor control and learning: a behavioral emphasis</source>
<year>2010</year>
<edition>5</edition>
<publisher-loc>Champaign, IL</publisher-loc>
<publisher-name>Human Kinetics</publisher-name>
</element-citation>
</ref>
<ref id="CR31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spence</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Crossmodal correspondences: a tutorial review</article-title>
<source>Attention, Perception, & Psychophysics</source>
<year>2011</year>
<volume>73</volume>
<fpage>971</fpage>
<lpage>995</lpage>
<pub-id pub-id-type="doi">10.3758/s13414-010-0073-7</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<mixed-citation publication-type="other">Stowell D., & Plumbley, M.D. (2007). Adaptive whitening for improved real-time audio onset detection. In:
<italic>Proceedings of the International Computer Music Conference (ICMC’07)</italic>
. Vol 18. Denmark, August 2007.</mixed-citation>
</ref>
<ref id="CR33">
<mixed-citation publication-type="other">de Swart, B. J., Willemse, S. C., Maassen, B. A., & Horstink, M. W. (2003). Improvement of voicing in patients with Parkinson’s disease by speech therapy.
<italic>Neurology, 60</italic>
(3), 498–500. doi:10.1212/01.WNL.0000044480.95458.56</mixed-citation>
</ref>
<ref id="CR34">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Timmers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sadakata</surname>
<given-names>M</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Fabian</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Timmers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schubert</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Training expressive performance by means of visual feedback: existing and potential applications of performance measurement techniques</article-title>
<source>Expression in Music Performance</source>
<year>2014</year>
<publisher-loc>Oxford</publisher-loc>
<publisher-name>Oxford University Press</publisher-name>
</element-citation>
</ref>
<ref id="CR35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Troche</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Troche</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Berkowitz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Grossman</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Reilly</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Tone discrimination as a window into acoustic perceptual deficits in Parkinson’s disease</article-title>
<source>American Journal of Speech-Language Pathology</source>
<year>2012</year>
<volume>21</volume>
<issue>3</issue>
<fpage>258</fpage>
<lpage>263</lpage>
<pub-id pub-id-type="doi">10.1044/1058-0360(2012/11-0007)</pub-id>
<pub-id pub-id-type="pmid">22442285</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walker</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bremner</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Spring</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mattock</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Slater</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>SP</given-names>
</name>
</person-group>
<article-title>Preverbal infants’ sensitivity to synaesthetic cross-modality correspondences</article-title>
<source>Psychological Science</source>
<year>2009</year>
<volume>21</volume>
<fpage>21</fpage>
<lpage>25</lpage>
<pub-id pub-id-type="doi">10.1177/0956797609354734</pub-id>
<pub-id pub-id-type="pmid">20424017</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walker</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>The effects of culture, environment, age and musical training on choices of visual metaphors for sound</article-title>
<source>Perception & Psychophysics</source>
<year>1987</year>
<volume>42</volume>
<issue>5</issue>
<fpage>491</fpage>
<lpage>502</lpage>
<pub-id pub-id-type="doi">10.3758/BF03209757</pub-id>
<pub-id pub-id-type="pmid">2447557</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Watanabe</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tomishige</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nakatake</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Speech visualization by integrating features for the hearing impaired</article-title>
<source>IEEE Transactions on Speech and Audio Processing</source>
<year>2000</year>
<volume>8</volume>
<issue>4</issue>
<fpage>454</fpage>
<lpage>466</lpage>
<pub-id pub-id-type="doi">10.1109/89.848226</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilson</surname>
<given-names>PH</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Callaghan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Thorpe</surname>
<given-names>CW</given-names>
</name>
</person-group>
<article-title>Learning to sing in tune: does real-time visual feedback help?</article-title>
<source>Journal of Interdisciplinary Music Studies</source>
<year>2008</year>
<volume>2</volume>
<fpage>157</fpage>
<lpage>172</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000050 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000050 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4828474
   |texte=   Intuitive visualizations of pitch and loudness in speech
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26370217" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MusicSarreV3 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024