Exploration server on haptic devices


An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception

Authors: Nicholas Altieri [United States]; James T. Townsend [United States]

Source: Frontiers in Psychology, 2011

RBID: PMC:3180170

Abstract

Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180170
DOI: 10.3389/fpsyg.2011.00238
PubMed: 21980314
PubMed Central: 3180170


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception</title>
<author>
<name sortKey="Altieri, Nicholas" sort="Altieri, Nicholas" uniqKey="Altieri N" first="Nicholas" last="Altieri">Nicholas Altieri</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Department of Psychology, The University of Oklahoma</institution>
<country>Norman, OK, USA</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Townsend, James T" sort="Townsend, James T" uniqKey="Townsend J" first="James T." last="Townsend">James T. Townsend</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Department of Psychological and Brain Sciences, Indiana University</institution>
<country>Bloomington, IN, USA</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21980314</idno>
<idno type="pmc">3180170</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180170</idno>
<idno type="RBID">PMC:3180170</idno>
<idno type="doi">10.3389/fpsyg.2011.00238</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">001F54</idno>
<idno type="wicri:Area/Pmc/Curation">001F54</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001C45</idno>
<idno type="wicri:Area/Ncbi/Merge">001C36</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception</title>
<author>
<name sortKey="Altieri, Nicholas" sort="Altieri, Nicholas" uniqKey="Altieri N" first="Nicholas" last="Altieri">Nicholas Altieri</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Department of Psychology, The University of Oklahoma</institution>
<country>Norman, OK, USA</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Townsend, James T" sort="Townsend, James T" uniqKey="Townsend J" first="James T." last="Townsend">James T. Townsend</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Department of Psychological and Brain Sciences, Indiana University</institution>
<country>Bloomington, IN, USA</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Frontiers in Psychology</title>
<idno type="eISSN">1664-1078</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Altieri, N" uniqKey="Altieri N">N. Altieri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altieri, N" uniqKey="Altieri N">N. Altieri</name>
</author>
<author>
<name sortKey="Wenger, M J" uniqKey="Wenger M">M. J. Wenger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arnold, D H" uniqKey="Arnold D">D. H. Arnold</name>
</author>
<author>
<name sortKey="Tear, M" uniqKey="Tear M">M. Tear</name>
</author>
<author>
<name sortKey="Schindel, R" uniqKey="Schindel R">R. Schindel</name>
</author>
<author>
<name sortKey="Roseboom, W" uniqKey="Roseboom W">W. Roseboom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barutchu, A" uniqKey="Barutchu A">A. Barutchu</name>
</author>
<author>
<name sortKey="Crewther, D P" uniqKey="Crewther D">D. P. Crewther</name>
</author>
<author>
<name sortKey="Crewther, S G" uniqKey="Crewther S">S. G. Crewther</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barutchu, A" uniqKey="Barutchu A">A. Barutchu</name>
</author>
<author>
<name sortKey="Danaher, J" uniqKey="Danaher J">J. Danaher</name>
</author>
<author>
<name sortKey="Crewther, S G" uniqKey="Crewther S">S. G. Crewther</name>
</author>
<author>
<name sortKey="Innes Brown, H" uniqKey="Innes Brown H">H. Innes-Brown</name>
</author>
<author>
<name sortKey="Shivdasani, M N" uniqKey="Shivdasani M">M. N. Shivdasani</name>
</author>
<author>
<name sortKey="Paolini, A G" uniqKey="Paolini A">A. G. Paolini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bergeson, T R" uniqKey="Bergeson T">T. R. Bergeson</name>
</author>
<author>
<name sortKey="Pisoni, D B" uniqKey="Pisoni D">D. B. Pisoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernstein, L E" uniqKey="Bernstein L">L. E. Bernstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernstein, L E" uniqKey="Bernstein L">L. E. Bernstein</name>
</author>
<author>
<name sortKey="Auer, E T" uniqKey="Auer E">E. T. Auer</name>
</author>
<author>
<name sortKey="Moore, J K" uniqKey="Moore J">J. K. Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berryhill, M" uniqKey="Berryhill M">M. Berryhill</name>
</author>
<author>
<name sortKey="Kveraga, K" uniqKey="Kveraga K">K. Kveraga</name>
</author>
<author>
<name sortKey="Webb, L" uniqKey="Webb L">L. Webb</name>
</author>
<author>
<name sortKey="Hughes, H C" uniqKey="Hughes H">H. C. Hughes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Braida, L D" uniqKey="Braida L">L. D. Braida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bundesen, C" uniqKey="Bundesen C">C. Bundesen</name>
</author>
<author>
<name sortKey="Habekost, T" uniqKey="Habekost T">T. Habekost</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Colonius, H" uniqKey="Colonius H">H. Colonius</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eidels, A" uniqKey="Eidels A">A. Eidels</name>
</author>
<author>
<name sortKey="Houpt, J" uniqKey="Houpt J">J. Houpt</name>
</author>
<author>
<name sortKey="Altieri, N" uniqKey="Altieri N">N. Altieri</name>
</author>
<author>
<name sortKey="Pei, L" uniqKey="Pei L">L. Pei</name>
</author>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fournier, L R" uniqKey="Fournier L">L. R. Fournier</name>
</author>
<author>
<name sortKey="Eriksen, C W" uniqKey="Eriksen C">C. W. Eriksen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fowler, C A" uniqKey="Fowler C">C. A. Fowler</name>
</author>
<author>
<name sortKey="Dekle, D J" uniqKey="Dekle D">D. J. Dekle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fowler, C A" uniqKey="Fowler C">C. A. Fowler</name>
</author>
<author>
<name sortKey="Rosenblum, L D" uniqKey="Rosenblum L">L. D. Rosenblum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grant, K W" uniqKey="Grant K">K. W. Grant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grant, K W" uniqKey="Grant K">K. W. Grant</name>
</author>
<author>
<name sortKey="Walden, B E" uniqKey="Walden B">B. E. Walden</name>
</author>
<author>
<name sortKey="Seitz, P F" uniqKey="Seitz P">P. F. Seitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Green, K P" uniqKey="Green K">K. P. Green</name>
</author>
<author>
<name sortKey="Miller, J L" uniqKey="Miller J">J. L. Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grice, G R" uniqKey="Grice G">G. R. Grice</name>
</author>
<author>
<name sortKey="Canham, L" uniqKey="Canham L">L. Canham</name>
</author>
<author>
<name sortKey="Gwynne, J W" uniqKey="Gwynne J">J. W. Gwynne</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jesse, A" uniqKey="Jesse A">A. Jesse</name>
</author>
<author>
<name sortKey="Massaro, D W" uniqKey="Massaro D">D. W. Massaro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liberman, A M" uniqKey="Liberman A">A. M. Liberman</name>
</author>
<author>
<name sortKey="Mattingly, I G" uniqKey="Mattingly I">I. G. Mattingly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ma, W J" uniqKey="Ma W">W. J. Ma</name>
</author>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X. Zhou</name>
</author>
<author>
<name sortKey="Ross, L A" uniqKey="Ross L">L. A. Ross</name>
</author>
<author>
<name sortKey="Foxe, J J" uniqKey="Foxe J">J. J. Foxe</name>
</author>
<author>
<name sortKey="Parra, L C" uniqKey="Parra L">L. C. Parra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Massaro, D W" uniqKey="Massaro D">D. W. Massaro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Massaro, D W" uniqKey="Massaro D">D. W. Massaro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Massaro, D W" uniqKey="Massaro D">D. W. Massaro</name>
</author>
<author>
<name sortKey="Cohen, M M" uniqKey="Cohen M">M. M. Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Massaro, D W" uniqKey="Massaro D">D. W. Massaro</name>
</author>
<author>
<name sortKey="Cohen, M M" uniqKey="Cohen M">M. M. Cohen</name>
</author>
<author>
<name sortKey="Smeele, P M T" uniqKey="Smeele P">P. M. T. Smeele</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcgurk, H" uniqKey="Mcgurk H">H. McGurk</name>
</author>
<author>
<name sortKey="Macdonald, J W" uniqKey="Macdonald J">J. W. MacDonald</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meredith, M A" uniqKey="Meredith M">M. A. Meredith</name>
</author>
<author>
<name sortKey="Stein, B E" uniqKey="Stein B">B. E. Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miller, J" uniqKey="Miller J">J. Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miller, J" uniqKey="Miller J">J. Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Molholm, S" uniqKey="Molholm S">S. Molholm</name>
</author>
<author>
<name sortKey="Ritter, W" uniqKey="Ritter W">W. Ritter</name>
</author>
<author>
<name sortKey="Javitt, D C" uniqKey="Javitt D">D. C. Javitt</name>
</author>
<author>
<name sortKey="Foxe, J J" uniqKey="Foxe J">J. J. Foxe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pilling, M" uniqKey="Pilling M">M. Pilling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ponton, C W" uniqKey="Ponton C">C. W. Ponton</name>
</author>
<author>
<name sortKey="Bernstein, L E" uniqKey="Bernstein L">L. E. Bernstein</name>
</author>
<author>
<name sortKey="Auer, E T" uniqKey="Auer E">E. T. Auer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raab, D H" uniqKey="Raab D">D. H. Raab</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosenblum, L D" uniqKey="Rosenblum L">L. D. Rosenblum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ross, L A" uniqKey="Ross L">L. A. Ross</name>
</author>
<author>
<name sortKey="Saint Amour, D" uniqKey="Saint Amour D">D. Saint-Amour</name>
</author>
<author>
<name sortKey="Leavitt, V M" uniqKey="Leavitt V">V. M. Leavitt</name>
</author>
<author>
<name sortKey="Javitt, D C" uniqKey="Javitt D">D. C. Javitt</name>
</author>
<author>
<name sortKey="Foxe, J J" uniqKey="Foxe J">J. J. Foxe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sekiyama, K" uniqKey="Sekiyama K">K. Sekiyama</name>
</author>
<author>
<name sortKey="Tohkura, Y" uniqKey="Tohkura Y">Y. Tohkura</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sherffert, S" uniqKey="Sherffert S">S. Sherffert</name>
</author>
<author>
<name sortKey="Lachs, L" uniqKey="Lachs L">L. Lachs</name>
</author>
<author>
<name sortKey="Hernandez, L R" uniqKey="Hernandez L">L. R. Hernandez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sommers, M" uniqKey="Sommers M">M. Sommers</name>
</author>
<author>
<name sortKey="Tye Murray, N" uniqKey="Tye Murray N">N. Tye-Murray</name>
</author>
<author>
<name sortKey="Spehar, B" uniqKey="Spehar B">B. Spehar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sternberg, S" uniqKey="Sternberg S">S. Sternberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stevenson, R A" uniqKey="Stevenson R">R. A. Stevenson</name>
</author>
<author>
<name sortKey="James, T W" uniqKey="James T">T. W. James</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sumby, W H" uniqKey="Sumby W">W. H. Sumby</name>
</author>
<author>
<name sortKey="Pollack, I" uniqKey="Pollack I">I. Pollack</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Summerfield, Q" uniqKey="Summerfield Q">Q. Summerfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Ashby, F G" uniqKey="Ashby F">F. G. Ashby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Ashby, F G" uniqKey="Ashby F">F. G. Ashby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Nozawa, G" uniqKey="Nozawa G">G. Nozawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Schweickert, R" uniqKey="Schweickert R">R. Schweickert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Wenger, M J" uniqKey="Wenger M">M. J. Wenger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
<author>
<name sortKey="Wenger, M J" uniqKey="Wenger M">M. J. Wenger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Wassenhove, V" uniqKey="Van Wassenhove V">V. van Wassenhove</name>
</author>
<author>
<name sortKey="Grant, K W" uniqKey="Grant K">K. W. Grant</name>
</author>
<author>
<name sortKey="Poeppel, D" uniqKey="Poeppel D">D. Poeppel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walker, S" uniqKey="Walker S">S. Walker</name>
</author>
<author>
<name sortKey="Bruce, V" uniqKey="Bruce V">V. Bruce</name>
</author>
<author>
<name sortKey="O Alley, C" uniqKey="O Alley C">C. O’Malley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wenger, M J" uniqKey="Wenger M">M. J. Wenger</name>
</author>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wenger, M J" uniqKey="Wenger M">M. J. Wenger</name>
</author>
<author>
<name sortKey="Townsend, J T" uniqKey="Townsend J">J. T. Townsend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Winneke, A H" uniqKey="Winneke A">A. H. Winneke</name>
</author>
<author>
<name sortKey="Phillips, N A" uniqKey="Phillips N">N. A. Phillips</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Front Psychol</journal-id>
<journal-id journal-id-type="publisher-id">Front. Psychology</journal-id>
<journal-title-group>
<journal-title>Frontiers in Psychology</journal-title>
</journal-title-group>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Research Foundation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21980314</article-id>
<article-id pub-id-type="pmc">3180170</article-id>
<article-id pub-id-type="doi">10.3389/fpsyg.2011.00238</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Altieri</surname>
<given-names>Nicholas</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Townsend</surname>
<given-names>James T.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Psychology, The University of Oklahoma</institution>
<country>Norman, OK, USA</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Psychological and Brain Sciences, Indiana University</institution>
<country>Bloomington, IN, USA</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Colin Davis, Royal Holloway University of London, UK</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Colin Davis, Royal Holloway University of London, UK; Axel Winneke, Jacobs University Bremen, Germany</p>
</fn>
<corresp id="fn001">*Correspondence: Nicholas Altieri, Department of Psychology, The University of Oklahoma, 3100 Monitor Avenue, Two Partners Place, suite 280, 1-405-325-3936, Norman, OK 73019, USA. e-mail:
<email>nick.altieri@ou.edu</email>
</corresp>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Frontiers in Cognitive Science, a specialty of Frontiers in Psychology.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>9</month>
<year>2011</year>
</pub-date>
<pub-date pub-type="collection">
<year>2011</year>
</pub-date>
<volume>2</volume>
<elocation-id>238</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>2</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>8</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2011 Altieri and Townsend.</copyright-statement>
<copyright-year>2011</copyright-year>
<license license-type="open-access" xlink:href="http://www.frontiersin.org/licenseagreement">
<license-p>This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.</license-p>
</license>
</permissions>
<abstract>
<p>Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio.</p>
</abstract>
<kwd-group>
<kwd>speech</kwd>
<kwd>multisensory integration</kwd>
<kwd>coactive</kwd>
<kwd>parallel</kwd>
<kwd>capacity</kwd>
</kwd-group>
<counts>
<fig-count count="5"></fig-count>
<table-count count="2"></table-count>
<equation-count count="2"></equation-count>
<ref-count count="59"></ref-count>
<page-count count="15"></page-count>
<word-count count="12940"></word-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction">
<title>Introduction</title>
<p>When someone utilizes lip-reading to take advantage of both auditory and visual modalities, how is this accomplished? Research shows that even normal-hearing individuals benefit in accuracy from bimodal information in low-to-moderate signal-to-noise ratio conditions (e.g., Sumby and Pollack,
<xref ref-type="bibr" rid="B43">1954</xref>
). Speech perception is a multimodal perceptual phenomenon that relies on auditory, visual, and even haptic information as inputs to the system where word recognition is the output (e.g., McGurk and MacDonald,
<xref ref-type="bibr" rid="B28">1976</xref>
; Massaro,
<xref ref-type="bibr" rid="B24">1987</xref>
; Fowler and Dekle,
<xref ref-type="bibr" rid="B15">1991</xref>
). Multimodal perception has become an area of burgeoning interest in sensory and cognitive areas of psychology. Yet, the real-time processing mechanisms of lip-reading and how they relate to auditory word perception remain opaque (see Jesse and Massaro,
<xref ref-type="bibr" rid="B21">2010</xref>
, for a recent study using the gating paradigm). Due to the very different methodologies used, the great mass of work in the audiovisual literature does not speak to the issues we investigate in this study
<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref>
.</p>
<p>Within the domain of response times (RTs), theory-driven methodologies have been developed to identify key processing characteristics applicable to bimodal speech perception (e.g., Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
; Townsend and Wenger,
<xref ref-type="bibr" rid="B52">2004a</xref>
). A problem of interest that will be described in the following sections concerns whether the information from the auditory and visual modalities is combined or “integrated” in the early stages of processing, or rather in later stages after phoneme, syllable, or even word recognition. These issues are important in building a theory of multimodal speech perception since ultimately they must be specified in any real-time processing system. However, no determinations of these processing issues have been made in the area of audiovisual speech perception and few have been made in general studies of multimodal processing. Research involving non-speech multisensory stimuli in detection tasks has shown evidence for early bimodal interactions (see Barutchu et al.,
<xref ref-type="bibr" rid="B4">2009</xref>
; Barutchu et al.,
<xref ref-type="bibr" rid="B5">2010</xref>
for studies using children and adults). Other examples include: Miller (
<xref ref-type="bibr" rid="B30">1982</xref>
,
<xref ref-type="bibr" rid="B31">1986</xref>
) who used pure tones and dots, Berryhill et al.,
<xref ref-type="bibr" rid="B9">2007</xref>
(see also Fournier and Eriksen,
<xref ref-type="bibr" rid="B14">1990</xref>
) who used images of numerals/letters plus sounds, and Molholm et al. (
<xref ref-type="bibr" rid="B32">2004</xref>
) who used still images of animals combined with vocalizations. A brief introduction to the systems factorial technology components pertinent to audiovisual speech processing will now be provided. More rigorous definitions shall appear later, and specific relationships to bimodal speech perception will be noted immediately thereafter.</p>
<p>First,
<italic>architecture</italic>
refers to whether bimodal channels are operating in
<italic>parallel</italic>
or in a
<italic>coactive</italic>
fashion (see Townsend and Wenger,
<xref ref-type="bibr" rid="B52">2004a</xref>
; see also Miller,
<xref ref-type="bibr" rid="B30">1982</xref>
)
<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref>
. Also, certain types of parallel systems can be experimentally discriminated from one another, such as
<italic>separate decisions</italic>
versus
<italic>coactive</italic>
where information is pooled into a final conduit (see Figure
<xref ref-type="fig" rid="F1">1</xref>
). Certainly, the peripheral physiological tributaries transmit sensory information in parallel to begin with, but the exact mechanisms and the higher order real-time properties required for various psychological tasks, such as processing of linguistic information from different modalities, remain unknown. A schematic diagram of potential models of audiovisual speech processing is shown in Figure
<xref ref-type="fig" rid="F1">1</xref>
. These include, first, a parallel model in which auditory and visual linguistic information can be recognized separately in distinct auditory and visual pathways or channels. Second, a coactive model is shown, which assumes that auditory and visual speech information are combined and translated into a common code (and therefore, any decision is made on the combined information). Finally, Figure
<xref ref-type="fig" rid="F1">1</xref>
displays the schematics of a serial model.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>A diagram of a parallel model (top) with an OR and AND gate (See also Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
for a similar diagram)</bold>
. The coactive model below assumes that each channel is pooled into a common processor where evidence is accumulated prior to the decision stage. Lastly, the figure depicts a serial model, which assumes that processing does not begin on the second modality until it finishes processing on the first.</p>
</caption>
<graphic xlink:href="fpsyg-02-00238-g001"></graphic>
</fig>
<p>Another important feature of the system concerns its
<italic>workload capacity</italic>
. This refers to how the system responds to an increase in workload. Given that processing can be expected to be parallel when the visual and acoustic elements of speech sounds are processed together, a natural prediction is that the visual and acoustic forms of a speech sound would be processed at least as fast as in a standard parallel system (see Figure
<xref ref-type="fig" rid="F1">1</xref>
), with separate decisions on separate auditory and visual channels, and perhaps faster. If the time it takes to process the bimodal information is the same as predicted by a standard parallel system, it is referred to as
<italic>unlimited capacity</italic>
, and if it is faster it is referred to as
<italic>super capacity</italic>
. In fact, reasonable assumptions concerning
<italic>configural</italic>
or
<italic>holistic</italic>
perception predict super capacity under conditions akin to those in the present study (Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
; Wenger and Townsend,
<xref ref-type="bibr" rid="B57">2001</xref>
Processes in which information is processed more slowly than predicted by a standard parallel process are called
<italic>limited capacity</italic>
.</p>
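Workload capacity in this framework is typically quantified with the capacity coefficient introduced by Townsend and Nozawa (1995) for OR designs, C(t) = H_AV(t) / [H_A(t) + H_V(t)], where H(t) = -log S(t) is the integrated hazard estimated from correct response times; C(t) = 1 corresponds to unlimited capacity, C(t) > 1 to super capacity, and C(t) < 1 to limited capacity. A minimal Python sketch (not from the article; the RT vectors below are hypothetical placeholders for single-target and redundant-target data):

import numpy as np

def survivor(rts, grid):
    # Empirical survivor function S(t) = P(T > t) evaluated on a grid of times (ms).
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts > t).mean() for t in grid])

def capacity_or(rt_av, rt_a, rt_v, grid):
    # C(t) = H_AV(t) / (H_A(t) + H_V(t)), with integrated hazard H(t) = -log S(t).
    s_av, s_a, s_v = (survivor(x, grid) for x in (rt_av, rt_a, rt_v))
    with np.errstate(divide="ignore", invalid="ignore"):
        h_av, h_a, h_v = (-np.log(s) for s in (s_av, s_a, s_v))
        return h_av / (h_a + h_v)  # > 1 super, = 1 unlimited, < 1 limited capacity

grid = np.arange(300, 1201, 10)
c_of_t = capacity_or(rt_av=[480, 505, 530, 560], rt_a=[610, 640, 700], rt_v=[620, 655, 710], grid=grid)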
<p>The
<italic>decisional</italic>
<italic>stopping rule</italic>
determines whether all the items or channels must complete processing before the system terminates and arrives at a decision. When early termination of processing is possible, such as when a target contains redundant information, it is valuable to learn if people can take advantage of the opportunity – which is by no means certain under conditions where responses are made within several hundred milliseconds. This decisional component of processing is important in its own right but in addition, other facets of processing cannot be assessed if it is ignored.</p>
<p>Finally,
<italic>stochastic independence versus interaction and influences on capacity</italic>
indicates whether cross-channel interactions are present. The presence of cross-channel dependencies can be assessed in conjunction with the architecture and capacity analyses (e.g., Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
; Wenger and Townsend,
<xref ref-type="bibr" rid="B56">2000</xref>
,
<xref ref-type="bibr" rid="B57">2001</xref>
; Townsend and Wenger,
<xref ref-type="bibr" rid="B52">2004a</xref>
,
<xref ref-type="bibr" rid="B53">b</xref>
). For instance, super capacity follows from mutual facilitatory (positively correlated) interactions and from coactive processing. Either could be associated with configural perception. Limited capacity can be caused by inhibitory interactions among channels, or for example, fixed capacity although other causes are also possible (Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
).</p>
<p>Since evidence from previous speech studies demonstrates that visual information improves the accuracy in near-threshold experiments (e.g., Sumby and Pollack,
<xref ref-type="bibr" rid="B43">1954</xref>
; Grant et al.,
<xref ref-type="bibr" rid="B18">1998</xref>
), an important question arises as to when this interaction takes place. The terminology differs somewhat, but one camp views the interaction as occurring after identification of the information from the separate modalities (e.g., Bernstein et al.,
<xref ref-type="bibr" rid="B8">2004</xref>
; Bernstein,
<xref ref-type="bibr" rid="B7">2005</xref>
), although sensory modulation across modalities can be present before the identification of linguistic information (see van Wassenhove et al.,
<xref ref-type="bibr" rid="B54">2005</xref>
; Ponton et al.,
<xref ref-type="bibr" rid="B34">2009</xref>
). The other camp views the interaction as taking place early on, for instance, in some kind of unitary code (e.g., Summerfield,
<xref ref-type="bibr" rid="B44">1987</xref>
; Massaro,
<xref ref-type="bibr" rid="B25">2004</xref>
; Rosenblum,
<xref ref-type="bibr" rid="B36">2005</xref>
). In keeping with the terminology of the literature, we shall refer to the former as
<italic>late integration</italic>
, and the latter as
<italic>early integration</italic>
models. Although rarely stated explicitly, it appears that information from the two modalities is assumed to be processed in parallel by both models.</p>
<p>We propose that late integration interpretations might be modeled by parallel systems where identification on each channel takes place after processing on each channel is accomplished. We shall refer to these as
<italic>separate decisions parallel models</italic>
as indicated above. In contrast, early integration processing would appear to be instantiated by parallel models that merge their separate channel information before a final decision is made, perhaps on a modality-free code (see Summerfield,
<xref ref-type="bibr" rid="B44">1987</xref>
, for a thorough review that is still topical). In keeping with a literature where this notion has been quantitatively investigated and as informally defined above, we shall call it
<italic>coactive processing</italic>
or simply
<italic>coactivation</italic>
. We now turn to relevant theoretical accounts of audiovisual speech processing.</p>
</sec>
<sec>
<title>Accounts of Multisensory Speech Processing</title>
<p>As noted earlier, previous mathematical models are typically not temporal in nature and are therefore silent with regard to the inherently dynamic processing issues under study here, particularly architecture, capacity, and stopping rule. Most major debates with regard to even the type of processing, separate decisions parallel versus coactive, have taken place at a qualitative level if at all. As discussed in our exposition below, we go forward on the basis that our separate decisions parallel models are natural quantitative candidates for segregated perceptual operations followed by late integration. As observed, we propose that coactive models form a class of minimally complex models of early integration. Our methods do not address such specifics as the code used in the processing channels. Further, due to space constraints, the exposition in this section must be limited to giving the flavor of the debate and a general guide to the literature.</p>
<p>In a review of the audiovisual speech literature, Rosenblum (
<xref ref-type="bibr" rid="B36">2005</xref>
) argued that the neuro-physiological underpinnings and information sharing involved in audiovisual speech perception operate by extracting amodal information from the auditory and visual components of the speech signal. This position is based on the theoretical account of speech perception assuming that the primitives of speech perception are gestural – a position taken by the motor (Liberman and Mattingly,
<xref ref-type="bibr" rid="B22">1985</xref>
) and articulatory dynamic (Summerfield,
<xref ref-type="bibr" rid="B44">1987</xref>
; Fowler and Rosenblum,
<xref ref-type="bibr" rid="B16">1991</xref>
) theories of speech processing. Accordingly, each “…sensory modality is largely invisible to the speech perception function and the relevant information for phonetic resolution is modality-neutral” (Rosenblum,
<xref ref-type="bibr" rid="B36">2005</xref>
, p. 51). Rosenblum further argued that considerable support for this position comes from evidence showing that the auditory and visual speech streams are integrated in the earliest stages of perception, prior to word recognition or phonetic categorization.</p>
<p>Green and Miller (
<xref ref-type="bibr" rid="B19">1985</xref>
) carried out a behavioral study interpreted to be supportive of early audiovisual integration (see Rosenblum,
<xref ref-type="bibr" rid="B36">2005</xref>
). The authors demonstrated that the visually perceived rate of articulation influences auditory segment perception. They showed that visual information about place of articulation can influence the perception of voice onset time (VOT). Participants were shown audiovisual clips of a talker saying a syllable that varied auditorially and visually on a continuum from /bi/ to /pi/. The corresponding visual information was played either fast or slow. The results demonstrated that rapidly articulated syllables increased the rate at which the participants perceived /bi/ relative to /pi/, a finding consistent with early interactions between the auditory and visual components. Further evidence for a recognition process that utilizes both auditory and visual cues has come from studies using Minkowski metrics comparing models of speech integration. Arnold et al. (
<xref ref-type="bibr" rid="B3">2010</xref>
) fit a probability summation model, and a model assuming that auditory and visual cues are encoded as a unitary psychological process, to audiovisual identification data. The authors found that the latter model provided a superior fit. Findings such as these indicate a decision process that has access to both auditory and visual information, and combines the two sources of information in the early stages of phonetic perception.</p>
<p>Behavioral data, however, are not unequivocal on the issue of early versus late integration. Bernstein (
<xref ref-type="bibr" rid="B7">2005</xref>
) cited several studies showing that integration may in fact occur at later processing stages. For instance, the introduction of large stimulus onset asynchronies between the auditory and visual modalities fails to abolish the McGurk effect (McGurk and MacDonald,
<xref ref-type="bibr" rid="B28">1976</xref>
), perceptual fusions that arise from incongruent auditory and visual speech information (e.g., Massaro et al.,
<xref ref-type="bibr" rid="B27">1996</xref>
). This suggests that a framework assuming extensive unisensory processing can account for audiovisual fusion. Further evidence for late integration comes from studies showing that the McGurk effect varies in strength across cultures (Sekiyama and Tohkura,
<xref ref-type="bibr" rid="B38">1993</xref>
) and for familiar versus unfamiliar talkers (Walker et al.,
<xref ref-type="bibr" rid="B55">1995</xref>
). Thus, an alternative account to the theory that amodal information is extracted and combined in the early stages of processing is the view, supported by some of the evidence cited above, that neural networks learn associations between auditory and visual information (see Bernstein et al.,
<xref ref-type="bibr" rid="B8">2004</xref>
). Extensive unisensory processing is believed to occur in the auditory and visual channels prior to recognition, with integration of speech specific features occurring late in the perceptual processing stages. Evidence from EEG tasks using mismatch negativity does indicate that information from visual processing areas “modulates” early auditory processing, although the integration of higher order features specific to speech does not seem to occur at this stage (Ponton et al.,
<xref ref-type="bibr" rid="B34">2009</xref>
; see also van Wassenhove et al.,
<xref ref-type="bibr" rid="B54">2005</xref>
; Pilling,
<xref ref-type="bibr" rid="B33">2009</xref>
; Winneke and Phillips,
<xref ref-type="bibr" rid="B58">2011</xref>
).</p>
<p>We now turn to the experimental methodology of the double factorial paradigm (DFP) that will be employed in two studies to investigate the above issues central to audiovisual integration in speech perception. The DFP can readily be used to investigate and falsify accounts of audiovisual integration that have not been tested in a direct and definitive manner. The paradigm employs reaction time based methodology engineered so as to avoid model-mimicking obstacles (Townsend,
<xref ref-type="bibr" rid="B45">1971</xref>
; Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). We refer interested readers to Townsend and Wenger (
<xref ref-type="bibr" rid="B52">2004a</xref>
) for a general survey and bibliography.</p>
</sec>
<sec>
<title>The Double Factorial Paradigm: Assessing Models of Audiovisual Processing</title>
<p>Given the theoretical distinction between coactive and parallel models of integration in audiovisual speech perception, it is important to find a way to empirically distinguish between them. To our knowledge, this has not been accomplished in the speech perception literature. The descriptions of what we refer to as “coactive” and “parallel” models in the speech perception literature, as well as serial mechanisms, require specific mathematical formulation along with behavioral data if they are to be tested. The methodology can be used to directly test separate decisions parallel versus coactive parallel processing, as well as to assess workload capacity, and to identify the decisional stopping rule (Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). Evidence concerning channel independence or cross-channel audiovisual interactions contributing to the recognition process can also be garnered. Finally, there are very few assumptions needed to carry out our tests besides high accuracy. For instance, the predictions do not rely on parametric assumptions (e.g., that the data are normally distributed). The major assumptions and DFP methodology are discussed in the following section. A basic understanding of these principles should assist investigators interested in carrying out DFP-style experiments in their respective research domains.</p>
<sec>
<title>Assumptions</title>
<sec>
<title>Basic experimental design</title>
<p>The basic design of the DFP involves the detection or identification of targets, usually presented in one or two
<italic>channels</italic>
. The term
<italic>channel</italic>
refers to an abstract information processing construct, which normally involves the direction of attention to a particular object or modality. Consider the following design as an exemplary DFP experiment. Suppose observers participate in a simple task involving the presentation of the following stimuli: an auditory pure tone, a visual dot, and “target-absent” trials in which only a blank screen is presented. The classic DFP design often involves the presentation of four trial types: (a) single target trials where only an auditory tone is presented, (b) single target trials where only a visual dot is presented, (c)
<italic>redundant target</italic>
trials in which both a dot and the auditory tone are presented, and finally (d), target-absent trials. One common response mapping (usually referred to as an
<italic>OR design</italic>
) would require observers to make a “YES” response when single target auditory, single target visual, or redundant (auditory and visual) information is presented, and a “NO” response on target-absent trials. Participants are normally instructed to make a “NO” response upon perceiving the absence of stimulus information in both modalities (experimental trials are typically initiated by the presentation of a cue such as a fixation cross). While DFP studies generally obtain both accuracy and reaction time information, reaction times constitute the crucial dependent measure used in assessments of architecture and capacity. The DFP generally requires a large number of trials in order to obtain a distribution of RTs from each of the conditions described above. As we shall see, this basic design can be readily adapted to address questions in the speech perception literature by requiring participants to identify spoken words using audiovisual, auditory-only, or visual-only information.</p>
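The trial types and OR response mapping just described can be summarized compactly; the following is only an illustrative Python sketch (the labels are ours, not from the study):

# The four trial types of the exemplary OR design and their correct responses.
OR_DESIGN = {
    "auditory_only": "YES",   # single target: tone alone
    "visual_only": "YES",     # single target: dot alone
    "redundant_av": "YES",    # redundant target: tone and dot together
    "target_absent": "NO",    # blank trial
}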
</sec>
<sec>
<title>Factor 1: number of channels available</title>
<p>The first factor manipulated in the context of the DFP concerns the number of targets present, or channels available (Auditory-only/Visual-only versus AV) when the observer is making a decision. This is crucial for calculating the measure of capacity, which assesses information processing efficiency as a function of the number of channels available. In the clinical speech perception literature for instance, researchers are often concerned with how “efficiently” different clinical populations recognize words when they have both auditory and visual information available compared to when only auditory (or only visual “lip-reading”) information is available (e.g., Bergeson and Pisoni,
<xref ref-type="bibr" rid="B6">2004</xref>
for a review, and Sommers et al.,
<xref ref-type="bibr" rid="B40">2005</xref>
). Typically, accuracy-only measures are used when comparing the performance of clinical populations such as elderly hearing-impaired or children with cochlear implants, to young normal-hearing listeners.</p>
<p>In Experiment 1, the basic design of the DFP was adapted in such a way as to include a closed set word identification experiment in which words were presented to participants in auditory-only, visual-only, and audiovisual settings. We assessed workload capacity (described shortly) for three different auditory S/N ratios in order to investigate how integration efficiency and the nature of cross-modal interactions change under variable listening conditions.</p>
</sec>
<sec>
<title>Factor 2: saliency</title>
<p>The second factor manipulated in the typical DFP design is the saliency of each channel. For our purposes, the overall clarity of the auditory and visual signals can be manipulated using a high (“easy”) and low (“difficult”) level of saliency to induce faster versus slower reaction times, respectively. The salience manipulation is introduced to assess architecture (i.e., parallel versus coactive) without contamination through workload capacity changes (Townsend,
<xref ref-type="bibr" rid="B46">1974</xref>
). When both auditory and visual information are presented, there are four possible saliency trial types: High-A & High-V (hh), High-A & Low-V (hl), Low-A & High-V (lh), and Low-A & Low-V (ll). In the context of the exemplary audiovisual tone and dot identification task described here, the dot would be presented at two levels of brightness in the redundant target (and also single target) trials, and the tone could be presented at two dB levels or S/N ratios (again, in both single and redundant target trials). The saliency manipulation is crucial for assessing architecture and also decision rule, but not capacity. As we shall see in the Section below on “
<italic>selective influence</italic>
,” it is important to test whether the salience manipulation was “effective.”</p>
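Crossing the two salience levels on the audiovisual trials yields the four factorial cells (hh, hl, lh, ll) used in the analyses that follow; a hypothetical enumeration in Python:

from itertools import product

# 2 (auditory salience) x 2 (visual salience) cells on redundant-target trials.
cells = {a + v: {"auditory_salience": a, "visual_salience": v}
         for a, v in product("hl", repeat=2)}   # keys: 'hh', 'hl', 'lh', 'll'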
<p>In a pilot study described shortly, we adapted the basic DFP design to include a study involving a two-alternative forced-choice discrimination between the words “Base” and “Face.” Participants were presented with auditory-only, visual-only or combined audiovisual information. The information in the auditory and visual channels was presented at two different saliency levels to create the four factorial conditions in audiovisual trials (hh, hl, lh, and ll). The purpose of this study was to provide a preliminary assessment of architecture and decision rule.</p>
</sec>
<sec>
<title>Selective influence</title>
<p>An important assumption integral to DFP methodology is
<italic>selective influence</italic>
. This is crucial when assessing architecture and decision rule.
<italic>Selective influence</italic>
refers to the requirement that the salience manipulation be “effective” in causing changes in the processing speed of a particular sub-process. For selective influence to hold, the saliency manipulation in the auditory and visual channels must have the effect of changing processing speed (either speeding up or slowing down) in that particular channel. One of the first incarnations of selective influence was within Saul Sternberg’s
<italic>additive factors method</italic>
(Sternberg,
<xref ref-type="bibr" rid="B41">1969</xref>
). At that stage in its history, the theoretical basis was unclear and there was no way to verify its validity. It was not discovered until later that the influence must occur at a sufficient grade of statistical strength for architectural differences to be testable (Townsend and Ashby,
<xref ref-type="bibr" rid="B49">1983</xref>
; Townsend,
<xref ref-type="bibr" rid="B47">1990</xref>
, Chapter 12; Townsend and Schweickert,
<xref ref-type="bibr" rid="B51">1989</xref>
). Our methodology for assessing selective influence involves checking that the empirical distribution functions from each factorial condition are ordered in such a way that the distribution corresponding to the High-A High-V (hh) condition shows “faster” RTs compared to the distribution functions corresponding to the High-A/Low-V and Low-A/High-V (hl/lh) conditions, and also that the distribution functions corresponding to the High-A/Low-V and Low-A/High-V conditions (hl/lh) indicate faster RTs than the distribution function corresponding to the Low-A and Low-V conditions (ll). The empirical cumulative distribution functions (CDFs or survivor functions) should thus follow a specific ordering (see Townsend,
<xref ref-type="bibr" rid="B47">1990</xref>
; Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). For example, the CDF for the hh condition should be greater than the distributions for the hl/lh and ll conditions.</p>
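One simple way to check this ordering on data is to test stochastic dominance between the empirical CDFs of the four cells. The sketch below is our illustration (not the authors' code) and assumes a dict of correct RTs keyed by cell label:

import numpy as np

def ecdf(rts, grid):
    # Empirical CDF F(t) = P(T <= t); a larger F(t) at every t means "faster" RTs.
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts <= t).mean() for t in grid])

def selective_influence_holds(rt_by_cell, grid):
    # Requires F_hh >= F_hl, F_hh >= F_lh, F_hl >= F_ll, and F_lh >= F_ll at every t.
    F = {cell: ecdf(rts, grid) for cell, rts in rt_by_cell.items()}
    return (np.all(F["hh"] >= F["hl"]) and np.all(F["hh"] >= F["lh"])
            and np.all(F["hl"] >= F["ll"]) and np.all(F["lh"] >= F["ll"]))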
</sec>
</sec>
<sec>
<title>Assessing architecture and decision rule</title>
<p>First, in DFP methodology, we compute a mean interaction contrast using mean reaction times from each factorial condition obtained from the audiovisual trials:
<italic>M</italic>
<sub>IC</sub>
 = [RT
<sub>ll</sub>
–RT
<sub>lh</sub>
]–[RT
<sub>hl</sub>
–RT
<sub>hh</sub>
]. Additionally, DFP affords a deeper test of interactions between empirical survivor functions to yield a more fine grained interaction measure than would be provided by mean RTs alone (Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). Survivor functions are used instead of CDFs to compute architecture and decision rule since the mathematical proofs provided by Townsend and Nozawa (
<xref ref-type="bibr" rid="B50">1995</xref>
) assume the computation of survivor functions. Survivor functions provide the probability that a process has not finished (i.e., that recognition has not occurred yet) by time
<italic>t</italic>
. Let us call the CDF obtained from binned RTs in each condition:
<italic>F</italic>
(
<italic>t</italic>
) = 
<italic>P</italic>
(
<italic>T</italic>
 ≤ 
<italic>t</italic>
), where
<italic>T</italic>
represents the processing time random variable, and
<italic>t</italic>
is a specific value (e.g., 500 ms). Then, the survivor function is defined as
<italic>S</italic>
(
<italic>t</italic>
) = 1 − 
<italic>F</italic>
(
<italic>t</italic>
), and indicates the probability that processing does not finish until later than, say,
<italic>t</italic>
 = 500 ms. The survivor interaction contrast, or
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
), is defined as
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) = [
<italic>S</italic>
<sub>ll</sub>
(
<italic>t</italic>
)–S
<sub>lh</sub>
(
<italic>t</italic>
)]–[S
<sub>hl</sub>
(
<italic>t</italic>
)–S
<sub>hh</sub>
(
<italic>t</italic>
)]. The form of this contrast is the same as the
<italic>M</italic>
<sub>IC</sub>
, although the survivor interaction contrast is computed along each point in the functions. Since the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) is computed across multiple points, it produces a continuous function rather than just a single number.</p>
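Both contrasts are straightforward to compute from the correct RTs in the four audiovisual cells; a minimal sketch (ours, with the empirical survivor function estimated from binned RTs as in the text):

import numpy as np

def survivor(rts, grid):
    # Empirical survivor function S(t) = P(T > t) = 1 - F(t).
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts > t).mean() for t in grid])

def interaction_contrasts(rt, grid):
    # rt: dict of RT lists for the 'hh', 'hl', 'lh', 'll' audiovisual conditions.
    mic = (np.mean(rt["ll"]) - np.mean(rt["lh"])) - (np.mean(rt["hl"]) - np.mean(rt["hh"]))
    s = {c: survivor(rt[c], grid) for c in ("hh", "hl", "lh", "ll")}
    sic = (s["ll"] - s["lh"]) - (s["hl"] - s["hh"])   # S_IC(t): one value per grid point
    return mic, sic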
<p>Figure
<xref ref-type="fig" rid="F2">2</xref>
shows how the shape of the interaction contrast can be used to identify different processing architectures and decision rules. It will aid in our presentation to point out that since the area under a survivor function is equal to its mean, the
<italic>M</italic>
<sub>IC</sub>
is the integral of the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) function.</p>
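(In symbols: since E[T] = ∫_0^∞ S(t) dt for a non-negative processing time, applying this identity to each of the four cells and subtracting gives M_IC = ∫_0^∞ S_IC(t) dt.)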
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>
<italic>S
<sub>IC</sub>
</italic>
(
<italic>t</italic>
) predictions for standard independent parallel, serial, and coactive models</bold>
. The two top panels display the predictions of the independent parallel first-terminating and exhaustive models respectively, while the middle panels display the predictions of the serial first-terminating and exhaustive models respectively. The bottom panel displays the coactive model predictions. The
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) is plotted against arbitrary time units (AU).</p>
</caption>
<graphic xlink:href="fpsyg-02-00238-g002"></graphic>
</fig>
<p>The upper left panel in Figure
<xref ref-type="fig" rid="F2">2</xref>
shows the predictions of an independent parallel processing model with a first-terminating stopping rule. The
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) function for this model is entirely positive (Proposition 1 in Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). To understand why the curve should be positive, consider how the l
<bold>h</bold>
,
<bold>h</bold>
l, and
<bold>hh</bold>
conditions each have at least one “fast” process due to the salience manipulation. Recognition can occur as soon as the fastest channel reaches threshold. Therefore, when the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) is computed, [S
<sub>ll</sub>
(
<italic>t</italic>
)–S
<sub>lh</sub>
(
<italic>t</italic>
)] should be greater at each time point than [S
<sub>hl</sub>
(
<italic>t</italic>
)–S
<sub>hh</sub>
(
<italic>t</italic>
)]; survivor functions with slower RTs are greater than survivor functions with faster RTs. The
<italic>M</italic>
<sub>IC</sub>
in this case should also be positive. The upper right panel in Figure
<xref ref-type="fig" rid="F2">2</xref>
depicts the predictions for the same independent parallel model but now with an exhaustive stopping rule. These models predict
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
)’s that are entirely negative (Proposition 2 from Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). In this case, the
<bold>ll</bold>
,
<bold>l</bold>
h, and h
<bold>l</bold>
conditions each have at least one slow process. In exhaustive models, recognition occurs only when both channels (i.e., the slower of the two processes) reach threshold. Therefore, when the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) is computed, [
<italic>S</italic>
<sub>ll</sub>
(
<italic>t</italic>
)–
<italic>S</italic>
<sub>lh</sub>
(
<italic>t</italic>
)] should be less than [
<italic>S</italic>
<sub>hl</sub>
(
<italic>t</italic>
)–
<italic>S</italic>
<sub>hh</sub>
(
<italic>t</italic>
)] at each time point. Again, it follows that the
<italic>M</italic>
<sub>IC</sub>
should be a negative number. Coactive models produce
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) functions that exhibit a small negative region for early time intervals, followed by a larger positive region thereafter. The reason for this predicted shape is not intuitive, and relies on a proof assuming a Poisson model provided by Townsend and Nozawa (
<xref ref-type="bibr" rid="B50">1995</xref>
). The M
<sub>IC</sub>
for coactive models, like parallel first-terminating models, should also be positive due to the larger positive area in the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) function.</p>
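<p>These qualitative signatures can be reproduced in a small simulation. In the sketch below (an illustration of our own, not a model fitted to the data), each channel is a Poisson counter that must accumulate a fixed number of counts, with high salience corresponding to a higher event rate; parallel first-terminating and exhaustive processing take the minimum and maximum of the two channel finishing times, and coactivation pools the counts from both channels into a single counter. Up to sampling noise, the printed summary shows an entirely positive <italic>S</italic><sub>IC</sub>(<italic>t</italic>) for the parallel first-terminating model, an entirely negative one for the parallel exhaustive model, and a negative-then-positive one for the coactive model.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(1)
N_TRIALS, CRITERION = 50_000, 5          # trials per factorial cell; counts to criterion
RATE = {"h": 0.02, "l": 0.01}            # Poisson event rates (counts/ms) by salience

def channel_time(rate, n):
    """Time for one Poisson channel to accumulate CRITERION counts (Gamma distributed)."""
    return rng.gamma(shape=CRITERION, scale=1.0 / rate, size=n)

def finishing_times(a_level, v_level, model):
    a = channel_time(RATE[a_level], N_TRIALS)
    v = channel_time(RATE[v_level], N_TRIALS)
    if model == "parallel_first_terminating":
        return np.minimum(a, v)
    if model == "parallel_exhaustive":
        return np.maximum(a, v)
    # Coactive: events from both channels feed one counter, which is equivalent
    # to a single merged Poisson process with the summed rate.
    return rng.gamma(shape=CRITERION,
                     scale=1.0 / (RATE[a_level] + RATE[v_level]), size=N_TRIALS)

def sic_curve(model, grid):
    survivor = {}
    for a_level in "lh":
        for v_level in "lh":
            rt = finishing_times(a_level, v_level, model)
            survivor[a_level + v_level] = np.array([(rt > t).mean() for t in grid])
    return (survivor["ll"] - survivor["lh"]) - (survivor["hl"] - survivor["hh"])

grid = np.linspace(0.0, 1500.0, 151)
for model in ("parallel_first_terminating", "parallel_exhaustive", "coactive"):
    curve = sic_curve(model, grid)
    print(f"{model}: min SIC = {curve.min():+.3f}, max SIC = {curve.max():+.3f}")
</preformat>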
<p>Finally, although serial processing has never to our knowledge been considered as a viable model of multisensory integration in the audiovisual speech perception literature, the test for serial processing “comes for free” with the DFP, and might prove useful in certain paradigms employing speech stimuli. When processing is serial with independent channels and a first-terminating stopping rule, the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) is flat and equal to 0 at every point of time (Proposition 3 from Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). This is because serial or additive processes predict interaction contrasts equal to 0 (Sternberg,
<xref ref-type="bibr" rid="B41">1969</xref>
). Obviously, this implies that the
<italic>M</italic>
<sub>IC</sub>
should be 0 as well. On the other hand, with exhaustive serial processing and independent channels (shown in the panel to the right), an S-shaped curve is predicted, with a negative region for early processing times and a positive region for later processing times. The reason for the S-shaped curve in the serial exhaustive
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
), much like the coactive case, is not intuitive. We refer the interested reader to the proof in Townsend and Nozawa (
<xref ref-type="bibr" rid="B50">1995</xref>
) for an explanation. Interestingly, the negative and positive regions of the curve are equal to each other in serial exhaustive models, thereby forcing the
<italic>M</italic>
<sub>IC</sub>
to be 0.</p>
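<p>The serial signatures can be checked with the same simulation machinery (again purely illustrative; the mixing probability of 0.5 below is arbitrary). Serial first-terminating processing corresponds to a probabilistic mixture of the two single-channel finishing times and produces a flat <italic>S</italic><sub>IC</sub>(<italic>t</italic>) near zero, whereas serial exhaustive processing corresponds to the sum of the two finishing times and produces the S-shaped curve whose negative and positive areas cancel, so the integrated contrast (the <italic>M</italic><sub>IC</sub>) is approximately zero in both cases.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(2)
N_TRIALS, CRITERION, P_AUDITORY_FIRST = 50_000, 5, 0.5
RATE = {"h": 0.02, "l": 0.01}            # counts/ms for high vs. low salience

def channel_time(rate, n):
    return rng.gamma(shape=CRITERION, scale=1.0 / rate, size=n)

def serial_times(a_level, v_level, exhaustive):
    a = channel_time(RATE[a_level], N_TRIALS)
    v = channel_time(RATE[v_level], N_TRIALS)
    if exhaustive:
        return a + v                      # both channels processed, one after the other
    # First-terminating: only the channel processed first determines the RT.
    order = rng.random(N_TRIALS)
    return np.where(order > P_AUDITORY_FIRST, v, a)

grid = np.linspace(0.0, 3000.0, 301)
dt = grid[1] - grid[0]
for exhaustive in (False, True):
    survivor = {}
    for a_level in "lh":
        for v_level in "lh":
            rt = serial_times(a_level, v_level, exhaustive)
            survivor[a_level + v_level] = np.array([(rt > t).mean() for t in grid])
    sic = (survivor["ll"] - survivor["lh"]) - (survivor["hl"] - survivor["hh"])
    label = "serial exhaustive" if exhaustive else "serial first-terminating"
    print(f"{label}: integrated S_IC (~ M_IC) = {sic.sum() * dt:.1f} ms")
</preformat>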
</sec>
<sec>
<title>Assessing workload capacity</title>
<p>An important feature of our methodology is its ability to assess the workload capacity of the system.
<italic>Workload capacity</italic>
measures how the number of working channels (in this case, a single auditory or visual channel versus both auditory and visual channels) affects processing efficiency at time
<italic>t</italic>
. Is there a cost, benefit, or no change in efficiency when both auditory and visual channels are present relative to the conditions when only auditory or visual information is available? Capacity predictions for parallel and coactive models are shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
. The equation for calculating the capacity coefficient
<italic>C</italic>
(
<italic>t</italic>
) involves calculating the integrated hazard function
<italic>H</italic>
(
<italic>t</italic>
) = ∫ 
<italic>h</italic>
(
<italic>t</italic>
)dt. A nice property of this integrated hazard function is that
<italic>H</italic>
(
<italic>t</italic>
) = −log(S(
<italic>t</italic>
)), which provides a straightforward estimate of
<italic>H</italic>
(
<italic>t</italic>
). The equation from Townsend and Nozawa (
<xref ref-type="bibr" rid="B50">1995</xref>
) is:</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Predicted workload capacity,
<italic>C</italic>
(
<italic>t</italic>
), for independent parallel models (left), and coactive models (right)</bold>
. Notice that the coactive model predicts extreme super capacity, while independent parallel models predict
<italic>C</italic>
(
<italic>t</italic>
) = 1 (which is the benchmark for efficient audiovisual processing or integration). Standard serial models (generally) predict
<italic>C</italic>
(
<italic>t</italic>
) = 1/2 while parallel models with negative cross-talk can readily mimic such predictions.</p>
</caption>
<graphic xlink:href="fpsyg-02-00238-g003"></graphic>
</fig>
<disp-formula id="E1">
<mml:math id="M1">
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>AV</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-bin"></mml:mo>
<mml:mrow>
<mml:mo class="MathClass-open">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>A</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-bin">+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>V</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo class="MathClass-close">]</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>The subscripts A and V represent auditory and visual reaction time trials, usually across all levels of saliency. It is worth noting that capacity is a
<italic>relative measure</italic>
, meaning that it assays performance when both channels are in operation relative to the predictions of a parallel system with independent channels (see Figure
<xref ref-type="fig" rid="F1">1</xref>
). If the processing system is parallel with stochastically independent channels and the rate on each single channel is unaffected by increasing the number of operating channels, the system is said to be of
<italic>unlimited capacity</italic>
. Any such unlimited capacity, independent channels, parallel system predicts
<italic>C</italic>
(
<italic>t</italic>
) = 1 for all
<italic>t</italic>
 ≥ 0 because the prediction of any such system is exactly the denominator of the above expression, namely
<italic>H</italic>
<sub>A</sub>
(
<italic>t</italic>
) + 
<italic>H</italic>
<sub>V</sub>
(
<italic>t</italic>
). One benefit of computing the capacity coefficient is that it provides instantaneous information about whether the observer violated the assumption of independence by deviating from the benchmark of “integration efficiency” [i.e.,
<italic>C</italic>
(
<italic>t</italic>
) = 1]. If the channels slow down as other channels are engaged (i.e., with greater workload), then the system operates at
<italic>limited capacity</italic>
, and
<italic>C</italic>
(
<italic>t</italic>
) < 1. Inhibition between channels can cause such a slowdown (Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
). If there is a benefit in processing rate, then the system operates at
<italic>super capacity</italic>
(see Wenger and Townsend,
<xref ref-type="bibr" rid="B57">2001</xref>
); such a scenario can be caused by facilitation between channels.</p>
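<p>A minimal estimator of the capacity coefficient is sketched below. It assumes three samples of correct-trial RTs (audiovisual, auditory-only, and visual-only) and uses the identity <italic>H</italic>(<italic>t</italic>) = −log(<italic>S</italic>(<italic>t</italic>)); the step-function estimate and the clipping of the right tail are simplifications of our own rather than the exact routine used for the analyses reported here.</p>
<preformat>
import numpy as np

def integrated_hazard(rts, grid):
    """H(t) = -log S(t), with S(t) estimated by the empirical survivor function."""
    rts = np.asarray(rts, dtype=float)
    survivor = np.array([(rts > t).mean() for t in grid])
    survivor = np.clip(survivor, 1e-12, 1.0)     # avoid log(0) in the right tail
    return -np.log(survivor)

def capacity_coefficient(rt_av, rt_a, rt_v, grid):
    """C(t) = H_AV(t) / [H_A(t) + H_V(t)]; values above 1 indicate super capacity
    and values below 1 limited capacity, relative to an independent parallel race."""
    h_av = integrated_hazard(rt_av, grid)
    denom = integrated_hazard(rt_a, grid) + integrated_hazard(rt_v, grid)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, h_av / denom, np.nan)
</preformat>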
<sec>
<title>Bounds on performance</title>
<p>An upper bound on performance for separate decisions parallel models, also known as
<italic>race models</italic>
(Raab,
<xref ref-type="bibr" rid="B35">1962</xref>
) was provided by Miller (
<xref ref-type="bibr" rid="B30">1982</xref>
; see Colonius,
<xref ref-type="bibr" rid="B12">1990</xref>
) in the form of the well-known
<italic>race inequality</italic>
. It stipulates that in such models it must be the case that:</p>
<disp-formula id="E2">
<mml:math id="M2">
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>AV</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel"></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>A</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-bin">+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext>V</mml:mtext>
</mml:mstyle>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where
<italic>F</italic>
<sub>AV</sub>
(
<italic>t</italic>
) is the CDF
<italic>F</italic>
<sub>AV</sub>
(
<italic>t</italic>
) = 
<italic>P</italic>
<sub>AV</sub>
(
<italic>T</italic>
 ≤ 
<italic>t</italic>
) for the double target trials and
<italic>F
<sub>i</sub>
</italic>
(
<italic>t</italic>
) (
<italic>i</italic>
 = A, V) are the corresponding statistics for the single target trials. It forms an upper limit on performance for a wide variety of parallel race models, including, but not confined to, the unlimited capacity parallel model with independent channels.</p>
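<p>The race inequality can be checked directly on empirical CDFs. The helper below (an illustrative sketch, not the analysis code used here) flags the time points at which <italic>F</italic><sub>AV</sub>(<italic>t</italic>) exceeds <italic>F</italic><sub>A</sub>(<italic>t</italic>) + <italic>F</italic><sub>V</sub>(<italic>t</italic>); any such violation rules out the whole class of race models just described.</p>
<preformat>
import numpy as np

def empirical_cdf(rts, grid):
    """Step-function estimate of F(t) = P(T is at most t)."""
    rts = np.asarray(rts, dtype=float)
    return np.array([1.0 - (rts > t).mean() for t in grid])

def miller_bound_violations(rt_av, rt_a, rt_v, grid):
    """Boolean mask of time points where F_AV(t) exceeds F_A(t) + F_V(t).
    Because F_AV(t) never exceeds 1, violations can only occur where the
    summed single-target CDFs are still at or below 1, i.e., for fast RTs."""
    f_av = empirical_cdf(rt_av, grid)
    bound = empirical_cdf(rt_a, grid) + empirical_cdf(rt_v, grid)
    return f_av > bound
</preformat>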
<p>Although
<italic>C</italic>
(
<italic>t</italic>
) and the Miller bound are both assessing performance as workload changes, they are not at all identical. The capacity coefficient offers a graded comparison of data with the standard parallel model, for all time
<italic>t</italic>
whereas the bound marks a level of performance so extreme that a large set of parallel race models cannot cross it; doing so indicates super capacity. The Miller race inequality is informative only for values of time at which
<italic>F</italic>
<sub>A</sub>
(
<italic>t</italic>
) + 
<italic>F</italic>
<sub>V</sub>
(
<italic>t</italic>
) ≤ 1 but not thereafter.
<italic>C</italic>
(
<italic>t</italic>
) is not restricted in this fashion. Townsend and Nozawa (<xref ref-type="bibr" rid="B50">1995</xref>) proved that if
<italic>C</italic>
(
<italic>t</italic>
) > 1 for an interval early in processing (i.e., fast RTs), then the above inequality has to be violated. On the other hand, for larger RTs,
<italic>C</italic>
(
<italic>t</italic>
) sometimes has to be quite large to violate Miller’s bound.</p>
<p>A bound that assesses limited rather than super capacity is known as the
<italic>Grice bound</italic>
. It is given by the inequality that will hold if processing is not too limited in capacity:
<italic>F</italic>
<sub>AV</sub>
(
<italic>t</italic>
) > MAX{
<italic>F</italic>
<sub>A</sub>
(
<italic>t</italic>
),
<italic>F</italic>
<sub>V</sub>
(
<italic>t</italic>
)} (Grice et al.,
<xref ref-type="bibr" rid="B20">1984</xref>
; Colonius,
<xref ref-type="bibr" rid="B12">1990</xref>
). Townsend and Nozawa (
<xref ref-type="bibr" rid="B50">1995</xref>
) proved that if the Grice bound is violated at time
<italic>t</italic>
, then
<italic>C</italic>
(
<italic>t</italic>
) < 1 for that time point, and typically much less than 1. If
<italic>H</italic>
<sub>A</sub>
(
<italic>t</italic>
) = 
<italic>H</italic>
<sub>V</sub>
(
<italic>t</italic>
) the Grice bound is achieved when
<italic>C</italic>
(
<italic>t</italic>
) = 1/2 (see Townsend and Ashby,
<xref ref-type="bibr" rid="B49">1983</xref>
; Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
). Overall then, architecture is assessed most directly from the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) and
<italic>M</italic>
<sub>IC</sub>
results on double target trials with variation of the selective influence factors, while workload capacity is measured from the single target workloads in comparison to the double target workload through the
<italic>C</italic>
(
<italic>t</italic>
) function. Together, they constitute a methodology capable of determining key aspects of the attendant processing system.</p>
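<p>A corresponding check of the Grice bound is equally brief (again an illustrative sketch under the same assumptions as the previous listing): it flags the time points at which <italic>F</italic><sub>AV</sub>(<italic>t</italic>) falls below the larger of the two single-target CDFs, which are exactly the points at which capacity is severely limited.</p>
<preformat>
import numpy as np

def empirical_cdf(rts, grid):
    rts = np.asarray(rts, dtype=float)
    return np.array([1.0 - (rts > t).mean() for t in grid])

def grice_bound_violations(rt_av, rt_a, rt_v, grid):
    """Boolean mask of time points where F_AV(t) falls below max{F_A(t), F_V(t)}."""
    f_av = empirical_cdf(rt_av, grid)
    lower_bound = np.maximum(empirical_cdf(rt_a, grid), empirical_cdf(rt_v, grid))
    return lower_bound > f_av
</preformat>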
<p>We will demonstrate in the following studies how the RT methodology proposed in this work provides a convenient means for investigating the issue of “early versus late integration” (or coactive versus parallel processing) and other questions related to the processing of bimodal speech or non-speech stimuli. We first present the results of a pilot study, which constituted an initial attempt to employ a full DFP design to investigate architecture and decision rule in speech perception. While this represents a basic application of the design, we intend for its basic principles to be applied in future studies. The primary study, Experiment 1, was designed to assess capacity and integration efficiency in a more ecologically valid way by employing multiple signal-to-noise ratios, a larger set size, and multiple talkers.</p>
</sec>
</sec>
</sec>
<sec>
<title>Pilot Study</title>
<p>We carried out a pilot experiment to investigate processing architecture and decision rule in a task that required discrimination between two words (“Base” versus “Face”) using audiovisual, auditory-only, and visual-only trials. The inclusion of two spoken words in the context of a forced-choice task (see Massaro,
<xref ref-type="bibr" rid="B25">2004</xref>
for a series of two-alternative forced-choice tasks) should be simple enough to allow us to implement DFP methods while also encouraging the listener to engage in language perception. We also assessed processing capacity. Six subjects participated in this pilot study, in which they were exposed to video clips of a female talker saying the words “Base” and “Face.” Their task was to make a two-alternative forced-choice button press response corresponding to the word they thought the talker said using audiovisual information, auditory-only information (with the visual portion of the clip removed), or visual-only information (with the auditory signal removed). The saliency manipulation on the visual signal involved presenting the video at two levels of brightness, and the saliency manipulation on the auditory signal involved presenting the auditory signal at two different auditory S/N ratios. Table
<xref ref-type="table" rid="T1">1</xref>
below shows the basic experimental set-up. The Ø symbol indicates the absence of auditory or visual stimuli in a particular channel. Reaction time distributions were obtained for each trial type and each salience condition.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>
<bold>The table shows each stimulus–response category (
<italic>Base</italic>
and
<italic>Face</italic>
) alongside each factorial condition</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Auditory</th>
<th align="left" rowspan="1" colspan="1">Visual</th>
<th align="left" rowspan="1" colspan="1">Correct response</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Base</td>
<td align="left" rowspan="1" colspan="1">Base</td>
<td align="left" rowspan="1" colspan="1">“Base”</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Base</td>
<td align="left" rowspan="1" colspan="1">Ø</td>
<td align="left" rowspan="1" colspan="1">“Base”</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Ø</td>
<td align="left" rowspan="1" colspan="1">Base</td>
<td align="left" rowspan="1" colspan="1">“Base”</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Face</td>
<td align="left" rowspan="1" colspan="1">Face</td>
<td align="left" rowspan="1" colspan="1">“Face”</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Face</td>
<td align="left" rowspan="1" colspan="1">Ø</td>
<td align="left" rowspan="1" colspan="1">“Face”</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Ø</td>
<td align="left" rowspan="1" colspan="1">Face</td>
<td align="left" rowspan="1" colspan="1">“Face”</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) and
<italic>M</italic>
<sub>IC</sub>
were computed for each participant, and mean accuracy scores, particularly in the redundant target condition, were high (>90%). The individual survivor functions from each of the saliency conditions were computed for each participant and checked for the correct orderings using the procedure described previously. Each participant obeyed the assumptions of selective influence for at least some time intervals.</p>
<p>Figure
<xref ref-type="fig" rid="F4">4</xref>
shows the results of the architecture and capacity analyses (Figures
<xref ref-type="fig" rid="F4">4</xref>
A,B) for a typical participant. Each participant’s
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) was consistently overadditive across a large range of processing times. Overall, these results provide preliminary support for a parallel model of audiovisual integration with a first-terminating stopping rule, and they allow us to rule out parallel exhaustive processing accounts (as well as serial models). Only one participant (not shown) yielded a statistically significant region of early negativity in the
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
), which was consistent with coactive processing. Interestingly, capacity was extremely limited for every participant and violated the lower bound for nearly all time points – a finding inconsistent with coactive models, which predict extreme super capacity (see Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
). Overall, the combined
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) and capacity results from this study suggest parallel first-terminating processing with cross-channel inhibition causing decrements in capacity.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>(A)</bold>
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) function for one exemplary participant in the pilot study. This participant showed evidence for parallel first-terminating processing (as did three other participants). Only one participant produced an
<italic>S</italic>
<sub>IC</sub>
(
<italic>t</italic>
) consistent with another model, which was coactive processing.
<bold>(B)</bold>
The capacity results computed for the same participant.
<italic>C</italic>
(
<italic>t</italic>
) was computed across all saliency levels (i.e., the integrated hazard functions from the AV, A-only, and V-only conditions included RTs from each level of saliency). Each participant yielded strong evidence for extremely limited capacity, a finding inconsistent with coactivation, but consistent with a parallel model with cross-channel inhibition.</p>
</caption>
<graphic xlink:href="fpsyg-02-00238-g004"></graphic>
</fig>
<p>We expected the design of this pilot experiment to encourage participants to call on their speech perception faculties, since higher-order features of the stimuli must be utilized in order to perform the task and discriminate the two words. Still, there were several limitations to this experiment due to the restricted nature of the stimulus set. Since DFP methodology typically requires a large number of responses, we restricted the experiment to a two-alternative forced-choice task using only one talker, at the expense of ecological validity. While this study was successful in offering a novel way to test outstanding assumptions in the multisensory perception literature, some of the conclusions drawn may be limited to certain restricted stimulus sets.</p>
<p>To that end, we designed an experiment to address these limitations. First, to enhance ecological validity, we employed two talkers and included eight monosyllabic words in the stimulus set – eight words was one of the set sizes employed by Sumby and Pollack (
<xref ref-type="bibr" rid="B43">1954</xref>
). Second, we employed multiple auditory signal-to-noise ratios (both “low” and “high”) to examine how workload capacity/multisensory benefit changes as a function of the quality of the auditory signal. The pilot experiment utilized auditory signal-to-noise ratios to elicit both high accuracy and selective influence. Therefore, the auditory signal-to-noise ratios in that experiment were not optimal for drawing strong conclusions about multisensory enhancement because they produced accuracy levels at the high end of the performance spectrum, unlike the signal-to-noise ratios of Sumby and Pollack’s (
<xref ref-type="bibr" rid="B43">1954</xref>
) study. Experiment 1 allowed us to determine which auditory signal-to-noise ratios would elicit multisensory enhancement or efficient integration as measured by processing capacity. This also assisted us in interpreting our results within the milieu of the multisensory speech literature.</p>
</sec>
<sec>
<title>Experiment 1</title>
<p>Experiment 1 was a speech recognition task motivated by the design features implemented in Sumby and Pollack’s (
<xref ref-type="bibr" rid="B43">1954</xref>
) seminal investigation of audiovisual enhancement. Sumby and Pollack (
<xref ref-type="bibr" rid="B43">1954</xref>
) investigated speech intelligibility using five different auditory signal-to-noise ratios (−30, −24, −18, −12, and −6 dB) and seven different vocabulary sizes consisting of bi-syllabic spondees. In this experiment, both RT data and accuracy scores were collected in auditory-only, visual-only, and audiovisual conditions. Three different auditory signal-to-noise ratios were employed in this study: −18 dB, −12 dB, and clear. The clear, low-noise condition was designed to approximate optimal listening conditions such as those that would be experienced in a quiet room. A vocabulary size of eight monosyllabic words was used in this study – one of the vocabulary sizes used by Sumby and Pollack (
<xref ref-type="bibr" rid="B43">1954</xref>
). While a larger set size would generally be optimal for speech perception experiments, the collection of RT data necessitated a closed set of responses with fewer choices. It is important to note that the results obtained by Sumby and Pollack (
<xref ref-type="bibr" rid="B43">1954</xref>
) in the eight-word condition followed the same overall trend in terms of audiovisual gain scores as the conditions employing larger vocabularies. The set size of eight words using three signal-to-noise ratios constitutes an ideal starting point for RT-based audiovisual speech analysis.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Participants</title>
<p>A total of 15 college-aged individuals (18–23 years) who reported normal hearing and normal or corrected-to-normal vision served as participants. All participants were native speakers of American English. Five participants were randomly allocated to the condition with the clear auditory signal (S/N = clear), five to the S/N ratio = −12 dB condition, and five to the S/N ratio = −18 dB condition. Each participant was paid 10 dollars per session.</p>
</sec>
<sec>
<title>Materials</title>
<p>The stimulus materials included audiovisual movie clips of two different female talkers from the Hoosier Multi-Talker Database (Sherffert et al.,
<xref ref-type="bibr" rid="B39">1997</xref>
). Two tokens of each monosyllabic word, one recorded from each of the two female talkers, were selected for this study: “Mouse,” “Job,” “Gain,” “Tile,” “Shop,” “Boat,” “Date,” and “Page.” Audio, visual, and audiovisual files were edited using Final Cut Pro HD version 4.5. The audio files were sampled at a rate of 48 kHz with 16-bit resolution. The duration of the auditory, visual, and audiovisual files ranged from 800 to 1000 ms. We selected and edited the stimuli so as to minimize differences across clips in the timing between the onset of facial movement and the onset of vocalization. Each video contained two to three lead-in frames (approximately 60–90 ms) before the first visual onset cue. White noise was mixed with each audio file using Adobe Audition to create an auditory S/N ratio of −12 dB SPL, and another set to create an auditory S/N ratio of −18 dB SPL. A third set of auditory files was used in the clear trials (68 dB SPL), in which white noise was not mixed with the auditory files. Visual saliency was not manipulated in this study since we were not interested in directly assessing architecture via the survivor interaction contrast. Capacity was calculated separately for each auditory S/N ratio [e.g., <italic>H</italic><sub>AV, Clear</sub>(<italic>t</italic>)/(<italic>H</italic><sub>A, Clear</sub>(<italic>t</italic>) + <italic>H</italic><sub>V</sub>(<italic>t</italic>)) for the clear condition].</p>
</sec>
<sec>
<title>Design and procedure</title>
<p>Participants were seated 14″ to 18″ in front of a Macintosh computer equipped with Beyer Dynamic-100 headphones. Each trial began with a fixation cross (+) appearing in the center of the computer screen, followed by the stimulus. The stimuli included auditory-only, visual-only, or audiovisual stimuli, which were presented in different blocks. Immediately after the presentation of the stimulus word, a dialog box appeared on the computer monitor containing eight boxes (1″ × 2″ in size) arranged in four rows and two columns, and each box was labeled with one of the eight possible stimulus words. The labels on the grid were randomized for each participant. Participants were instructed to respond as quickly and accurately as possible by using the mouse to click the box labeled with the word they thought the talker said. Reaction times were measured from stimulus onset. On auditory-only trials, participants were required to base their response on auditory information, and on visual-only trials participants were required to lip-read. Each experimental session consisted of three randomly ordered blocks (auditory, visual, and audiovisual stimuli) and lasted approximately 45 min. Participants also received 24 practice trials at the onset of each of the two experimental sessions; these practice trials were not included in the subsequent data analysis. The experiment was divided into two sessions, and each subject participated in one session per day for a total of 2 days. The experiment consisted of 400 total auditory-only trials, 400 visual-only trials, and 400 audiovisual trials, where 200 trials in each condition were spoken by each of the two talkers.</p>
</sec>
</sec>
<sec>
<title>Results</title>
<p>Data from the three experimental conditions – an auditory signal-to-noise ratio of −18 dB, an S/N ratio of −12 dB, and the clear auditory S/N ratio, whose low-noise auditory signal approximates optimal listening conditions – are presented in Table
<xref ref-type="table" rid="T2">2</xref>
. Each portion of the table displays the proportion correct, the mean RT, and the audiovisual gain scores for each participant in that condition. One of the conventional measures of gain, labeled “Gain
<sub>V</sub>
,” assesses the overall contribution of visual information in terms of accuracy and is measured by AV–A (see Massaro and Cohen,
<xref ref-type="bibr" rid="B26">2000</xref>
; Grant,
<xref ref-type="bibr" rid="B17">2002</xref>
; Bergeson and Pisoni,
<xref ref-type="bibr" rid="B6">2004</xref>
). We also included a measure of auditory gain (Gain
<sub>A</sub>
), measured as AV–V, which indexes the benefit afforded by auditory information. The measures labeled “Gain (RT
<sub>V</sub>
)” and “Gain (RT
<sub>A</sub>
)” denote the amount of facilitation that occurs, in terms of processing time, when visual (A
<sub>RT</sub>
–AV
<sub>RT</sub>
) or auditory information (V
<sub>RT</sub>
–AV
<sub>RT</sub>
) is added to the signal. Finally, a measure of gain expressed as the amount of “information transmitted” (IT) was included: (AV–A)/(100 − A).</p>
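<p>As a worked example of these measures, the group means from the −18 dB condition of Table 2 (proportions correct and mean RTs in milliseconds) can be inserted into the definitions above; with accuracies expressed as proportions, the normalizing term of the information-transmitted score becomes (1 − A). The small differences from the tabled gain values reflect rounding of the group means.</p>
<preformat>
# Group means from the -18 dB condition of Table 2 (proportions correct and ms).
acc_a, acc_v, acc_av = 0.36, 0.87, 0.92
rt_a, rt_v, rt_av = 848, 639, 472

gain_v = acc_av - acc_a                            # AV - A             ~ 0.56
gain_a = acc_av - acc_v                            # AV - V             ~ 0.05
gain_rt_v = rt_a - rt_av                           # A_RT - AV_RT       ~ 376 ms
gain_rt_a = rt_v - rt_av                           # V_RT - AV_RT       ~ 167 ms
info_transmitted = (acc_av - acc_a) / (1 - acc_a)  # (AV - A)/(100 - A) ~ 0.87

print(gain_v, gain_a, gain_rt_v, gain_rt_a, round(info_transmitted, 2))
</preformat>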
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>
<bold>Mean accuracy scores for the auditory-only (A), visual-only (V), and audiovisual conditions (AV)</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">Sub1</th>
<th align="left" rowspan="1" colspan="1">Sub2</th>
<th align="left" rowspan="1" colspan="1">Sub3</th>
<th align="left" rowspan="1" colspan="1">Sub4</th>
<th align="left" rowspan="1" colspan="1">Sub5</th>
<th align="left" rowspan="1" colspan="1">Mean</th>
<th align="left" rowspan="1" colspan="1">SD</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="8" align="left" rowspan="1">
<bold>RESULTS FOR S/N RATIO = −18 dB</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">0.31</td>
<td align="left" rowspan="1" colspan="1">0.38</td>
<td align="left" rowspan="1" colspan="1">0.29</td>
<td align="left" rowspan="1" colspan="1">0.35</td>
<td align="left" rowspan="1" colspan="1">0.45</td>
<td align="left" rowspan="1" colspan="1">0.36</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
<td align="left" rowspan="1" colspan="1">0.93</td>
<td align="left" rowspan="1" colspan="1">0.79</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
<td align="left" rowspan="1" colspan="1">0.90</td>
<td align="left" rowspan="1" colspan="1">0.87</td>
<td align="left" rowspan="1" colspan="1">0.05</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.87</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.96</td>
<td align="left" rowspan="1" colspan="1">0.92</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A(RT)</td>
<td align="left" rowspan="1" colspan="1">1093</td>
<td align="left" rowspan="1" colspan="1">527</td>
<td align="left" rowspan="1" colspan="1">580</td>
<td align="left" rowspan="1" colspan="1">849</td>
<td align="left" rowspan="1" colspan="1">1186</td>
<td align="left" rowspan="1" colspan="1">848</td>
<td align="left" rowspan="1" colspan="1">296</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V(RT)</td>
<td align="left" rowspan="1" colspan="1">661</td>
<td align="left" rowspan="1" colspan="1">398</td>
<td align="left" rowspan="1" colspan="1">509</td>
<td align="left" rowspan="1" colspan="1">721</td>
<td align="left" rowspan="1" colspan="1">910</td>
<td align="left" rowspan="1" colspan="1">639</td>
<td align="left" rowspan="1" colspan="1">197</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV(RT)</td>
<td align="left" rowspan="1" colspan="1">555</td>
<td align="left" rowspan="1" colspan="1">319</td>
<td align="left" rowspan="1" colspan="1">467</td>
<td align="left" rowspan="1" colspan="1">471</td>
<td align="left" rowspan="1" colspan="1">547</td>
<td align="left" rowspan="1" colspan="1">472</td>
<td align="left" rowspan="1" colspan="1">95</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>V</sub>
</td>
<td align="left" rowspan="1" colspan="1">0.54</td>
<td align="left" rowspan="1" colspan="1">0.62</td>
<td align="left" rowspan="1" colspan="1">0.58</td>
<td align="left" rowspan="1" colspan="1">0.56</td>
<td align="left" rowspan="1" colspan="1">0.51</td>
<td align="left" rowspan="1" colspan="1">0.56</td>
<td align="left" rowspan="1" colspan="1">0.04</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>A</sub>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">0.08</td>
<td align="left" rowspan="1" colspan="1">0.03</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">0.05</td>
<td align="left" rowspan="1" colspan="1">0.03</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>V</sub>
)</td>
<td align="left" rowspan="1" colspan="1">539</td>
<td align="left" rowspan="1" colspan="1">208</td>
<td align="left" rowspan="1" colspan="1">113</td>
<td align="left" rowspan="1" colspan="1">377</td>
<td align="left" rowspan="1" colspan="1">639</td>
<td align="left" rowspan="1" colspan="1">375</td>
<td align="left" rowspan="1" colspan="1">219</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>A</sub>
)</td>
<td align="left" rowspan="1" colspan="1">106</td>
<td align="left" rowspan="1" colspan="1">79</td>
<td align="left" rowspan="1" colspan="1">42</td>
<td align="left" rowspan="1" colspan="1">250</td>
<td align="left" rowspan="1" colspan="1">363</td>
<td align="left" rowspan="1" colspan="1">168</td>
<td align="left" rowspan="1" colspan="1">135</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">IT</td>
<td align="left" rowspan="1" colspan="1">0.78</td>
<td align="left" rowspan="1" colspan="1">0.98</td>
<td align="left" rowspan="1" colspan="1">0.82</td>
<td align="left" rowspan="1" colspan="1">0.86</td>
<td align="left" rowspan="1" colspan="1">0.92</td>
<td align="left" rowspan="1" colspan="1">0.87</td>
<td align="left" rowspan="1" colspan="1">0.08</td>
</tr>
<tr>
<td colspan="8" align="left" rowspan="1">
<bold>RESULTS FOR S/N RATIO = −12 dB</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">0.79</td>
<td align="left" rowspan="1" colspan="1">0.64</td>
<td align="left" rowspan="1" colspan="1">0.76</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.80</td>
<td align="left" rowspan="1" colspan="1">0.12</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V</td>
<td align="left" rowspan="1" colspan="1">0.92</td>
<td align="left" rowspan="1" colspan="1">0.83</td>
<td align="left" rowspan="1" colspan="1">0.72</td>
<td align="left" rowspan="1" colspan="1">0.81</td>
<td align="left" rowspan="1" colspan="1">0.87</td>
<td align="left" rowspan="1" colspan="1">0.83</td>
<td align="left" rowspan="1" colspan="1">0.09</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV</td>
<td align="left" rowspan="1" colspan="1">0.98</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.87</td>
<td align="left" rowspan="1" colspan="1">0.97</td>
<td align="left" rowspan="1" colspan="1">0.98</td>
<td align="left" rowspan="1" colspan="1">0.95</td>
<td align="left" rowspan="1" colspan="1">0.09</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A(RT)</td>
<td align="left" rowspan="1" colspan="1">881</td>
<td align="left" rowspan="1" colspan="1">767</td>
<td align="left" rowspan="1" colspan="1">764</td>
<td align="left" rowspan="1" colspan="1">665</td>
<td align="left" rowspan="1" colspan="1">1035</td>
<td align="left" rowspan="1" colspan="1">822</td>
<td align="left" rowspan="1" colspan="1">177</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V(RT)</td>
<td align="left" rowspan="1" colspan="1">658</td>
<td align="left" rowspan="1" colspan="1">596</td>
<td align="left" rowspan="1" colspan="1">557</td>
<td align="left" rowspan="1" colspan="1">798</td>
<td align="left" rowspan="1" colspan="1">986</td>
<td align="left" rowspan="1" colspan="1">719</td>
<td align="left" rowspan="1" colspan="1">168</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV(RT)</td>
<td align="left" rowspan="1" colspan="1">595</td>
<td align="left" rowspan="1" colspan="1">507</td>
<td align="left" rowspan="1" colspan="1">457</td>
<td align="left" rowspan="1" colspan="1">654</td>
<td align="left" rowspan="1" colspan="1">705</td>
<td align="left" rowspan="1" colspan="1">584</td>
<td align="left" rowspan="1" colspan="1">91</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>V</sub>
</td>
<td align="left" rowspan="1" colspan="1">0.19</td>
<td align="left" rowspan="1" colspan="1">0.35</td>
<td align="left" rowspan="1" colspan="1">0.12</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">0.068</td>
<td align="left" rowspan="1" colspan="1">0.16</td>
<td align="left" rowspan="1" colspan="1">0.11</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>A</sub>
</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">0.16</td>
<td align="left" rowspan="1" colspan="1">0.15</td>
<td align="left" rowspan="1" colspan="1">0.16</td>
<td align="left" rowspan="1" colspan="1">0.11</td>
<td align="left" rowspan="1" colspan="1">0.13</td>
<td align="left" rowspan="1" colspan="1">0.04</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>V</sub>
)</td>
<td align="left" rowspan="1" colspan="1">539</td>
<td align="left" rowspan="1" colspan="1">208</td>
<td align="left" rowspan="1" colspan="1">113</td>
<td align="left" rowspan="1" colspan="1">377</td>
<td align="left" rowspan="1" colspan="1">639</td>
<td align="left" rowspan="1" colspan="1">375</td>
<td align="left" rowspan="1" colspan="1">219</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>A</sub>
)</td>
<td align="left" rowspan="1" colspan="1">63</td>
<td align="left" rowspan="1" colspan="1">89</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">135</td>
<td align="left" rowspan="1" colspan="1">281</td>
<td align="left" rowspan="1" colspan="1">134</td>
<td align="left" rowspan="1" colspan="1">86</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">IT</td>
<td align="left" rowspan="1" colspan="1">0.90</td>
<td align="left" rowspan="1" colspan="1">0.97</td>
<td align="left" rowspan="1" colspan="1">0.46</td>
<td align="left" rowspan="1" colspan="1">0.66</td>
<td align="left" rowspan="1" colspan="1">0.78</td>
<td align="left" rowspan="1" colspan="1">0.67</td>
<td align="left" rowspan="1" colspan="1">0.27</td>
</tr>
<tr>
<td colspan="8" align="left" rowspan="1">
<bold>RESULTS FOR THE CLEAR S/N RATIO</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.998</td>
<td align="left" rowspan="1" colspan="1">0.998</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.004</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V</td>
<td align="left" rowspan="1" colspan="1">0.90</td>
<td align="left" rowspan="1" colspan="1">0.62</td>
<td align="left" rowspan="1" colspan="1">0.82</td>
<td align="left" rowspan="1" colspan="1">0.80</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
<td align="left" rowspan="1" colspan="1">0.79</td>
<td align="left" rowspan="1" colspan="1">0.11</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV</td>
<td align="left" rowspan="1" colspan="1">0.988</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.995</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.99</td>
<td align="left" rowspan="1" colspan="1">0.006</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A(RT)</td>
<td align="left" rowspan="1" colspan="1">784</td>
<td align="left" rowspan="1" colspan="1">706</td>
<td align="left" rowspan="1" colspan="1">734</td>
<td align="left" rowspan="1" colspan="1">704</td>
<td align="left" rowspan="1" colspan="1">599</td>
<td align="left" rowspan="1" colspan="1">706</td>
<td align="left" rowspan="1" colspan="1">67</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">V(RT)</td>
<td align="left" rowspan="1" colspan="1">869</td>
<td align="left" rowspan="1" colspan="1">1108</td>
<td align="left" rowspan="1" colspan="1">650</td>
<td align="left" rowspan="1" colspan="1">963</td>
<td align="left" rowspan="1" colspan="1">772</td>
<td align="left" rowspan="1" colspan="1">872</td>
<td align="left" rowspan="1" colspan="1">176</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AV(RT)</td>
<td align="left" rowspan="1" colspan="1">740</td>
<td align="left" rowspan="1" colspan="1">733</td>
<td align="left" rowspan="1" colspan="1">686</td>
<td align="left" rowspan="1" colspan="1">725</td>
<td align="left" rowspan="1" colspan="1">597</td>
<td align="left" rowspan="1" colspan="1">696</td>
<td align="left" rowspan="1" colspan="1">59</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>V</sub>
</td>
<td align="left" rowspan="1" colspan="1">−0.003</td>
<td align="left" rowspan="1" colspan="1">0.002</td>
<td align="left" rowspan="1" colspan="1">−0.012</td>
<td align="left" rowspan="1" colspan="1">0.005</td>
<td align="left" rowspan="1" colspan="1">0.002</td>
<td align="left" rowspan="1" colspan="1">−0.003</td>
<td align="left" rowspan="1" colspan="1">0.006</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain
<sub>A</sub>
</td>
<td align="left" rowspan="1" colspan="1">0.09</td>
<td align="left" rowspan="1" colspan="1">0.38</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">0.19</td>
<td align="left" rowspan="1" colspan="1">0.14</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">0.12</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>V</sub>
)</td>
<td align="left" rowspan="1" colspan="1">44</td>
<td align="left" rowspan="1" colspan="1">−27</td>
<td align="left" rowspan="1" colspan="1">48</td>
<td align="left" rowspan="1" colspan="1">−21</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">35</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Gain (RT
<sub>A</sub>
)</td>
<td align="left" rowspan="1" colspan="1">129</td>
<td align="left" rowspan="1" colspan="1">375</td>
<td align="left" rowspan="1" colspan="1">−36</td>
<td align="left" rowspan="1" colspan="1">238</td>
<td align="left" rowspan="1" colspan="1">175</td>
<td align="left" rowspan="1" colspan="1">176</td>
<td align="left" rowspan="1" colspan="1">150</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">IT</td>
<td align="left" rowspan="1" colspan="1">−0.30</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.50</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0.09</td>
<td align="left" rowspan="1" colspan="1">0.59</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The table also reports the gain scores (AV–A) and (AV–V), the RT gains (A
<sub>RT</sub>
–AV
<sub>RT</sub>
) and (V
<sub>RT</sub>
–AV
<sub>RT</sub>
), and the information transmitted (AV–A)/(100 − A) (see Sumby and Pollack,
<xref ref-type="bibr" rid="B43">1954</xref>
). The results from Table
<xref ref-type="table" rid="T2">2</xref>
indicate that accuracy scores in the auditory channel increased as the S/N ratio improved (mean −18 dB = 0.36, mean −12 dB = 0.78, mean clear ≈ 1.0;
<italic>F</italic>
(2, 13) = 81.9,
<italic>p</italic>
 < 0.0001). The mean proportion correct in the audiovisual condition did not significantly differ between conditions, although a non-significant trend in the same direction was observed (mean −18 dB = 0.92, mean −12 dB = 0.99, mean clear = 1.0;
<italic>F</italic>
(2, 13) = 1.97,
<italic>p</italic>
 < 0.20). Finally, the mean proportion correct for the visual-only condition (which was not degraded) across conditions was 0.83. Sumby and Pollack (
<xref ref-type="bibr" rid="B43">1954</xref>
) did not technically employ a visual-only condition, but instead included a condition in which the auditory signal-to-noise ratio was −30 dB. This provided some highly degraded auditory information.</p>
<p>Overall, auditory-only RTs decreased as the auditory S/N ratio increased, although we did not observe significant differences between experimental conditions in terms of RT [mean −18 dB = 847 ms, mean −12 dB = 771 ms, mean clear = 705 ms;
<italic>F</italic>
(2, 13) < 1]. This result could be due to a lack of power, since variability in RTs is generally greater than variability in accuracy scores. The mean RT for the visual-only condition was 735 ms. Interestingly, the analysis of mean processing times in the audiovisual condition revealed that RTs were fastest when the auditory signal was most degraded [mean −18 dB = 471 ms, mean −12 dB = 584 ms, mean clear = 696 ms;
<italic>F</italic>
(2, 13) = 8.93,
<italic>p</italic>
 < 0.01]. Hence, although audiovisual accuracy tended to improve as the quality of the auditory information increased, audiovisual responses were fastest when the auditory signal was most degraded.</p>
<p>Interestingly, significant audiovisual gains were observed across experimental conditions. Not surprisingly, audiovisual accuracy gain scores decreased as the auditory signal-to-noise ratio improved, owing to a ceiling effect on percent correct [mean −18 dB = 0.56, mean −12 dB = 0.15, mean clear ≈ 0;
<italic>F</italic>
(2, 13) = 80.10,
<italic>p</italic>
 < 0.0001]. Similarly, the observed RT gain differed significantly across conditions (mean −18 dB = 375 ms, mean −12 dB = 187 ms, mean clear = 10 ms;
<italic>F</italic>
(2, 13) = 15.60,
<italic>p</italic>
 < 0.0005). Finally, audiovisual gain expressed as (AV − A)/(100 − A) noticeably improved as the auditory signal-to-noise ratio decreased (mean −18 dB = 0.87, mean −12 dB = 0.67, mean clear = 0.09;
<italic>F</italic>
(2, 13) = 83.3,
<italic>p</italic>
 < 0.0001). This result provides further evidence that a degraded auditory signal combined with information obtained from lip-reading facilitates audiovisual integration abilities.</p>
<sec>
<title>Capacity analysis</title>
<p>Figure
<xref ref-type="fig" rid="F5">5</xref>
displays the capacity results for each participant and auditory signal-to-noise ratio. In each panel, Participants 1, 2, and 3 are arranged sequentially in the top row, followed by 4 and 5 in the bottom row. The qualitative pattern of experimental results revealed that processing capacity/efficiency changed as a function of auditory signal-to-noise ratio. More specifically, the benefit or gain in efficiency in terms of the capacity coefficient decreased as the auditory signal-to-noise ratio improved from −18 to −12 dB. The pattern of the capacity data effectively demonstrated that visual information aids processing efficiency to a greater degree when the auditory input is degraded or otherwise less intelligible. One reason for this observed enhancement might be that auditory and visual signals often provide complementary information (see Summerfield,
<xref ref-type="bibr" rid="B44">1987</xref>
; Grant et al.,
<xref ref-type="bibr" rid="B18">1998</xref>
). Information about place of articulation, for example, is available from lip-reading even though this information becomes increasingly degraded in the auditory domain under difficult listening conditions.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>The capacity coefficient
<italic>C</italic>
(
<italic>t</italic>
) for each participant across all three experimental conditions</bold>
. The top panel
<bold>(A)</bold>
shows
<italic>C</italic>
(
<italic>t</italic>
) for five participants in the condition where the auditory S/N ratio was −18 dB. Each participant, except for Participant 3, evidenced super capacity (violating the bound
<italic>C</italic>
(
<italic>t</italic>
) = 1, or upper Miller Bound in capacity space; Eidels et al.,
<xref ref-type="bibr" rid="B13">2011</xref>
). The legend shows that
<italic>C</italic>
(
<italic>t</italic>
) is denoted by the dots, the upper bound by the solid curve, and the lower bound by the dashed line.
<bold>(B)</bold>
shows
<italic>C</italic>
(
<italic>t</italic>
) for five participants in the condition with an auditory S/N ratio of −12 dB, and the bottom
<bold>(C)</bold>
shows
<italic>C</italic>
(
<italic>t</italic>
) for five participants in the condition without any degradation of the auditory signal.</p>
</caption>
<graphic xlink:href="fpsyg-02-00238-g005"></graphic>
</fig>
<p>In the −18 dB condition, four out of five participants exhibited super capacity for multiple time points. As can be observed in the individual panels, the data points approached the upper Miller bound for super capacity in several cases (denoted by the curved solid line; Miller,
<xref ref-type="bibr" rid="B30">1982</xref>
; Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
). Violations of the Miller bound are typically characteristic of coactive information processing (Miller,
<xref ref-type="bibr" rid="B30">1982</xref>
,
<xref ref-type="bibr" rid="B31">1986</xref>
; Townsend and Nozawa,
<xref ref-type="bibr" rid="B50">1995</xref>
; Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
), although recent evidence has shown that certain classes of parallel linear dynamic models and Poisson summation models with facilitatory cross-talk between channels can produce similar violations of the bound. Simulations have demonstrated that the magnitude of these violations is typically smaller than the magnitude predicted by coactive summation models (Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
; Eidels et al.,
<xref ref-type="bibr" rid="B13">2011</xref>
).</p>
<p>The data shown in Figure
<xref ref-type="fig" rid="F5">5</xref>
B show that, as the signal-to-noise ratio improved from −18 to −12 dB, the
<italic>C</italic>
(
<italic>t</italic>
) appeared to become more limited, as is evident from the overall drop in the capacity coefficient.
<italic>C</italic>
(
<italic>t</italic>
) for participants 2, 3, and 5 was super capacity for some processing times, although the
<italic>C</italic>
(
<italic>t</italic>
) was greater than 1 for extended time intervals only for Participant 5. While the −12 dB condition failed to produce any systematic violations of the Grice bound,
<italic>C</italic>
(
<italic>t</italic>
) generally hovered around the 1/2 region and the Grice bound during later processing stages. Conversely, the
<italic>C</italic>
(
<italic>t</italic>
) for each participant in the −18 dB condition was typically equal to or greater than 1, and was consistently greater than the Grice bound.</p>
<p>Contrary to the findings observed in the −18 and −12 dB S/N ratio conditions, the
<italic>C</italic>
(
<italic>t</italic>
) data shown in Figure
<xref ref-type="fig" rid="F5">5</xref>
C evidenced multiple violations of the Grice bound for limited capacity in Participants 2, 3, 4, and 5, with
<italic>C</italic>
(
<italic>t</italic>
) being approximately equal to the bound in Participant 1’s data set. As in the pilot study, this condition failed to produce any sizeable violations of the Miller bound for super capacity. Overall, processing capacity was extremely limited in this condition, providing further evidence that the addition of visual information does not significantly contribute to, and in some cases can even detract from, efficient use of the available auditory speech information. Exposing hearing-impaired listeners to this latter experimental condition might reveal a pattern of results similar to that of the −12 and −18 dB conditions, in which some individuals, perhaps highly efficient lip readers, utilize the auditory and visual speech information sources more efficiently than others.</p>
<p>Taken together, the results suggest that integration becomes enhanced only when the auditory S/N ratio is low, an observation in line with the law of inverse effectiveness (see Meredith and Stein,
<xref ref-type="bibr" rid="B29">1983</xref>
; Stevenson and James,
<xref ref-type="bibr" rid="B42">2009</xref>
; although cf. Ma et al.,
<xref ref-type="bibr" rid="B23">2009</xref>
). The results in Table
<xref ref-type="table" rid="T2">2</xref>
suggest, first, that visual gain (mean = 0.56) was most noticeable in the −18 dB condition. Second, audiovisual enhancement, measured by
<italic>C</italic>
(
<italic>t</italic>
) was greatest in this condition as well.</p>
<p>One may object that, since the auditory S/N ratio was so low in the −18 dB condition, the task was functionally visual-only. Several observations can be offered as evidence against this. First, auditory-only recognition was above chance (12.5%) for each participant. Moreover, Table
<xref ref-type="table" rid="T2">2</xref>
revealed that the auditory gain AV–V was statistically greater than 0, with a mean of 5% [
<italic>t</italic>
(df = 4) = 3.29,
<italic>p</italic>
 < 0.05]. We wish to emphasize that, in terms of real-time processing, the
<italic>C</italic>
(
<italic>t</italic>
) and RT analysis effectively demonstrated that the level of gain from unisensory to multisensory processing is greater than would be predicted by a parallel race model [the A gain in terms of RT was also significantly different from 0;
<italic>t</italic>
(df = 4) = 2.8,
<italic>p</italic>
 < 0.05]. Overall, these observations are essentially in agreement with previous literature indicating that audiovisual integration occurs for auditory S/N ratios as low as −24 dB (Ross et al.,
<xref ref-type="bibr" rid="B37">2007</xref>
).</p>
<p>Another potential objection to our interpretation that
<italic>C</italic>
(
<italic>t</italic>
) shows more efficient integration under degraded listening conditions is that language processing skills may have differed across groups. In particular, visual-only accuracy scores suggest that the participants in the −18 dB listening condition (mean = 0.87) may have been better lip readers than the −12 dB (mean = 0.83) or clear condition (mean = 0.79) participants. However, the
<italic>t</italic>
-test comparing V-only performance between the −18 and −12 dB conditions was non-significant [
<italic>t</italic>
(8) = 0.97,
<italic>p</italic>
 = 0.36], as was the
<italic>t</italic>
-test comparing performance between the −18 dB and clear condition [
<italic>t</italic>
(8) = 1.30,
<italic>p</italic>
 = 0.23]. Furthermore, as we shall see, correlations between visual (or auditory) accuracy and peak capacity values were non-significant. Future studies can also address these concerns by including a larger sample of participants and by using a within-subject design.</p>
<p>How well does the capacity measure relate to other measures of language perception and integration? Bivariate correlations were carried out in order to obtain preliminary evidence for the hypothesis that lip-reading ability or visual gain is associated with processing capacity scores. Correlations for the participants were obtained for the maximum capacity value max{
<italic>C</italic>
(
<italic>t</italic>
)}, the mean accuracy scores, RTs, and enhancement scores, as well as information transmitted. The question is which factors obtained in this study, including the availability of sensory information, serve as predictors of integration efficiency as measured by the capacity coefficient. These analyses should provide groundwork for investigating cognitive factors associated with audiovisual gain in future studies.</p>
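<p>A sketch of how such correlations can be computed, assuming that per-participant capacity curves and per-participant scores are already available (the function names and the use of scipy.stats.pearsonr are illustrative choices rather than the analysis scripts used here):</p>
<preformat>
import numpy as np
from scipy.stats import pearsonr

def peak_capacity(capacity_curve):
    """Maximum C(t) value for one participant, ignoring undefined (NaN) time points."""
    return float(np.nanmax(np.asarray(capacity_curve, dtype=float)))

def correlate_with_peak_capacity(capacity_curves, scores):
    """Pearson correlation between per-participant peak C(t) values and any
    per-participant score (e.g., auditory accuracy, visual accuracy, or IT)."""
    peaks = [peak_capacity(curve) for curve in capacity_curves]
    return pearsonr(peaks, scores)   # returns the correlation r and its p-value
</preformat>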
<p>The Pearson correlations for the accuracy scores turned out to be marginally significant. Auditory accuracy showed a slight negative correlation with processing capacity (
<italic>r</italic>
 = −0.39,
<italic>p</italic>
 = 0.077), while visual-only accuracy scores and workload capacity showed a marginal positive correlation (
<italic>r</italic>
 = 0.37,
<italic>p</italic>
 = 0.087). This might suggest that higher capacity values are achieved in individuals with lower auditory recognition accuracy but better lip-reading skills, although more evidence is required for this conclusion. The hypothesis that higher capacity values are associated with a greater degree of information transmitted was also supported by the correlation analysis (
<italic>r</italic>
 = 0.51,
<italic>p</italic>
 < 0.05). While these results are very preliminary due to the small sample size, the emerging picture suggests a negative linkage between traditional measures of audiovisual gain and integration efficiency as measured by workload capacity.</p>
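<p>A minimal sketch of how such bivariate correlations can be computed is shown below. The arrays are hypothetical placeholders standing in for the per-participant maximum capacity values and accuracy scores; they are not the values on which the reported correlations were based.</p>
<preformat>
import numpy as np
from scipy import stats

# Hypothetical per-participant values (placeholders, not the reported data)
max_capacity = np.array([0.6, 0.9, 1.1, 0.7, 1.4, 0.8, 1.2, 1.0])
a_only_acc   = np.array([0.70, 0.55, 0.40, 0.65, 0.30, 0.60, 0.45, 0.50])
v_only_acc   = np.array([0.78, 0.84, 0.88, 0.80, 0.90, 0.82, 0.86, 0.83])

for label, scores in [("A-only accuracy", a_only_acc),
                      ("V-only accuracy", v_only_acc)]:
    r, p = stats.pearsonr(max_capacity, scores)
    print(f"max C(t) vs {label}: r = {r:.2f}, p = {p:.3f}")
</preformat>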
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>Research into lip-reading and auditory speech perception has demonstrated that the former can noticeably improve accuracy in the latter in a noisy environment (e.g., Sumby and Pollack,
<xref ref-type="bibr" rid="B43">1954</xref>
; see also Grant et al.,
<xref ref-type="bibr" rid="B18">1998</xref>
; and Bergeson and Pisoni,
<xref ref-type="bibr" rid="B6">2004</xref>
). Yet very little was, or for that matter is, known concerning their interactions in the RT domain, and nothing seems to be on record with regard to their underlying dynamic mechanisms, such as architecture and stopping rule, across time. Jesse and Massaro (
<xref ref-type="bibr" rid="B21">2010</xref>
) did observe evidence for early interactions between visually salient visemes (such as stop consonants) and auditory information in a gating task. Although the authors’ methodology did not assay issues related to parallel versus coactive processing, their findings appear to be consistent with our data supporting an interactive parallel account of audiovisual integration. The DFP methodology provides converging or supplementary information to other data sets where evidence for interactions might emerge, in addition to valuable information concerning systems-level attributes such as architecture, decision rule, and capacity.</p>
<p>Measures of architecture and capacity were used to identify a number of mechanisms critical to integration processes. Two fundamental types of parallel systems, separate decisions versus coactive processing, capture the first-order aspects of “late-stage” versus “early-stage” sound–vision integration (e.g., Summerfield,
<xref ref-type="bibr" rid="B44">1987</xref>
; Massaro,
<xref ref-type="bibr" rid="B25">2004</xref>
; Bernstein,
<xref ref-type="bibr" rid="B7">2005</xref>
; Rosenblum,
<xref ref-type="bibr" rid="B36">2005</xref>
; van Wassenhove et al.,
<xref ref-type="bibr" rid="B54">2005</xref>
). The architecture and capacity assessments from the pilot study yielded consistent conclusions: both support some variety of parallel processing with a first-terminating decision rule. These results, particularly the capacity results, were robust, occurring for all individuals.</p>
<p>One limitation of the pilot study was the presence of only two words in the stimulus set, together with the selection of signal-to-noise ratios that allowed us to achieve both high accuracy and selective influence. This set-up allowed us to provide a basic assessment of architecture and decision rule, yielding converging evidence with studies that have indicated the presence of parallel separate decisions integration in speech perception (e.g., Bernstein,
<xref ref-type="bibr" rid="B7">2005</xref>
; Jesse and Massaro,
<xref ref-type="bibr" rid="B21">2010</xref>
). Our design, however, came at the expense of not providing an ecologically valid way to assess audiovisual gain and workload capacity. Still, the observation of limited capacity suggests that auditory and visual channels interact, in an inhibitory manner, at certain S/N ratios.</p>
<p>To that end, we designed an experiment that included lower auditory signal-to-noise ratios, a larger set size, and multiple talkers. We did this in order to connect our workload capacity results with previous studies in the literature assessing audiovisual gain (e.g., Sumby and Pollack,
<xref ref-type="bibr" rid="B43">1954</xref>
). The results from Experiment 1 showed once again that at high S/N ratios (in this case, the clear S/N ratio), capacity was extremely limited. For lower S/N ratios, we observed super capacity for some subjects along with higher audiovisual gain values in the accuracy domain as expected. The workload capacity values observed in this experiment were consistent with the predictions of separate decisions parallel models with interactions (see Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
; Eidels et al.,
<xref ref-type="bibr" rid="B13">2011</xref>
).</p>
<p>Interestingly, the audiovisual literature suggests that integration tends to become more efficient as the S/N ratio decreases, but then becomes less efficient when the clarity of the auditory signal decreases too much (Ross et al.,
<xref ref-type="bibr" rid="B37">2007</xref>
), contrary to a strict principle of inverse effectiveness (Meredith and Stein,
<xref ref-type="bibr" rid="B29">1983</xref>
), where the prediction would be that integration is a monotonically decreasing function of the S/N ratio. This finding appears to hold especially true when the stimulus set size is finite (Ma et al.,
<xref ref-type="bibr" rid="B23">2009</xref>
). Perhaps there is an S/N window for optimal audiovisual integration, which Ross et al. (
<xref ref-type="bibr" rid="B37">2007</xref>
) reported to fall around −10 to −12 dB, and which fell at approximately −18 dB in our study (smaller set sizes tend to shift the window to lower S/N ratios). Future research will be necessary to explore more deeply the relation between behavioral and neural measures of audiovisual integration efficiency, although recent research investigating the relation between ERP and RT measures has been carried out (Altieri and Wenger,
<xref ref-type="bibr" rid="B2">2011</xref>
; Winneke and Phillips,
<xref ref-type="bibr" rid="B58">2011</xref>
).</p>
</sec>
<sec>
<title>Conclusion</title>
<sec>
<title>Architecture</title>
<p>The emerging picture is of a separate decisions parallel system rather than coactive parallelism. This characteristic favors the concept of late rather than early integration. The pilot study showed that the system is capable of taking advantage of the opportunity to conclude processing with the winner of a race, again for every individual. This inference follows from our architectural and stopping rule analyses. Although it may seem obvious that early termination could occur, it may be recalled that, early in the research on rapid short-term memory search, it was argued that such brief searches might propel exhaustive processing (e.g., Sternberg,
<xref ref-type="bibr" rid="B41">1969</xref>
).</p>
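<p>The architectural and stopping rule inferences referred to here rest on the double factorial paradigm’s interaction contrasts. As a reminder of their form, the sketch below computes the survivor interaction contrast SIC(<italic>t</italic>) = [<italic>S</italic><sub>LL</sub>(<italic>t</italic>) − <italic>S</italic><sub>LH</sub>(<italic>t</italic>)] − [<italic>S</italic><sub>HL</sub>(<italic>t</italic>) − <italic>S</italic><sub>HH</sub>(<italic>t</italic>)] from the four double-target factorial conditions (Townsend and Nozawa, 1995); a parallel first-terminating model predicts SIC(<italic>t</italic>) ≥ 0 at all times together with a positive mean interaction contrast. The RT samples below are simulated from a toy parallel race of our own construction, not the pilot-study data.</p>
<preformat>
import numpy as np

def survivor(rts, t_grid):
    """Empirical survivor function S(t) = P(T > t)."""
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts > t).mean() for t in t_grid])

def sic(rt_ll, rt_lh, rt_hl, rt_hh, t_grid):
    """SIC(t) = [S_LL - S_LH] - [S_HL - S_HH]; first letter = auditory
    salience (L = low, H = high), second letter = visual salience."""
    s = {k: survivor(v, t_grid)
         for k, v in dict(LL=rt_ll, LH=rt_lh, HL=rt_hl, HH=rt_hh).items()}
    return (s["LL"] - s["LH"]) - (s["HL"] - s["HH"])

# Toy parallel first-terminating (minimum-time) race: each salience level
# changes only the rate of its own channel (selective influence).
rng = np.random.default_rng(0)
n, mean_ms = 2000, {"H": 300.0, "L": 450.0}
def rt_min(a_level, v_level):
    return np.minimum(rng.exponential(mean_ms[a_level], n),
                      rng.exponential(mean_ms[v_level], n))

t_grid = np.arange(0, 2000, 10)
sic_t = sic(rt_min("L", "L"), rt_min("L", "H"),
            rt_min("H", "L"), rt_min("H", "H"), t_grid)
# Sampling noise can produce small negative dips; the theoretical SIC is >= 0
print("min SIC(t) =", round(sic_t.min(), 3))
</preformat>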
</sec>
<sec>
<title>Capacity</title>
<p>There are two straightforward potential causes of the differences in capacity when the S/N ratio was manipulated. The first is a difference in resources, such as attention, distributed across the operating channels (e.g., Townsend,
<xref ref-type="bibr" rid="B46">1974</xref>
; Townsend and Ashby,
<xref ref-type="bibr" rid="B48">1978</xref>
; Bundesen and Habekost,
<xref ref-type="bibr" rid="B11">2009</xref>
). Although such an account cannot be ruled out definitively at this point, it seems somewhat improbable that available resources would diminish as S/N ratio changes. A second possibility, and the one favored here, is the presence of facilitatory/inhibitory connections between auditory and visual channels. Cross-channel connections in information processing models are known to effectively diminish or increase efficiency as measured by capacity (Townsend and Wenger,
<xref ref-type="bibr" rid="B53">2004b</xref>
; Eidels et al.,
<xref ref-type="bibr" rid="B13">2011</xref>
).</p>
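<p>As a qualitative illustration of this point, the sketch below simulates a separate decisions parallel race in which a single cross-channel gain parameter speeds up (facilitation) or slows down (inhibition) both channels when the two signals are present together, and then evaluates the resulting capacity coefficient against the unlimited-capacity baseline <italic>C</italic>(<italic>t</italic>) = 1. The parameterization is our own toy construction and is not the model of Townsend and Wenger (2004b) or Eidels et al. (2011); it is meant only to show the qualitative direction of the effect.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(2)
N, MEAN_MS = 5000, 400.0              # trials per condition, mean channel time
T = np.arange(50, 2000, 10)

def cum_hazard(rts):
    s = np.array([(rts > t).mean() for t in T])
    return -np.log(np.clip(s, 1e-6, 1.0))

def capacity(gain):
    """Parallel race (minimum time); 'gain' rescales each channel's speed on
    redundant (AV) trials: gain > 1 = facilitation, gain < 1 = inhibition."""
    rt_a = rng.exponential(MEAN_MS, N)
    rt_v = rng.exponential(MEAN_MS, N)
    rt_av = np.minimum(rng.exponential(MEAN_MS / gain, N),
                       rng.exponential(MEAN_MS / gain, N))
    return cum_hazard(rt_av) / (cum_hazard(rt_a) + cum_hazard(rt_v))

for gain, label in [(1.0, "independent race (baseline)"),
                    (1.5, "cross-channel facilitation"),
                    (0.6, "cross-channel inhibition")]:
    print(f"{label:30s} median C(t) = {np.nanmedian(capacity(gain)):.2f}")
</preformat>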
<p>Tending to bolster that hypothesis, studies using auditory, visual, and both synchronous and asynchronous audiovisual speech stimuli have shown that the ERP signal resulting from the audiovisual stimuli in the synchronous condition is depressed compared to the ERP arising from the unimodal (A-only and V-only) stimuli (Pilling,
<xref ref-type="bibr" rid="B33">2009</xref>
). Ponton et al. (
<xref ref-type="bibr" rid="B34">2009</xref>
) used mismatch negativity with EEG and found evidence that feedback from (phonetic) processing in visual brain regions influences auditory processing (see also van Wassenhove et al.,
<xref ref-type="bibr" rid="B54">2005</xref>
; Winneke and Phillips,
<xref ref-type="bibr" rid="B58">2011</xref>
). In a combined RT and ERP study assessing audiovisual integration in younger normal-hearing and older adults, Winneke and Phillips (
<xref ref-type="bibr" rid="B58">2011</xref>
) carried out an audiovisual speech discrimination task requiring two-alternative forced-choice responses to spoken words. The auditory S/N ratio was adjusted for each participant in order to equate performance across age groups. Similar to van Wassenhove et al. (
<xref ref-type="bibr" rid="B54">2005</xref>
), the authors observed that the early N1 and P1 ERP peak amplitudes (i.e., occurring at stimulus onset or prior to phonetic recognition) for the audiovisual condition were reduced compared to the A-only plus V-only ERP peak amplitudes. Interestingly, this amplitude reduction was slightly greater for older than for younger adults. An analysis of reaction time data averaged across individual participants further revealed that audiovisual trials produced faster reaction times than the unisensory trials, as evidenced by violations of race model predictions in both age groups. Both the reaction time and EEG results provided evidence that dynamic neural interactions between brain regions influence audiovisual integration in speech perception. In fact, the violation of the race model inequality suggests a role for facilitatory interactions. A potentially fruitful direction for future research would be to further investigate the relation between integration efficiency as measured by RTs [i.e.,
<italic>C</italic>
(
<italic>t</italic>
)], and audiovisual versus unisensory peak amplitudes in the ERP signal. Using capacity and EEG to investigate individual differences should also prove to be beneficial (e.g., Altieri and Wenger,
<xref ref-type="bibr" rid="B2">2011</xref>
).</p>
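<p>For completeness, the race model bound referred to here (Miller, 1982), <italic>F</italic><sub>AV</sub>(<italic>t</italic>) ≤ <italic>F</italic><sub>A</sub>(<italic>t</italic>) + <italic>F</italic><sub>V</sub>(<italic>t</italic>), can be checked on empirical distribution functions along the lines sketched below. The RT samples are again simulated placeholders rather than data from either study.</p>
<preformat>
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution function F(t) = P(T <= t)."""
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts <= t).mean() for t in t_grid])

# Placeholder RTs (ms); a real analysis would use per-participant data
rng = np.random.default_rng(3)
rt_a  = rng.normal(900, 120, 300)
rt_v  = rng.normal(950, 130, 300)
rt_av = rng.normal(800, 100, 300)   # fast enough to violate the bound early on

t_grid = np.arange(500, 1300, 10)
bound = np.minimum(ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid), 1.0)
violation = ecdf(rt_av, t_grid) - bound
print("largest violation of F_AV(t) <= F_A(t) + F_V(t):", round(violation.max(), 3))
</preformat>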
<p>Findings such as these add support to the hypothesis that inhibitory/excitatory mechanisms operate between brain regions, even when super-threshold stimuli that yield high accuracy are used (e.g., van Wassenhove et al.,
<xref ref-type="bibr" rid="B54">2005</xref>
; Pilling,
<xref ref-type="bibr" rid="B33">2009</xref>
). The juxtaposition of studies finding evidence for facilitation with the capacity and parallel processing results found here suggests that bimodal speech perception may vary in fundamental ways at different accuracy levels. While the link between information processing measures (i.e., architecture and capacity) and neural processing remains tenuous, future experiments using DFP methods in conjunction with EEG or fMRI can better investigate the neural underpinnings of efficient audiovisual integration.</p>
<p>One final caveat is that most previous studies employed tasks in which the participants were instructed to report what they “heard” without being asked to report specifically what they “saw.” Such tasks, though ecologically natural, are a bit unusual in basic sensory and perceptual research. In the typical laboratory paradigm the task is either “selective attention” or “divided attention.” In the former, the participant is instructed to focus on the attended signal and ignore, to the extent possible, the unattended stimuli. In the latter, the participants usually must indicate through some type of report procedure that they are indeed paying attention to both stimulus sources. Interestingly, recent work has shown that when participants are instructed to pay attention to just what they hear in a focused (i.e., selective) attention version of the paradigm that includes incongruent visual (McGurk) distractors, the distracting information inhibits processing in the time domain (Altieri,
<xref ref-type="bibr" rid="B1">2010</xref>
). We believe that much can be learned about multimodal speech perception through this program.</p>
</sec>
</sec>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>This study was supported by the National Institutes of Health (Grant No. DC-00111), the National Institutes of Health Speech Training Grant (No. DC-00012), and by NIMH 057717-07 and AFOSR FA9550-07-1-0078 grants to James T. Townsend. We would like to acknowledge Jeremy Loebach and Luis Hernandez of the Speech Research Laboratory, members of James T. Townsend’s Laboratory at Indiana University, and three anonymous reviewers for their helpful insights. Portions of this study were presented at the Annual Meeting of the Society for Mathematical Psychology held in Washington, DC in 2008, and were included in the first author’s doctoral dissertation.</p>
</ack>
<fn-group>
<fn id="fn1">
<p>
<sup>1</sup>
The bulk of research combining experimentation and modeling is due to Massaro and colleagues (e.g., 1987; 2004). In addition, influential models of bimodal speech perception have been put forth by Braida (
<xref ref-type="bibr" rid="B10">1991</xref>
) and Grant et al. (
<xref ref-type="bibr" rid="B18">1998</xref>
) (see also Grant,
<xref ref-type="bibr" rid="B17">2002</xref>
). However, these models were not designed for and therefore cannot adjudicate the critical issues examined here, although Massaro (
<xref ref-type="bibr" rid="B25">2004</xref>
) addressed the issue of “convergent” (coactive) versus “non-convergent” (parallel) audiovisual integration in a qualitative manner.</p>
</fn>
<fn id="fn2">
<p>
<sup>2</sup>
The term “architecture” is used here in a general sense and does not rule out the possibility that processing might be parallel in one instance yet serial under different task conditions.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Altieri</surname>
<given-names>N.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<source>Toward a Unified Theory of Audiovisual Integration in Speech Perception</source>
. Doctoral Dissertation, Indiana University,
<publisher-loc>Bloomington, IN</publisher-loc>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Altieri</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>“Neural and information processing measures of audiovisual integration,”</article-title>
in
<conf-name>Conference of the Vision Sciences Society, Poster Presentation</conf-name>
,
<publisher-loc>Naples, FL</publisher-loc>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arnold</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Tear</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schindel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Roseboom</surname>
<given-names>W.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Audio-visual speech cue combination</article-title>
.
<source>PLoS ONE</source>
<volume>5</volume>
,
<fpage>e10217</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0010217</pub-id>
<pub-id pub-id-type="pmid">20419130</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barutchu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Crewther</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Crewther</surname>
<given-names>S. G.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>The race that precedes coactivation: development of multisensory facilitation in children</article-title>
.
<source>Dev. Sci.</source>
<volume>12</volume>
,
<fpage>464</fpage>
<lpage>473</lpage>
<pub-id pub-id-type="doi">10.1111/j.1467-7687.2008.00782.x</pub-id>
<pub-id pub-id-type="pmid">19371371</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barutchu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Danaher</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Crewther</surname>
<given-names>S. G.</given-names>
</name>
<name>
<surname>Innes-Brown</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Shivdasani</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Paolini</surname>
<given-names>A. G.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Audiovisual integration in noise by children and adults</article-title>
.
<source>J. Exp. Child. Psychol.</source>
<volume>105</volume>
,
<fpage>38</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="doi">10.1016/j.jecp.2009.08.005</pub-id>
<pub-id pub-id-type="pmid">19822327</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bergeson</surname>
<given-names>T. R.</given-names>
</name>
<name>
<surname>Pisoni</surname>
<given-names>D. B.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>“Audiovisual speech perception in deaf adults and children following cochlear implantation,”</article-title>
in
<source>The Handbook of Multisensory Processes</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Calvert</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Spence</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>B. E.</given-names>
</name>
</person-group>
(
<publisher-loc>Cambridge, MA</publisher-loc>
:
<publisher-name>The MIT Press</publisher-name>
),
<fpage>153</fpage>
<lpage>176</lpage>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bernstein</surname>
<given-names>L. E.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>“Phonetic perception by the speech perceiving brain,”</article-title>
in
<source>The Handbook of Speech Perception</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Pisoni</surname>
<given-names>D. B.</given-names>
</name>
<name>
<surname>Remez</surname>
<given-names>R. E.</given-names>
</name>
</person-group>
(
<publisher-loc>Malden, MA</publisher-loc>
:
<publisher-name>Blackwell Publishing</publisher-name>
),
<fpage>79</fpage>
<lpage>98</lpage>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bernstein</surname>
<given-names>L. E.</given-names>
</name>
<name>
<surname>Auer</surname>
<given-names>E. T.</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>J. K.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>“Audiovisual speech binding: convergence or association?”</article-title>
in
<source>Handbook of Multisensory Processing</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Calvert</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Spence</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>B. E.</given-names>
</name>
</person-group>
(
<publisher-loc>Cambridge, MA</publisher-loc>
:
<publisher-name>MIT Press</publisher-name>
),
<fpage>203</fpage>
<lpage>223</lpage>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berryhill</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kveraga</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Webb</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>H. C.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<article-title>Multimodal access to verbal name codes</article-title>
.
<source>Percept. Psychophys.</source>
<volume>69</volume>
,
<fpage>628</fpage>
<lpage>640</lpage>
<pub-id pub-id-type="doi">10.3758/BF03193920</pub-id>
<pub-id pub-id-type="pmid">17727116</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Braida</surname>
<given-names>L. D.</given-names>
</name>
</person-group>
(
<year>1991</year>
).
<article-title>Crossmodal integration in the identification of consonant segments</article-title>
.
<source>Q. J. Exp. Psychol.</source>
<volume>43A</volume>
,
<fpage>647</fpage>
<lpage>677</lpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bundesen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Habekost</surname>
<given-names>T.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<source>Principles of Visual Attention: Linking Mind and Brain</source>
.
<publisher-loc>Oxford</publisher-loc>
:
<publisher-name>Oxford University Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Colonius</surname>
<given-names>H.</given-names>
</name>
</person-group>
(
<year>1990</year>
).
<article-title>Possibly dependent probability summation of reaction time</article-title>
.
<source>J. Math. Psychol.</source>
<volume>34</volume>
,
<fpage>253</fpage>
<lpage>275</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2496(90)90032-5</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eidels</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Houpt</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Altieri</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>Nice guys finish fast and bad guys finish last: A theory of interactive parallel processing</article-title>
.
<source>J. Math. Psychol.</source>
<volume>55</volume>
,
<fpage>176</fpage>
<lpage>190</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmp.2010.11.003</pub-id>
<pub-id pub-id-type="pmid">21516183</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fournier</surname>
<given-names>L. R.</given-names>
</name>
<name>
<surname>Eriksen</surname>
<given-names>C. W.</given-names>
</name>
</person-group>
(
<year>1990</year>
).
<article-title>Coactivation in the perception of redundant targets</article-title>
.
<source>J. Exp. Psychol. Hum. Percept. Perform.</source>
<volume>16</volume>
,
<fpage>538</fpage>
<lpage>550</lpage>
<pub-id pub-id-type="doi">10.1037/0096-1523.16.3.538</pub-id>
<pub-id pub-id-type="pmid">2144569</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fowler</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Dekle</surname>
<given-names>D. J.</given-names>
</name>
</person-group>
(
<year>1991</year>
).
<article-title>Listening with eye and hand: cross-modal contributions to speech perception</article-title>
.
<source>J. Exp. Psychol. Hum. Percept. Perform</source>
<volume>17</volume>
,
<fpage>816</fpage>
<lpage>828</lpage>
<pub-id pub-id-type="doi">10.1037/0096-1523.17.3.816</pub-id>
<pub-id pub-id-type="pmid">1834793</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Fowler</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Rosenblum</surname>
<given-names>L. D.</given-names>
</name>
</person-group>
(
<year>1991</year>
).
<article-title>“Perception of the phonetic gesture,”</article-title>
in
<source>Modularity and the Motor Theory of Speech Perception</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Mattingly</surname>
<given-names>I. G.</given-names>
</name>
<name>
<surname>Studdert-Kennedy</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>Lawrence Erlbaum</publisher-name>
),
<fpage>33</fpage>
<lpage>59</lpage>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grant</surname>
<given-names>K. W.</given-names>
</name>
</person-group>
(
<year>2002</year>
).
<article-title>Measures of auditory-visual integration for speech understanding: a theoretical perspective</article-title>
.
<source>J. Acoust. Soc. Am.</source>
<volume>112</volume>
,
<fpage>30</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="doi">10.1121/1.1482076</pub-id>
<pub-id pub-id-type="pmid">12141356</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grant</surname>
<given-names>K. W.</given-names>
</name>
<name>
<surname>Walden</surname>
<given-names>B. E.</given-names>
</name>
<name>
<surname>Seitz</surname>
<given-names>P. F.</given-names>
</name>
</person-group>
(
<year>1998</year>
).
<article-title>Auditory-visual speech recognition by hearing-impaired subjects: consonant recognition, sentence recognition, and auditory-visual integration</article-title>
.
<source>J. Acoust. Soc. Am.</source>
<volume>103</volume>
,
<fpage>2677</fpage>
<lpage>2690</lpage>
<pub-id pub-id-type="doi">10.1121/1.422788</pub-id>
<pub-id pub-id-type="pmid">9604361</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Green</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>J. L.</given-names>
</name>
</person-group>
(
<year>1985</year>
).
<article-title>On the role of visual rate information in phonetic perception</article-title>
.
<source>Percept. Psychophys.</source>
<volume>38</volume>
,
<fpage>269</fpage>
<lpage>276</lpage>
<pub-id pub-id-type="doi">10.3758/BF03198847</pub-id>
<pub-id pub-id-type="pmid">4088819</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grice</surname>
<given-names>G. R.</given-names>
</name>
<name>
<surname>Canham</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gwynne</surname>
<given-names>J. W.</given-names>
</name>
</person-group>
(
<year>1984</year>
).
<article-title>Absence of a redundant-signals effect in a reaction time task with divided attention</article-title>
.
<source>Percept. Psychophys.</source>
<volume>36</volume>
,
<fpage>565</fpage>
<lpage>570</lpage>
<pub-id pub-id-type="doi">10.3758/BF03207517</pub-id>
<pub-id pub-id-type="pmid">6535102</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jesse</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Massaro</surname>
<given-names>D. W.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>The temporal distribution of information in audiovisual spoken-word identification</article-title>
.
<source>Atten. Percept. Psychophys.</source>
<volume>72</volume>
,
<fpage>209</fpage>
<lpage>225</lpage>
<pub-id pub-id-type="doi">10.3758/APP.72.1.209</pub-id>
<pub-id pub-id-type="pmid">20045890</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liberman</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Mattingly</surname>
<given-names>I. G.</given-names>
</name>
</person-group>
(
<year>1985</year>
).
<article-title>The motor theory of speech perception</article-title>
.
<source>Cognition</source>
<volume>21</volume>
,
<fpage>1</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="doi">10.1016/0010-0277(85)90021-6</pub-id>
<pub-id pub-id-type="pmid">4075760</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Foxe</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Parra</surname>
<given-names>L. C.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Lip reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space</article-title>
.
<source>PLoS ONE</source>
<volume>4</volume>
,
<fpage>e4638</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0004638</pub-id>
<pub-id pub-id-type="pmid">19259259</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Massaro</surname>
<given-names>D. W.</given-names>
</name>
</person-group>
(
<year>1987</year>
).
<article-title>“Speech perception by ear and eye,”</article-title>
in
<source>Hearing by Eye: The Psychology of Lip-Reading</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Dodd</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>R.</given-names>
</name>
</person-group>
(
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>Lawrence Erlbaum</publisher-name>
),
<fpage>53</fpage>
<lpage>83</lpage>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Massaro</surname>
<given-names>D. W.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>“From multisensory integration to talking heads and language learning,”</article-title>
in
<source>The Handbook of Multisensory Processes</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Calvert</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Spence</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>B. E.</given-names>
</name>
</person-group>
(
<publisher-loc>Cambridge, MA</publisher-loc>
:
<publisher-name>The MIT Press</publisher-name>
),
<fpage>153</fpage>
<lpage>176</lpage>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Massaro</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>M. M.</given-names>
</name>
</person-group>
(
<year>2000</year>
).
<article-title>Tests of auditory-visual integration efficiency within the framework of the fuzzy logical model of perception</article-title>
.
<source>J. Acoust. Soc. Am.</source>
<volume>108</volume>
,
<fpage>784</fpage>
<lpage>789</lpage>
<pub-id pub-id-type="doi">10.1121/1.429611</pub-id>
<pub-id pub-id-type="pmid">10955645</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Massaro</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Smeele</surname>
<given-names>P. M. T.</given-names>
</name>
</person-group>
(
<year>1996</year>
).
<article-title>Perception of asynchronous and conflicting visible and auditory speech</article-title>
.
<source>J. Acoust. Soc. Am.</source>
<volume>100</volume>
,
<fpage>1777</fpage>
<lpage>1786</lpage>
<pub-id pub-id-type="doi">10.1121/1.417398</pub-id>
<pub-id pub-id-type="pmid">8817903</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McGurk</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>J. W.</given-names>
</name>
</person-group>
(
<year>1976</year>
).
<article-title>Hearing lips and seeing voices</article-title>
.
<source>Nature</source>
<volume>264</volume>
,
<fpage>746</fpage>
<lpage>748</lpage>
<pub-id pub-id-type="doi">10.1038/264746a0</pub-id>
<pub-id pub-id-type="pmid">1012311</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meredith</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>B. E.</given-names>
</name>
</person-group>
(
<year>1983</year>
).
<article-title>Interactions among converging sensory inputs in the superior colliculus</article-title>
.
<source>Science</source>
<volume>221</volume>
,
<fpage>389</fpage>
<lpage>391</lpage>
<pub-id pub-id-type="doi">10.1126/science.6867718</pub-id>
<pub-id pub-id-type="pmid">6867718</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miller</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>1982</year>
).
<article-title>Divided attention: evidence for coactivation with redundant signals</article-title>
.
<source>Cogn. Psychol.</source>
<volume>14</volume>
,
<fpage>247</fpage>
<lpage>279</lpage>
<pub-id pub-id-type="doi">10.1016/0010-0285(82)90010-X</pub-id>
<pub-id pub-id-type="pmid">7083803</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miller</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>1986</year>
).
<article-title>Time course of coactivation in bimodal divided attention</article-title>
.
<source>Percept. Psychophys.</source>
<volume>40</volume>
,
<fpage>331</fpage>
<lpage>343</lpage>
<pub-id pub-id-type="doi">10.3758/BF03203025</pub-id>
<pub-id pub-id-type="pmid">3786102</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Molholm</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ritter</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Javitt</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Foxe</surname>
<given-names>J. J.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>Multisensory visual–auditory object recognition in humans: a high-density electrical mapping study</article-title>
.
<source>Cereb. Cortex</source>
<volume>14</volume>
,
<fpage>452</fpage>
<lpage>465</lpage>
<pub-id pub-id-type="doi">10.1093/cercor/bhh007</pub-id>
<pub-id pub-id-type="pmid">15028649</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pilling</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Auditory event-related potentials (ERPs) in audiovisual speech perception</article-title>
.
<source>J. Speech Lang. Hear. Res.</source>
<volume>52</volume>
,
<fpage>1073</fpage>
<lpage>1081</lpage>
<pub-id pub-id-type="doi">10.1044/1092-4388(2009/07-0276)</pub-id>
<pub-id pub-id-type="pmid">19641083</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ponton</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>L. E.</given-names>
</name>
<name>
<surname>Auer</surname>
<given-names>E. T.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Mismatch negativity with visual-only and audiovisual speech</article-title>
.
<source>Brain Topogr.</source>
<volume>21</volume>
,
<fpage>207</fpage>
<lpage>215</lpage>
<pub-id pub-id-type="doi">10.1007/s10548-009-0094-5</pub-id>
<pub-id pub-id-type="pmid">19404730</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raab</surname>
<given-names>D. H.</given-names>
</name>
</person-group>
(
<year>1962</year>
).
<article-title>Statistical facilitation of simple reaction times</article-title>
.
<source>Trans. N. Y. Acad. Sci.</source>
<volume>24</volume>
,
<fpage>574</fpage>
<lpage>590</lpage>
<pub-id pub-id-type="pmid">14489538</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Rosenblum</surname>
<given-names>L. D.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>“Primacy of multimodal speech perception,”</article-title>
in
<source>The Handbook of Speech Perception</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Pisoni</surname>
<given-names>D. B.</given-names>
</name>
<name>
<surname>Remez</surname>
<given-names>R. E.</given-names>
</name>
</person-group>
(
<publisher-loc>Malden, MA</publisher-loc>
:
<publisher-name>Blackwell Publishing</publisher-name>
),
<fpage>51</fpage>
<lpage>78</lpage>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ross</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Saint-Amour</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Leavitt</surname>
<given-names>V. M.</given-names>
</name>
<name>
<surname>Javitt</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Foxe</surname>
<given-names>J. J.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<article-title>Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments</article-title>
.
<source>Cereb. Cortex</source>
<volume>17</volume>
,
<fpage>1147</fpage>
<lpage>1153</lpage>
<pub-id pub-id-type="doi">10.1093/cercor/bhl024</pub-id>
<pub-id pub-id-type="pmid">16785256</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sekiyama</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tohkura</surname>
<given-names>Y.</given-names>
</name>
</person-group>
(
<year>1993</year>
).
<article-title>Inter-language differences in the influence of visual cues in speech perception</article-title>
.
<source>J. Phon.</source>
<volume>21</volume>
,
<fpage>427</fpage>
<lpage>444</lpage>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Sherffert</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lachs</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hernandez</surname>
<given-names>L. R.</given-names>
</name>
</person-group>
(
<year>1997</year>
).
<article-title>“The Hoosier audiovisual multi-talker database,”</article-title>
in
<source>Research on Spoken Language Processing Progress Report No. 21</source>
,
<publisher-loc>Bloomington, IN</publisher-loc>
:
<publisher-name>Speech Research Laboratory, Department of Psychology, Indiana University</publisher-name>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sommers</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tye-Murray</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Spehar</surname>
<given-names>B.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>Auditory-visual speech perception and auditory-visual enhancement in normal-hearing younger and older adults</article-title>
.
<source>Ear Hear.</source>
<volume>26</volume>
,
<fpage>263</fpage>
<lpage>275</lpage>
<pub-id pub-id-type="doi">10.1097/00003446-200506000-00003</pub-id>
<pub-id pub-id-type="pmid">15937408</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sternberg</surname>
<given-names>S.</given-names>
</name>
</person-group>
(
<year>1969</year>
).
<article-title>The discovery of processing stages: extensions of Donders’ method</article-title>
.
<source>Acta Psychol. (Amst.)</source>
<volume>30</volume>
,
<fpage>276</fpage>
<lpage>315</lpage>
<pub-id pub-id-type="doi">10.1016/0001-6918(69)90055-9</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stevenson</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>James</surname>
<given-names>T. W.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Audiovisual integration in human superior temporal sulcus: inverse effectiveness and the neural processing of speech and object recognition</article-title>
.
<source>Neuroimage</source>
<volume>44</volume>
,
<fpage>1210</fpage>
<lpage>1223</lpage>
<pub-id pub-id-type="doi">10.1016/j.neuroimage.2008.09.034</pub-id>
<pub-id pub-id-type="pmid">18973818</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sumby</surname>
<given-names>W. H.</given-names>
</name>
<name>
<surname>Pollack</surname>
<given-names>I.</given-names>
</name>
</person-group>
(
<year>1954</year>
).
<article-title>Visual contribution to speech intelligibility in noise</article-title>
.
<source>J. Acoust. Soc. Am.</source>
<volume>26</volume>
,
<fpage>12</fpage>
<lpage>15</lpage>
<pub-id pub-id-type="doi">10.1121/1.1907309</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Summerfield</surname>
<given-names>Q.</given-names>
</name>
</person-group>
(
<year>1987</year>
).
<article-title>“Some preliminaries to a comprehensive account of audio-visual speech perception,”</article-title>
in
<source>The Psychology of Lip-Reading</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Dodd</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>R.</given-names>
</name>
</person-group>
(
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>LEA</publisher-name>
),
<fpage>3</fpage>
<lpage>50</lpage>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>1971</year>
).
<article-title>A note on the identifiability of parallel and serial processes</article-title>
.
<source>Percept. Psychophys.</source>
<volume>10</volume>
,
<fpage>161</fpage>
<lpage>163</lpage>
<pub-id pub-id-type="doi">10.3758/BF03205778</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>1974</year>
).
<article-title>“Issues and models concerning the processing of a finite number of inputs,”</article-title>
in
<source>Human Information Processing: Tutorials in Performance and Cognition</source>
, ed.
<person-group person-group-type="editor">
<name>
<surname>Kantowitz</surname>
<given-names>B. H.</given-names>
</name>
</person-group>
(
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>Erlbaum Press</publisher-name>
),
<fpage>133</fpage>
<lpage>168</lpage>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>1990</year>
).
<article-title>Truth and consequences of ordinal differences in statistical distributions: toward a theory of hierarchical inference</article-title>
.
<source>Psychol. Bull.</source>
<volume>108</volume>
,
<fpage>551</fpage>
<lpage>567</lpage>
<pub-id pub-id-type="doi">10.1037/0033-2909.108.3.551</pub-id>
<pub-id pub-id-type="pmid">2270240</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Ashby</surname>
<given-names>F. G.</given-names>
</name>
</person-group>
(
<year>1978</year>
).
<article-title>“Methods of modeling capacity in simple processing systems,”</article-title>
in
<source>Cognitive Theory</source>
, Vol.
<volume>3</volume>
, eds
<person-group person-group-type="editor">
<name>
<surname>Castellan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Restle</surname>
<given-names>F.</given-names>
</name>
</person-group>
(
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>Erlbaum Associates</publisher-name>
),
<fpage>200</fpage>
<lpage>239</lpage>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Ashby</surname>
<given-names>F. G.</given-names>
</name>
</person-group>
(
<year>1983</year>
).
<source>The Stochastic Modeling of Elementary Psychological Processes</source>
.
<publisher-loc>Cambridge</publisher-loc>
:
<publisher-name>Cambridge University Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Nozawa</surname>
<given-names>G.</given-names>
</name>
</person-group>
(
<year>1995</year>
).
<article-title>Spatio-temporal properties of elementary perception: an investigation of parallel, serial, and coactive theories</article-title>
.
<source>J. Math. Psychol.</source>
<volume>39</volume>
,
<fpage>321</fpage>
<lpage>359</lpage>
<pub-id pub-id-type="doi">10.1006/jmps.1995.1033</pub-id>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Schweickert</surname>
<given-names>R.</given-names>
</name>
</person-group>
(
<year>1989</year>
).
<article-title>Toward the trichotomy method: laying the foundation of stochastic mental networks</article-title>
.
<source>J. Math. Psychol.</source>
<volume>33</volume>
,
<fpage>309</fpage>
<lpage>327</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2496(89)90012-6</pub-id>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
</person-group>
(
<year>2004a</year>
).
<article-title>The serial-parallel dilemma: a case study in a linkage of theory and method</article-title>
.
<source>Psychon. Bull. Rev.</source>
<volume>11</volume>
,
<fpage>391</fpage>
<lpage>418</lpage>
<pub-id pub-id-type="doi">10.3758/BF03196588</pub-id>
<pub-id pub-id-type="pmid">15376788</pub-id>
</mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
</person-group>
(
<year>2004b</year>
).
<article-title>A theory of interactive parallel processing: new capacity measures and predictions for a response time inequality series</article-title>
.
<source>Psychol. Rev.</source>
<volume>111</volume>
,
<fpage>1003</fpage>
<lpage>1035</lpage>
<pub-id pub-id-type="doi">10.1037/0033-295X.111.4.1003</pub-id>
<pub-id pub-id-type="pmid">15482071</pub-id>
</mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Wassenhove</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>K. W.</given-names>
</name>
<name>
<surname>Poeppel</surname>
<given-names>D.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>Visual speech speeds up the neural processing of auditory speech</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A.</source>
<volume>102</volume>
,
<fpage>1181</fpage>
<lpage>1186</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0408949102</pub-id>
<pub-id pub-id-type="pmid">15647358</pub-id>
</mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bruce</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>O’Malley</surname>
<given-names>C.</given-names>
</name>
</person-group>
(
<year>1995</year>
).
<article-title>Facial identity and facial speech processing: familiar faces and voices in the McGurk effect</article-title>
.
<source>Percept. Psychophys.</source>
<volume>59</volume>
,
<fpage>1124</fpage>
<lpage>1133</lpage>
<pub-id pub-id-type="doi">10.3758/BF03208369</pub-id>
<pub-id pub-id-type="pmid">8539088</pub-id>
</mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>2000</year>
).
<article-title>Basic response time tools for studying general processing capacity in attention, perception, and cognition</article-title>
.
<source>J. Gen. Psychol.</source>
<volume>127</volume>
,
<fpage>67</fpage>
<lpage>99</lpage>
<pub-id pub-id-type="doi">10.1080/00221300009598571</pub-id>
<pub-id pub-id-type="pmid">10695952</pub-id>
</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<year>2001</year>
).
<article-title>“Faces as gestalt stimuli: process characteristics,”</article-title>
in
<source>Computational, Geometric, and Process Perspectives on Facial Cognition</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Wenger</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Townsend</surname>
<given-names>J. T.</given-names>
</name>
</person-group>
(
<publisher-loc>Mahwah, NJ</publisher-loc>
:
<publisher-name>Erlbaum Press</publisher-name>
),
<fpage>229</fpage>
<lpage>284</lpage>
</mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Winneke</surname>
<given-names>A. H.</given-names>
</name>
<name>
<surname>Phillips</surname>
<given-names>N. A.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>Does audiovisual speech offer a fountain of youth for old ears? An event-related brain potential study of age differences in audiovisual speech perception</article-title>
.
<source>Psychol. Aging</source>
<volume>26</volume>
,
<fpage>427</fpage>
<lpage>438</lpage>
<pub-id pub-id-type="doi">10.1037/a0021683</pub-id>
<pub-id pub-id-type="pmid">21443357</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
</list>
<tree>
<country name="États-Unis">
<noRegion>
<name sortKey="Altieri, Nicholas" sort="Altieri, Nicholas" uniqKey="Altieri N" first="Nicholas" last="Altieri">Nicholas Altieri</name>
</noRegion>
<name sortKey="Townsend, James T" sort="Townsend, James T" uniqKey="Townsend J" first="James T." last="Townsend">James T. Townsend</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/HapticV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001C36 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001C36 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    HapticV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:3180170
   |texte=   An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:21980314" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 

Wicri

This area was generated with Dilib version V0.6.23.
Data generation: Mon Jun 13 01:09:46 2016. Site generation: Wed Mar 6 09:54:07 2024