Exploration server on haptic devices


Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood

Internal identifier: 003163 (Ncbi/Merge); previous: 003162; next: 003164


Authors: Pedram Daee [Iran]; Maryam S. Mirian [Iran]; Majid Nili Ahmadabadi [Iran]

Source :

RBID : PMC:4110011

Abstract

In a multisensory task, human adults integrate information from different sensory modalities - behaviorally in an optimal Bayesian fashion - while children mostly rely on a single sensory modality for decision making. The reason behind this change of behavior over age and the process behind learning the required statistics for optimal integration are still unclear and have not been justified by conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and the simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because relying more on modalities - i.e. selection - at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space, since the smaller state-space in modalities results in faster learning in every individual modality. In contrast, after gaining sufficient experience (adulthood), the quality of learning in the joint space matures while learning in modalities suffers from insufficient accuracy due to perceptual aliasing. This results in a tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. This suggests that sensory selection and integration are emergent behaviors and both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon.


Url:
DOI: 10.1371/journal.pone.0103143
PubMed: 25058591
PubMed Central: 4110011

Links toward previous steps (curation, corpus...)


Links to Exploration step

PMC:4110011

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood</title>
<author>
<name sortKey="Daee, Pedram" sort="Daee, Pedram" uniqKey="Daee P" first="Pedram" last="Daee">Pedram Daee</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mirian, Maryam S" sort="Mirian, Maryam S" uniqKey="Mirian M" first="Maryam S." last="Mirian">Maryam S. Mirian</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ahmadabadi, Majid Nili" sort="Ahmadabadi, Majid Nili" uniqKey="Ahmadabadi M" first="Majid Nili" last="Ahmadabadi">Majid Nili Ahmadabadi</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25058591</idno>
<idno type="pmc">4110011</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110011</idno>
<idno type="RBID">PMC:4110011</idno>
<idno type="doi">10.1371/journal.pone.0103143</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000338</idno>
<idno type="wicri:Area/Pmc/Curation">000338</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000995</idno>
<idno type="wicri:Area/Ncbi/Merge">003163</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood</title>
<author>
<name sortKey="Daee, Pedram" sort="Daee, Pedram" uniqKey="Daee P" first="Pedram" last="Daee">Pedram Daee</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mirian, Maryam S" sort="Mirian, Maryam S" uniqKey="Mirian M" first="Maryam S." last="Mirian">Maryam S. Mirian</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ahmadabadi, Majid Nili" sort="Ahmadabadi, Majid Nili" uniqKey="Ahmadabadi M" first="Majid Nili" last="Ahmadabadi">Majid Nili Ahmadabadi</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran</addr-line>
</nlm:aff>
<country xml:lang="fr">Iran</country>
<wicri:regionArea>School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>In a multisensory task, human adults integrate information from different sensory modalities - behaviorally in an optimal Bayesian fashion - while children mostly rely on a single sensory modality for decision making. The reason behind this change of behavior over age and the process behind learning the required statistics for optimal integration are still unclear and have not been justified by conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and the simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because relying more on modalities - i.e. selection - at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space, since the smaller state-space in modalities results in faster learning in every individual modality. In contrast, after gaining sufficient experience (adulthood), the quality of learning in the joint space matures while learning in modalities suffers from insufficient accuracy due to perceptual aliasing. This results in a tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. This suggests that sensory selection and integration are emergent behaviors and both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, Mo" uniqKey="Ernst M">MO Ernst</name>
</author>
<author>
<name sortKey="Banks, Ms" uniqKey="Banks M">MS Banks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alais, D" uniqKey="Alais D">D Alais</name>
</author>
<author>
<name sortKey="Burr, D" uniqKey="Burr D">D Burr</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gori, M" uniqKey="Gori M">M Gori</name>
</author>
<author>
<name sortKey="Del Viva, M" uniqKey="Del Viva M">M Del Viva</name>
</author>
<author>
<name sortKey="Sandini, G" uniqKey="Sandini G">G Sandini</name>
</author>
<author>
<name sortKey="Burr, Dc" uniqKey="Burr D">DC Burr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nardini, M" uniqKey="Nardini M">M Nardini</name>
</author>
<author>
<name sortKey="Jones, P" uniqKey="Jones P">P Jones</name>
</author>
<author>
<name sortKey="Bedford, R" uniqKey="Bedford R">R Bedford</name>
</author>
<author>
<name sortKey="Braddick, O" uniqKey="Braddick O">O Braddick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nardini, M" uniqKey="Nardini M">M Nardini</name>
</author>
<author>
<name sortKey="Bedford, R" uniqKey="Bedford R">R Bedford</name>
</author>
<author>
<name sortKey="Mareschal, D" uniqKey="Mareschal D">D Mareschal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, Mo" uniqKey="Ernst M">MO Ernst</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rangel, A" uniqKey="Rangel A">A Rangel</name>
</author>
<author>
<name sortKey="Camerer, C" uniqKey="Camerer C">C Camerer</name>
</author>
<author>
<name sortKey="Montague, Pr" uniqKey="Montague P">PR Montague</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weisswange, Th" uniqKey="Weisswange T">TH Weisswange</name>
</author>
<author>
<name sortKey="Rothkopf, Ca" uniqKey="Rothkopf C">CA Rothkopf</name>
</author>
<author>
<name sortKey="Rodemann, T" uniqKey="Rodemann T">T Rodemann</name>
</author>
<author>
<name sortKey="Triesch, J" uniqKey="Triesch J">J Triesch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Firouzi, H" uniqKey="Firouzi H">H Firouzi</name>
</author>
<author>
<name sortKey="Ahmadabadi, Mn" uniqKey="Ahmadabadi M">MN Ahmadabadi</name>
</author>
<author>
<name sortKey="Araabi, Bn" uniqKey="Araabi B">BN Araabi</name>
</author>
<author>
<name sortKey="Amizadeh, S" uniqKey="Amizadeh S">S Amizadeh</name>
</author>
<author>
<name sortKey="Mirian, Ms" uniqKey="Mirian M">MS Mirian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mirian, Ms" uniqKey="Mirian M">MS Mirian</name>
</author>
<author>
<name sortKey="Ahmadabadi, Mn" uniqKey="Ahmadabadi M">MN Ahmadabadi</name>
</author>
<author>
<name sortKey="Araabi, Bn" uniqKey="Araabi B">BN Araabi</name>
</author>
<author>
<name sortKey="Siegwart, Rr" uniqKey="Siegwart R">RR Siegwart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitehead, Sd" uniqKey="Whitehead S">SD Whitehead</name>
</author>
<author>
<name sortKey="Ballard, Dh" uniqKey="Ballard D">DH Ballard</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Audibert, J Y" uniqKey="Audibert J">J-Y Audibert</name>
</author>
<author>
<name sortKey="Munos, R" uniqKey="Munos R">R Munos</name>
</author>
<author>
<name sortKey="Szepesvari, C" uniqKey="Szepesvari C">C Szepesvári</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lai, T" uniqKey="Lai T">T Lai</name>
</author>
<author>
<name sortKey="Robbins, H" uniqKey="Robbins H">H Robbins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Auer, P" uniqKey="Auer P">P Auer</name>
</author>
<author>
<name sortKey="Cesa Bianchi, N" uniqKey="Cesa Bianchi N">N Cesa-Bianchi</name>
</author>
<author>
<name sortKey="Fischer, P" uniqKey="Fischer P">P Fischer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Battaglia, Pw" uniqKey="Battaglia P">PW Battaglia</name>
</author>
<author>
<name sortKey="Jacobs, Ra" uniqKey="Jacobs R">RA Jacobs</name>
</author>
<author>
<name sortKey="Aslin, Rn" uniqKey="Aslin R">RN Aslin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gori, M" uniqKey="Gori M">M Gori</name>
</author>
<author>
<name sortKey="Sandini, G" uniqKey="Sandini G">G Sandini</name>
</author>
<author>
<name sortKey="Burr, D" uniqKey="Burr D">D Burr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wallace, Mt" uniqKey="Wallace M">MT Wallace</name>
</author>
<author>
<name sortKey="Stein, Be" uniqKey="Stein B">BE Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kording, Kp" uniqKey="Kording K">KP Körding</name>
</author>
<author>
<name sortKey="Beierholm, U" uniqKey="Beierholm U">U Beierholm</name>
</author>
<author>
<name sortKey="Ma, Wj" uniqKey="Ma W">WJ Ma</name>
</author>
<author>
<name sortKey="Quartz, S" uniqKey="Quartz S">S Quartz</name>
</author>
<author>
<name sortKey="Tenenbaum, Jb" uniqKey="Tenenbaum J">JB Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dayan, P" uniqKey="Dayan P">P Dayan</name>
</author>
<author>
<name sortKey="J Yu, A" uniqKey="J Yu A">A J Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narain, D" uniqKey="Narain D">D Narain</name>
</author>
<author>
<name sortKey="Van Beers, Rj" uniqKey="Van Beers R">RJ van Beers</name>
</author>
<author>
<name sortKey="Smeets, Jbj" uniqKey="Smeets J">JBJ Smeets</name>
</author>
<author>
<name sortKey="Brenner, E" uniqKey="Brenner E">E Brenner</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25058591</article-id>
<article-id pub-id-type="pmc">4110011</article-id>
<article-id pub-id-type="publisher-id">PONE-D-14-12469</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0103143</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Cognitive Science</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
<subj-group>
<subject>Human Learning</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group>
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Machine Learning</subject>
<subj-group>
<subject>Machine Learning Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group>
<subject>Sensory Perception</subject>
<subj-group>
<subject>Sensory Cues</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Learning and Memory</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Computer and Information Sciences</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood</article-title>
<alt-title alt-title-type="running-head">The Transition from Sensory Selection to Sensory Integration in Humans</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Daee</surname>
<given-names>Pedram</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mirian</surname>
<given-names>Maryam S.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ahmadabadi</surname>
<given-names>Majid Nili</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>van Beers</surname>
<given-names>Robert J.</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>VU University Amsterdam, Netherlands</addr-line>
</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>pedram.daee@gmail.com</email>
</corresp>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con">
<p>Conceived and designed the experiments: PD MNA. Performed the experiments: PD. Analyzed the data: PD MNA MSM. Contributed to the writing of the manuscript: PD MNA MSM. Developed the model: PD MNA MSM.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>24</day>
<month>7</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="ecorrected">
<day>3</day>
<month>11</month>
<year>2014</year>
</pub-date>
<volume>9</volume>
<issue>7</issue>
<elocation-id>e103143</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>3</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>6</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Daee et al</copyright-holder>
<license>
<license-p>This is an open-access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<p>In a multisensory task, human adults integrate information from different sensory modalities - behaviorally in an optimal Bayesian fashion - while children mostly rely on a single sensory modality for decision making. The reason behind this change of behavior over age and the process behind learning the required statistics for optimal integration are still unclear and have not been justified by conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and the simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because relying more on modalities - i.e. selection - at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space, since the smaller state-space in modalities results in faster learning in every individual modality. In contrast, after gaining sufficient experience (adulthood), the quality of learning in the joint space matures while learning in modalities suffers from insufficient accuracy due to perceptual aliasing. This results in a tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. This suggests that sensory selection and integration are emergent behaviors and both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon.</p>
</abstract>
<funding-group>
<funding-statement>The authors have no funding or support to report.</funding-statement>
</funding-group>
<counts>
<page-count count="13"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.</p>
</notes>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>To make an appropriate decision, our brain has to perceive the current state of the environment. However, even our best senses are noisy and can only provide an uncertain estimate of the underlying state. The biological solution for achieving the best perception is integration of uncertain individual estimates.</p>
<p>Human adults integrate sensory information, both across and within different modalities, seemingly with the purpose of reducing the uncertainty of their perception. The overwhelming majority of behavioral studies have shown that this uncertainty reduction happens in a statistically optimal fashion
<xref rid="pone.0103143-Ernst1" ref-type="bibr">[1]</xref>
,
<xref rid="pone.0103143-Alais1" ref-type="bibr">[2]</xref>
. One way to model this optimal integration is to employ the Bayesian framework. In this framework and under some assumptions, the integration procedure is modeled by a weighted average of the individual sensors' estimates. Each sensor's weight is proportional to its relative reliability, i.e. the inverse of its uncertainty. It can be shown that the reliability of the integrated estimate is higher than that of any individual sensor's estimate.</p>
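As a concrete illustration of the weighted-average rule described above, the short sketch below fuses two noisy estimates of the same quantity with weights proportional to their inverse variances (reliabilities). It assumes independent, unbiased Gaussian sensor noise, and the numbers and names are purely illustrative rather than taken from the paper.

# Minimal sketch of reliability-weighted (inverse-variance) integration.
# Assumes independent, unbiased Gaussian sensor noise; values are illustrative.

def integrate(estimates, variances):
    """Return the reliability-weighted estimate and its variance."""
    weights = [1.0 / v for v in variances]            # reliability = inverse uncertainty
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    fused_variance = 1.0 / total                      # never larger than any input variance
    return fused, fused_variance

# Example: a visual and a haptic estimate of the same location.
print(integrate([10.0, 12.0], [1.0, 4.0]))            # fused value lies closer to the more reliable sensor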
<p>Nevertheless, many behavioral studies indicate that this optimal behavior, and in some cases even its neural foundations, are not present at birth. Furthermore, it is only in the later stages of development that multisensory functions appear and take the main role in multisensory decision making; see
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
for a comprehensive review. An increasing number of studies in different sensory modalities on adults and children have shown that, unlike adults, children make their judgments based only on one of the available sources of information. Some instances of this sensory selection behavior have been observed in visual and haptic modalities for size and orientation discrimination
<xref rid="pone.0103143-Gori1" ref-type="bibr">[4]</xref>
, visual landmarks and self-motion information for navigation
<xref rid="pone.0103143-Nardini1" ref-type="bibr">[5]</xref>
, and visual stereoscopic and texture information for estimating surface slant
<xref rid="pone.0103143-Nardini2" ref-type="bibr">[6]</xref>
.</p>
<p>The interesting open questions here are “Why does optimal integration occur so late?”
<xref rid="pone.0103143-Ernst2" ref-type="bibr">[7]</xref>
, why there is a tendency toward sensory selection in children, and finally, how, and based on what measures, the transition from sensory selection in childhood to sensory integration in adulthood happens. While there are a considerable number of hypotheses regarding the reasons behind these phenomena (see
<xref rid="pone.0103143-Nardini2" ref-type="bibr">[6]</xref>
,
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
,
<xref rid="pone.0103143-Ernst2" ref-type="bibr">[7]</xref>
), to our knowledge, no existing study has addressed these three questions with a unified computational model. The primary aim of this research is to investigate the computational advantages of the transition from sensory selection at early ages toward multisensory integration at adulthood. The second goal is to check if the above three questions can be addressed by a single computational model.</p>
<p>We hypothesize that this selection and integration are emergent behaviors of a single reward maximization system. To verify our hypothesis, we propose a mathematically sound and general reward-dependent learning framework (see
<xref ref-type="sec" rid="s2">Method</xref>
) and test it in a multisensory localization task (see
<xref ref-type="sec" rid="s3">Experiments and Results</xref>
). The learning method is value-based
<xref rid="pone.0103143-Sutton1" ref-type="bibr">[8]</xref>
<xref rid="pone.0103143-Rangel1" ref-type="bibr">[9]</xref>
and progress of learning in the framework corresponds to the development of the agent over age. This choice is natural, as there are supporting studies indicating that multisensory integration is not innate and that there should be a learning mechanism behind its development (see
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
,
<xref rid="pone.0103143-Weisswange1" ref-type="bibr">[10]</xref>
). Furthermore, this framework does not require most of the strict mathematical assumptions that are the building blocks of the conventional Bayesian framework, which is widely used to explain multisensory integration.</p>
</sec>
<sec sec-type="methods" id="s2">
<title>Method</title>
<p>Consider an agent with
<italic>k</italic>
sensors
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e001.jpg"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e002.jpg"></inline-graphic>
</inline-formula>
is the observation space of the
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e003.jpg"></inline-graphic>
</inline-formula>
sensor. Furthermore, assume that the environment is fully observable in the Cartesian product of the observation spaces, i.e.
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e004.jpg"></inline-graphic>
</inline-formula>
. At each time step, the agent should choose an action from its action set
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e005.jpg"></inline-graphic>
</inline-formula>
according to the perceptual input (state)
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e006.jpg"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e007.jpg"></inline-graphic>
</inline-formula>
is the current reading of the
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e008.jpg"></inline-graphic>
</inline-formula>
sensor. After performing the action, the agent receives an immediate reinforcement signal (reward)
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e009.jpg"></inline-graphic>
</inline-formula>
from the environment. It is assumed that all the reward distributions, corresponding to the state-action pairs, are unknown with support in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e010.jpg"></inline-graphic>
</inline-formula>
. The goal of the agent is to maximize the total amount of reward it receives over its lifetime. To achieve this goal, the agent should learn the appropriate action in response to members of the joint sensory space
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e011.jpg"></inline-graphic>
</inline-formula>
.</p>
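The loop below sketches this interaction protocol: at each step the agent reads one observation per sensor, forms the joint state as the tuple of readings, acts, and receives a scalar reward drawn from an unknown distribution. The env object, choose_action, and update are placeholders for the task, the decision policy, and the learning rule described later; this is an illustrative interface, not code from the paper.

# Sketch of the agent-environment interaction loop described above.
def run_episode(env, actions, choose_action, update, n_steps):
    for _ in range(n_steps):
        observations = env.observe()        # (o1, ..., ok), one reading per sensor
        state = tuple(observations)         # joint state s in O1 x ... x Ok
        action = choose_action(state, actions)
        reward = env.step(action)           # immediate reward from an unknown distribution
        update(state, action, reward)       # learning proceeds in parallel in all spaces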
<p>The primary challenge here is that the state space
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e012.jpg"></inline-graphic>
</inline-formula>
is high dimensional. Therefore, to learn the best action corresponding to each member of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e013.jpg"></inline-graphic>
</inline-formula>
, a large number of experiences (samples) is needed. This problem is known as the curse of dimensionality. One way to tackle this problem is to use the experiences in the subspaces of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e014.jpg"></inline-graphic>
</inline-formula>
, such as
<italic>O
<sup>i</sup>
</italic>
, for decision making
<xref rid="pone.0103143-Firouzi1" ref-type="bibr">[11]</xref>
,
<xref rid="pone.0103143-Mirian1" ref-type="bibr">[12]</xref>
. However, the environment in the eyes of
<italic>O
<sup>i</sup>
</italic>
is partially observable, which creates a many-to-one mapping between real states of the environment and observations in
<italic>O
<sup>i</sup>
</italic>
. This problem is known as Perceptual Aliasing (PA)
<xref rid="pone.0103143-Whitehead1" ref-type="bibr">[13]</xref>
and is generally avoided. Nevertheless, PA might be beneficial in learning a task
<xref rid="pone.0103143-Firouzi1" ref-type="bibr">[11]</xref>
, since it can partially free the learner from the curse of dimensionality if states sharing the same
<italic>o
<sup>i</sup>
</italic>
have similar optimal policies. PA might be helpful at the early stages of learning as well, where learning a moderately rewarding policy over
<italic>O
<sup>i</sup>
</italic>
is faster than learning a policy with the same reward over the joint space
<italic>S</italic>
. In these two cases, learning in the subspaces results in generalization of experiences. In contrast, PA can be very undesirable when functionally different states of the environment, i.e. states with very different policies, are mapped to the same observation in
<italic>O
<sup>i</sup>
</italic>
. This case of PA turns the accumulated experience in that subspace into “garbage”
<xref rid="pone.0103143-Mccallum1" ref-type="bibr">[14]</xref>
.
<xref ref-type="fig" rid="pone-0103143-g001">Figure 1</xref>
illustrates these concepts in a simple example. Our proposed statistical test (see Generalization Test) has the ability to detect different cases of perceptual aliasing that are illustrated in the figure.</p>
<fig id="pone-0103143-g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Different types of perceptual aliasing in subspaces.</title>
<p>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e015.jpg"></inline-graphic>
</inline-formula>
represents the observation set of the
<italic>i
<sup>th</sup>
</italic>
sensor for i = 1, 2.
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e016.jpg"></inline-graphic>
</inline-formula>
is the state set and
<italic>A = </italic>
{○,□,Δ} is the action set of the agent.
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e017.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e018.jpg"></inline-graphic>
</inline-formula>
are the best and the worst actions in the given state, respectively. Accumulated experience in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e019.jpg"></inline-graphic>
</inline-formula>
is a perfect generalization for
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e020.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e021.jpg"></inline-graphic>
</inline-formula>
, since these two states have the same optimal policy and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e022.jpg"></inline-graphic>
</inline-formula>
is common between them. In contrast, accumulated experience in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e023.jpg"></inline-graphic>
</inline-formula>
is garbage information because functionally different states are mapped to the same observation. The situation for
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e024.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e025.jpg"></inline-graphic>
</inline-formula>
is a little different. Only for the best action in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e026.jpg"></inline-graphic>
</inline-formula>
and the worst action in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e027.jpg"></inline-graphic>
</inline-formula>
do we have generalization; however, for the other action this is not the case.</p>
</caption>
<graphic xlink:href="pone.0103143.g001"></graphic>
</fig>
<p>In order to benefit from PA and to avoid its harms, a statistical test is proposed to discriminate estimates of the expected reward that are instances of generalization (beneficial cases of PA) from garbage information. The proposed test is inspired in part by McCallum's work on learning with incomplete perception
<xref rid="pone.0103143-Mccallum2" ref-type="bibr">[15]</xref>
. Then, a selection policy for choosing the most reliable source of information is employed. Finally, a decision-making policy that acts on the selected information and accounts for the exploration-exploitation trade-off is introduced. A schematic overview of the proposed method, including the Generalization Test (G Test) and the Decision Making phase, is illustrated in
<xref ref-type="fig" rid="pone-0103143-g002">Figure 2</xref>
. In the following subsections, the proposed multisensory learning and decision making method is explained in detail.</p>
<fig id="pone-0103143-g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g002</object-id>
<label>Figure 2</label>
<caption>
<title>A schematic overview of the proposed framework for multisensory learning and decision making.</title>
<p>
<italic>s = (o
<sup>1</sup>
,o
<sup>2</sup>
,…,o
<sup>k</sup>
)</italic>
is the perceptual input,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e028.jpg"></inline-graphic>
</inline-formula>
is the current reading of the
<italic>i
<sup>th</sup>
</italic>
sensor, and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e029.jpg"></inline-graphic>
</inline-formula>
is the learning block of the
<italic>i
<sup>th</sup>
</italic>
sensor. For each action and based on the previously received rewards, each learning block calculates a confidence interval (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e030.jpg"></inline-graphic>
</inline-formula>
) on the mean of the reward distribution corresponding to the given observation and action pair. The proposed Generalization Test (G Test) tests the generalization ability of each individual source against the joint space. If an individual source passes the G Test, its confidence interval is considered in the decision-making phase. In the decision-making phase, an appropriate action is selected based on the given intervals, taking into account the exploration and exploitation trade-off.</p>
</caption>
<graphic xlink:href="pone.0103143.g002"></graphic>
</fig>
<p>In general, there are two approaches to learning a task: learning from labeled samples and learning by interaction. State estimation in a supervised setting requires having the specifications of the states at hand. Nevertheless, in reality we must learn the states either directly or through learning the optimal policy. In the problem at hand, the agent begins its life as a tabula rasa, and there is no information available regarding the observation models of the sensors or the relation between the agent's sensory space
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e031.jpg"></inline-graphic>
</inline-formula>
and its action space
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e032.jpg"></inline-graphic>
</inline-formula>
. Furthermore, the only teacher the agent can interact with is the environment. Therefore, the agent can learn to act properly only through interactions with the environment. In this problem we are not interested in learning the observation models of individual sensors, nor do we have the necessary sources of feedback to do so. Therefore, this problem is different from conventional supervised learning, where a teacher provides a set of labeled data and the agent needs only to learn the observation models of the sensors and perform a state estimation task.</p>
<sec id="s2a">
<title>1. Modeling</title>
<p>The actual value of choosing action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e033.jpg"></inline-graphic>
</inline-formula>
when the agent is in state
<italic>s</italic>
 =  (
<italic>o
<sup>1</sup>
, o
<sup>2</sup>
,…,o
<sup>k</sup>
</italic>
) is denoted as
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e034.jpg"></inline-graphic>
</inline-formula>
, and its estimated value as
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e035.jpg"></inline-graphic>
</inline-formula>
. All the estimated values (Q-values) are represented in a
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e036.jpg"></inline-graphic>
</inline-formula>
dimensional table, known as Q-table. Q-values are updated after each time step using
<disp-formula id="pone.0103143.e037">
<graphic xlink:href="pone.0103143.e037.jpg" position="anchor" orientation="portrait"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e038.jpg"></inline-graphic>
</inline-formula>
is the reward received after performing
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e039.jpg"></inline-graphic>
</inline-formula>
in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e040.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e041.jpg"></inline-graphic>
</inline-formula>
is the learning rate for the given state and action. We assume that the reward distributions are fixed throughout the learning; i.e. the environment is stationary. In stationary environments, it is rational to employ
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e042.jpg"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e043.jpg"></inline-graphic>
</inline-formula>
is the sample size, i.e. the number of times that action
<italic>a</italic>
is performed in state
<italic>s</italic>
. By using this learning rate, the above equation becomes identical to the incremental update formula for computing the average reward
<xref rid="pone.0103143-Sutton1" ref-type="bibr">[8]</xref>
. Therefore, Q-values are the sample means and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e044.jpg"></inline-graphic>
</inline-formula>
s are the actual means of the underlying reward distributions.</p>
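A minimal sketch of this update, assuming the learning rate 1/n so that each Q-value is exactly the running sample mean of the rewards observed for that state-action pair. The dictionary-based tables (and the extra sums kept for the confidence intervals below) are our own illustrative layout.

from collections import defaultdict

Q      = defaultdict(float)   # Q[(s, a)]: estimated mean reward (sample mean)
n      = defaultdict(int)     # n[(s, a)]: number of times a was taken in s
sum_r  = defaultdict(float)   # running sum of rewards, used by the interval bounds
sum_r2 = defaultdict(float)   # running sum of squared rewards

def update(s, a, r):
    n[(s, a)] += 1
    Q[(s, a)] += (r - Q[(s, a)]) / n[(s, a)]   # incremental mean, learning rate = 1/n
    sum_r[(s, a)]  += r
    sum_r2[(s, a)] += r * r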
<p>As it will be explained in the following sections, we need confidence intervals on
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e045.jpg"></inline-graphic>
</inline-formula>
s for our generalization test and decision making method. For a moderately large number of samples, we can create a confidence interval on
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e046.jpg"></inline-graphic>
</inline-formula>
using the following bound
<xref rid="pone.0103143-Casella1" ref-type="bibr">[16]</xref>
:
<disp-formula id="pone.0103143.e047">
<graphic xlink:href="pone.0103143.e047.jpg" position="anchor" orientation="portrait"></graphic>
<label>(1)</label>
</disp-formula>
</p>
<p>In (1)
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e048.jpg"></inline-graphic>
</inline-formula>
is the Student t distribution with
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e049.jpg"></inline-graphic>
</inline-formula>
degrees of freedom. The parameter
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e050.jpg"></inline-graphic>
</inline-formula>
controls the confidence that
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e051.jpg"></inline-graphic>
</inline-formula>
will fall inside the confidence interval. Finally, the value
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e052.jpg"></inline-graphic>
</inline-formula>
is the estimated standard deviation of the underlying reward distribution defined by
<disp-formula id="pone.0103143.e053">
<graphic xlink:href="pone.0103143.e053.jpg" position="anchor" orientation="portrait"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e054.jpg"></inline-graphic>
</inline-formula>
is the sum of the rewards and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e055.jpg"></inline-graphic>
</inline-formula>
is the sum of the squares of the rewards received by performing
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e056.jpg"></inline-graphic>
</inline-formula>
in
<italic>s</italic>
.</p>
<p>The confidence interval in (1) is mathematically valid when either the number of samples (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e057.jpg"></inline-graphic>
</inline-formula>
) is moderately large or when the reward distribution is Normal (Gaussian). Although these conditions may seem rather restrictive, in our experience, bound (1) works reasonably well in most practical cases.</p>
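Since equation (1) is reproduced here only as an image, the sketch below assumes the usual form of the t-interval, mean plus or minus t_{alpha/2, n-1} * s / sqrt(n), computed from the running count and sums kept above; scipy is used only to obtain the t quantile and is our choice, not the paper's.

import math
from scipy.stats import t

def t_interval(n_sa, s_r, s_r2, alpha=0.05):
    """Confidence interval in the spirit of bound (1) on the mean reward."""
    if n_sa < 2:
        return (-math.inf, math.inf)                        # no meaningful interval yet
    mean = s_r / n_sa
    var = max((s_r2 - n_sa * mean ** 2) / (n_sa - 1), 0.0)  # unbiased sample variance
    half = t.ppf(1 - alpha / 2, n_sa - 1) * math.sqrt(var / n_sa)
    return (mean - half, mean + half)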
<p>When the sample size is not sufficiently large or the reward distribution is not Gaussian, we may use Chebyshev's inequality to calculate the confidence interval. To do so, we need the true standard deviation of the reward distribution, which is not available in general. However, defining the reward distribution in the interval
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e058.jpg"></inline-graphic>
</inline-formula>
, the maximum possible value for the variance is
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e059.jpg"></inline-graphic>
</inline-formula>
. Then a very conservative Chebyshev's inequality is
<disp-formula id="pone.0103143.e060">
<graphic xlink:href="pone.0103143.e060.jpg" position="anchor" orientation="portrait"></graphic>
<label>(2)</label>
</disp-formula>
</p>
<p>Although bounds (1) and (2) are similar in essence, bound (2) is very conservative but independent of the reward distribution. The conservativeness of (2) stems from not taking into account the type of the reward distribution and its estimated variance. This lack of prior assumptions results in extremely conservative intervals in cases where the variances are very small or even zero. In situations like these, it is better to employ the “variance-aware” inequality proposed in
<xref rid="pone.0103143-Audibert1" ref-type="bibr">[17]</xref>
:
<disp-formula id="pone.0103143.e061">
<graphic xlink:href="pone.0103143.e061.jpg" position="anchor" orientation="portrait"></graphic>
<label>(3)</label>
</disp-formula>
</p>
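Equations (2) and (3) also appear only as images, so the sketch below gives one standard reconstruction: a distribution-free Chebyshev interval that plugs in the worst-case variance 1/4 for rewards assumed to lie in [0, 1], and one common form of the empirical Bernstein ("variance-aware") bound of Audibert et al. [17]. The exact constants used in the paper may differ.

import math

def chebyshev_interval(n_sa, s_r, delta=0.05):
    """Distribution-free interval in the spirit of bound (2); rewards assumed in [0, 1]."""
    mean = s_r / n_sa
    half = 0.5 / math.sqrt(n_sa * delta)        # from Var <= 1/4 and Chebyshev's inequality
    return (mean - half, mean + half)

def bernstein_interval(n_sa, s_r, s_r2, delta=0.05):
    """Variance-aware interval in the spirit of bound (3); rewards assumed in [0, 1]."""
    mean = s_r / n_sa
    var = max(s_r2 / n_sa - mean ** 2, 0.0)     # (biased) empirical variance
    log_term = math.log(3.0 / delta)
    half = math.sqrt(2 * var * log_term / n_sa) + 3 * log_term / n_sa
    return (mean - half, mean + half)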
<p>In this study, we are mainly interested in the
<italic>length</italic>
of the confidence intervals and their
<italic>relative length</italic>
to each other. Generally, as new samples are observed, the lengths of all the intervals in bounds (1), (2), and (3) gradually diminish. Therefore, as we will see in the following sections, all the mentioned intervals are applicable in our algorithm. In
<xref ref-type="sec" rid="s4">Discussions and Conclusions</xref>
section, a discussion on a number of practical points concerning these bounds is provided.</p>
<p>For individual sensors,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e062.jpg"></inline-graphic>
</inline-formula>
denotes the actual mean and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e063.jpg"></inline-graphic>
</inline-formula>
denotes the sample mean of reward, received by performing action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e064.jpg"></inline-graphic>
</inline-formula>
when the
<italic>i
<sup>th</sup>
</italic>
sensor's observation is
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e065.jpg"></inline-graphic>
</inline-formula>
. We can create a confidence interval on
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e066.jpg"></inline-graphic>
</inline-formula>
by using the same procedure and only replacing the following variables in bounds (1), (2), or (3):
<disp-formula id="pone.0103143.e067">
<graphic xlink:href="pone.0103143.e067.jpg" position="anchor" orientation="portrait"></graphic>
<label>(4)</label>
</disp-formula>
<disp-formula id="pone.0103143.e068">
<graphic xlink:href="pone.0103143.e068.jpg" position="anchor" orientation="portrait"></graphic>
<label>(5)</label>
</disp-formula>
</p>
<p>The above equations express the marginal values for the
<italic>i
<sup>th</sup>
</italic>
sensor.</p>
<p>In order to calculate
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e069.jpg"></inline-graphic>
</inline-formula>
we also need to calculate two more terms:
<disp-formula id="pone.0103143.e070">
<graphic xlink:href="pone.0103143.e070.jpg" position="anchor" orientation="portrait"></graphic>
<label>(6)</label>
</disp-formula>
<disp-formula id="pone.0103143.e071">
<graphic xlink:href="pone.0103143.e071.jpg" position="anchor" orientation="portrait"></graphic>
<label>(7)</label>
</disp-formula>
</p>
<p>Calculation of (4)–(7) does not require extra learning trials because these variables are obtained by marginalizing the statistics of the joint space
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e072.jpg"></inline-graphic>
</inline-formula>
.</p>
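These marginal statistics can be read directly off the joint-space tables, as in the sketch below: for sensor i, all joint states whose i-th component equals the current observation are pooled for the given action. The dictionaries are those of the earlier sketches, and the layout is illustrative rather than the paper's.

def marginal_stats(i, obs, action, n, sum_r, sum_r2):
    """Pool joint-space statistics over all states whose i-th reading is obs (Eqs. 4-7)."""
    n_i = 0
    r_i = r2_i = 0.0
    for (s, a), count in n.items():
        if a == action and s[i] == obs:
            n_i  += count
            r_i  += sum_r[(s, a)]
            r2_i += sum_r2[(s, a)]
    q_i = r_i / n_i if n_i > 0 else 0.0          # marginal Q-value for (obs, action)
    return n_i, r_i, r2_i, q_i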
</sec>
<sec id="s2b">
<title>2. Generalization Test</title>
<p>A statistical test is proposed to answer the following question:</p>
<p>Is perceptual aliasing in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e073.jpg"></inline-graphic>
</inline-formula>
a beneficial case of generalization for action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e074.jpg"></inline-graphic>
</inline-formula>
, or a harmful case of “garbage” information?</p>
<p>Based on our modeling, we can restate the question as “is
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e075.jpg"></inline-graphic>
</inline-formula>
a reasonable representation of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e076.jpg"></inline-graphic>
</inline-formula>
?”, where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e077.jpg"></inline-graphic>
</inline-formula>
is the current observation of the
<italic>i
<sup>th</sup>
</italic>
sensor and
<italic>s  =  (o
<sup>1</sup>
,o
<sup>2</sup>
,…,o
<sup>k</sup>
)</italic>
. However, as previously mentioned,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e078.jpg"></inline-graphic>
</inline-formula>
s are unknown. As such, we use their confidence intervals by employing either bounds (1), (2), or (3). We denote the confidence interval on
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e079.jpg"></inline-graphic>
</inline-formula>
as
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e080.jpg"></inline-graphic>
</inline-formula>
and confidence interval on
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e081.jpg"></inline-graphic>
</inline-formula>
as
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e082.jpg"></inline-graphic>
</inline-formula>
.</p>
<p>To validate the generalization ability of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e083.jpg"></inline-graphic>
</inline-formula>
, we need to test whether
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e084.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e085.jpg"></inline-graphic>
</inline-formula>
are estimating the same value (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e086.jpg"></inline-graphic>
</inline-formula>
). However, due to perceptual aliasing (many-to-one mapping),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e087.jpg"></inline-graphic>
</inline-formula>
has also experienced all the rewards used in the calculation of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e088.jpg"></inline-graphic>
</inline-formula>
. Hence, checking the significance of their difference does not provide useful information. The proposed idea here is to extract the common experiences between
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e089.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e090.jpg"></inline-graphic>
</inline-formula>
, and then perform a statistical test on the residuals of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e091.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e092.jpg"></inline-graphic>
</inline-formula>
. The procedure of extracting common experiences from
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e093.jpg"></inline-graphic>
</inline-formula>
is as follows:
<disp-formula id="pone.0103143.e094">
<graphic xlink:href="pone.0103143.e094.jpg" position="anchor" orientation="portrait"></graphic>
<label>(8)</label>
</disp-formula>
<disp-formula id="pone.0103143.e095">
<graphic xlink:href="pone.0103143.e095.jpg" position="anchor" orientation="portrait"></graphic>
<label>(9)</label>
</disp-formula>
<disp-formula id="pone.0103143.e096">
<graphic xlink:href="pone.0103143.e096.jpg" position="anchor" orientation="portrait"></graphic>
<label>(10)</label>
</disp-formula>
<disp-formula id="pone.0103143.e097">
<graphic xlink:href="pone.0103143.e097.jpg" position="anchor" orientation="portrait"></graphic>
<label>(11)</label>
</disp-formula>
</p>
<p>By using the variables on the left side of the above equations, a new confidence interval
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e098.jpg"></inline-graphic>
</inline-formula>
can be created using any of bounds (1), (2), or (3). For each action,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e099.jpg"></inline-graphic>
</inline-formula>
represents the interval estimate of the mean of a reward distribution created from experiences in the current observation of the
<italic>i
<sup>th</sup>
</italic>
sensor, minus the experiences in the current state of the environment. If there exists an intersection between
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e100.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e101.jpg"></inline-graphic>
</inline-formula>
, then there is a good chance that
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e102.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e103.jpg"></inline-graphic>
</inline-formula>
are estimating a similar expected value of rewards (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e104.jpg"></inline-graphic>
</inline-formula>
). In other words, it means that the perceptual aliasing in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e105.jpg"></inline-graphic>
</inline-formula>
is a case of generalization. The proposed test states that at each time step for action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e106.jpg"></inline-graphic>
</inline-formula>
:
<disp-formula id="pone.0103143.e107">
<graphic xlink:href="pone.0103143.e107.jpg" position="anchor" orientation="portrait"></graphic>
<label>(12)</label>
</disp-formula>
</p>
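Putting the pieces together, the G Test can be sketched as below, building on the earlier sketches: the experiences of the current joint state are subtracted from the sensor's marginal statistics (our reading of Eqs. 8-11), an interval is built on the residuals, and the sensor passes for a given action if this interval intersects the joint-state interval (Eq. 12). Treating very small sample sizes as an automatic pass is our own simplification of the early-learning behavior described below.

def intersects(c1, c2):
    return c1[0] <= c2[1] and c2[0] <= c1[1]

def g_test(i, state, action, n, sum_r, sum_r2, interval=t_interval):
    """True if sensor i's aliasing looks like generalization for (state, action)."""
    n_s, r_s, r2_s = n[(state, action)], sum_r[(state, action)], sum_r2[(state, action)]
    n_i, r_i, r2_i, _ = marginal_stats(i, state[i], action, n, sum_r, sum_r2)
    # Residual experiences: what sensor i has seen outside the current joint state (Eqs. 8-11).
    n_res, r_res, r2_res = n_i - n_s, r_i - r_s, r2_i - r2_s
    if n_res < 2 or n_s < 2:
        return True                               # too few samples: accept by default early on
    return intersects(interval(n_res, r_res, r2_res), interval(n_s, r_s, r2_s))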
<p>Based on (12), we can expect the following behavior in different stages of learning:</p>
<list list-type="bullet">
<list-item>
<p>During initial steps of learning (when sample size is very small),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e108.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e109.jpg"></inline-graphic>
</inline-formula>
both have large confidence intervals. Consequently,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e110.jpg"></inline-graphic>
</inline-formula>
will be able to pass the proposed test in most time steps. Due to the low uncertainty in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e111.jpg"></inline-graphic>
</inline-formula>
, this behavior is desirable during initial steps.</p>
</list-item>
<list-item>
<p>By gaining new samples, both
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e112.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e113.jpg"></inline-graphic>
</inline-formula>
shrink. Therefore, the
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e114.jpg"></inline-graphic>
</inline-formula>
sensor will be able to pass the test only if its experiences are a good generalization of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e115.jpg"></inline-graphic>
</inline-formula>
's experience.</p>
</list-item>
<list-item>
<p>As the sample size for
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e116.jpg"></inline-graphic>
</inline-formula>
increases, its interval shrinks until it contains only
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e117.jpg"></inline-graphic>
</inline-formula>
. The same thing happens for
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e118.jpg"></inline-graphic>
</inline-formula>
but it will converge to a different point. As a result, the test will reject all the individual sensors.</p>
</list-item>
</list>
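<p>As a concrete illustration, the interval-intersection test in (12) can be sketched as follows. This is a minimal Python sketch, assuming each confidence interval is represented by a (lower, upper) pair; the function names and this representation are illustrative choices, not the paper's notation.</p>
<preformat>
# Minimal sketch of the generalization test in equation (12).
# A sensor's interval (built from its current observation, excluding the
# experiences of the current joint state) is accepted for an action only if
# it intersects the joint-space interval.

def intervals_intersect(ci_a, ci_b):
    """Return True if two closed intervals (lower, upper) overlap."""
    lo_a, up_a = ci_a
    lo_b, up_b = ci_b
    return min(up_a, up_b) >= max(lo_a, lo_b)

def generalization_test(ci_joint, ci_sensors):
    """Keep only the sensor intervals that intersect the joint-space interval."""
    return [ci for ci in ci_sensors if intervals_intersect(ci_joint, ci)]
</preformat>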
</sec>
<sec id="s2c">
<title>3. Decision Policy</title>
<p>As mentioned earlier, the agent starts with no prior information about the environment or the task at hand. Consequently, throughout learning it faces the dilemma of gaining new experience by choosing one of the less-explored decisions or exploiting past experience by selecting one of the well-rewarded decisions. This problem is known as the exploration versus exploitation trade-off
<xref rid="pone.0103143-Sutton1" ref-type="bibr">[8]</xref>
.</p>
<p>At each state
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e119.jpg"></inline-graphic>
</inline-formula>
, it can be assumed that there are
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e120.jpg"></inline-graphic>
</inline-formula>
unknown reward distributions, one corresponding to each action in the action set
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e121.jpg"></inline-graphic>
</inline-formula>
. The best action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e122.jpg"></inline-graphic>
</inline-formula>
is the one corresponding to the distribution with the greatest mean, i.e.
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e123.jpg"></inline-graphic>
</inline-formula>
. However,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e124.jpg"></inline-graphic>
</inline-formula>
s are unknown and the agent should make the decision based on their estimates. A good decision policy should consider both the Q-value (sample mean statistic) and the uncertainty regarding its expected value. The value of the sample mean controls the exploitative selections, while its uncertainty controls the explorative decisions. Clearly, the uncertainty of the sample mean tends to zero as the number of samples tends to infinity, resulting in a smooth transition from exploration to exploitation as the number of samples increases.</p>
<p>A well-studied family of decision policies that considers both criteria works by constructing an upper confidence bound on the mean of each reward distribution. Based on the calculated upper bounds, the decision policy selects the action with the greatest upper confidence bound
<xref rid="pone.0103143-Lai1" ref-type="bibr">[18]</xref>
. This idea is known as the “optimism in the face of uncertainty” principle. It has been proven that variations of these decision policies, such as UCB1
<xref rid="pone.0103143-Auer1" ref-type="bibr">[19]</xref>
, achieve logarithmic expected regret (the expected loss incurred because the agent does not always choose the optimal action), uniformly over the total number of samples of the given state. This is the smallest possible expected regret, up to a constant factor. Fortunately, the proposed approach already employs confidence intervals on the means of the reward distributions. The only difference in our problem is that, for each action, we have a set of confidence intervals instead of a single one. Therefore, we need to reduce the available confidence intervals to one and then apply the same idea.</p>
<p>One can devise various methods for integrating a set of intervals. However, in this study we are specifically interested in identifying the source of information that has the greatest impact on the final decision. As a result, we reduce the integration problem to selecting one of the available intervals as the representative interval for the given action. We propose two methods for this interval selection: the first selects the Most Optimistic Source (MOS), while the second chooses the Least Uncertain Source (LUS). Details of these methods are as follows:</p>
<p>At each state
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e125.jpg"></inline-graphic>
</inline-formula>
and for each action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e126.jpg"></inline-graphic>
</inline-formula>
, given a set of confidence intervals of individual sensors which were able to pass the previously mentioned test (12), the MOS method selects the interval with the greatest upper bound. The LUS method, on the other hand, selects the interval with the shortest length. The upper bound value of the selected interval will be used as the representative value for action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e127.jpg"></inline-graphic>
</inline-formula>
. However, if this value is greater than
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e128.jpg"></inline-graphic>
</inline-formula>
's upper bound, then
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e129.jpg"></inline-graphic>
</inline-formula>
's upper bound will be used as the representative value instead. The reason for this constraint is that, despite its larger uncertainty,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e130.jpg"></inline-graphic>
</inline-formula>
is still the most reliable (with lowest aliasing) source of information regarding the actual mean of the underlying reward distribution. Therefore, any value greater than
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e131.jpg"></inline-graphic>
</inline-formula>
's upper bound is unrealistically optimistic. The idea behind LUS is that shorter intervals indicate lower uncertainty, and it is always desirable to attend to the least uncertain source of information for decision making. The pseudo-codes of the MOS and LUS methods are shown in
<xref ref-type="table" rid="pone-0103143-t001">Table 1</xref>
and
<xref ref-type="table" rid="pone-0103143-t002">Table 2</xref>
. For bound
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e132.jpg"></inline-graphic>
</inline-formula>
, the notations
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e133.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e134.jpg"></inline-graphic>
</inline-formula>
represent the upper bound and lower bound values of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e135.jpg"></inline-graphic>
</inline-formula>
, respectively.</p>
<table-wrap id="pone-0103143-t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.t001</object-id>
<label>Table 1</label>
<caption>
<title>The function that implements MOS method.</title>
</caption>
<alternatives>
<graphic id="pone-0103143-t001-1" xlink:href="pone.0103143.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="2" align="left" rowspan="1">function MOS(
<italic>M, Accepted</italic>
)</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1">Input:
<italic>
<sub>M</sub>
</italic>
is the confidence interval on the joint space,
<italic>Accepted</italic>
is the array storing confidence intervals on the sources that passed the generalization test</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">1:</td>
<td align="left" rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e136.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2:</td>
<td align="left" rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e137.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">3:</td>
<td align="left" rowspan="1" colspan="1">
<bold>return</bold>
<italic>v</italic>
</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone-0103143-t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.t002</object-id>
<label>Table 2</label>
<caption>
<title>The function that implements LUS method.</title>
</caption>
<alternatives>
<graphic id="pone-0103143-t002-2" xlink:href="pone.0103143.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="2" align="left" rowspan="1">function LUS(
<italic>M, Accepted</italic>
)</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1">Input:
<italic>
<sub>M</sub>
</italic>
is the confidence interval on the joint space,
<italic>Accepted</italic>
is the array storing confidence intervals on the sources that passed the generalization test</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">1:</td>
<td align="left" rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e138.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2:</td>
<td align="left" rowspan="1" colspan="1">
<bold>if </bold>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e139.jpg"></inline-graphic>
</inline-formula>
<bold> then </bold>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e140.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">3:</td>
<td align="left" rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e141.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">4:</td>
<td align="left" rowspan="1" colspan="1">
<bold>return</bold>
<italic>v</italic>
</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
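<p>For concreteness, the selection rules of Table 1 and Table 2 can be sketched as below. This is an illustrative Python sketch, assuming the same (lower, upper) interval representation as before and that <italic>accepted</italic> holds only the sensor intervals that passed the generalization test; it is not the published implementation, and the fallback to the joint-space interval when no sensor is accepted is an assumption.</p>
<preformat>
# Sketch of the MOS and LUS selection rules (cf. Tables 1 and 2).
# ci_joint is the joint-space interval; accepted holds the sensor intervals
# that passed the generalization test. Both return the representative upper
# bound for the given action, capped by the joint space's upper bound.

def mos(ci_joint, accepted):
    """Most Optimistic Source: greatest upper bound among accepted intervals."""
    if not accepted:
        return ci_joint[1]
    upper = max(up for _, up in accepted)
    return min(upper, ci_joint[1])   # never exceed the joint-space upper bound

def lus(ci_joint, accepted):
    """Least Uncertain Source: upper bound of the shortest accepted interval."""
    if not accepted:
        return ci_joint[1]
    _, up = min(accepted, key=lambda ci: ci[1] - ci[0])
    return min(up, ci_joint[1])      # never exceed the joint-space upper bound
</preformat>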
<p>After choosing an upper bound value (with either the MOS or the LUS method) for every action, the action with the maximum upper bound value is selected as the final decision. Once the agent performs the selected action, the environment returns the reward
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e142.jpg"></inline-graphic>
</inline-formula>
. The complete pseudo-code of the proposed method is shown in
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
. The only parameter that needs to be initialized is
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e143.jpg"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e144.jpg"></inline-graphic>
</inline-formula>
is the confidence coefficient of confidence intervals.</p>
<table-wrap id="pone-0103143-t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.t003</object-id>
<label>Table 3</label>
<caption>
<title>The proposed Algorithm for Multisensory Learning and Decision Making.</title>
</caption>
<alternatives>
<graphic id="pone-0103143-t003-3" xlink:href="pone.0103143.t003"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="5" align="left" rowspan="1">Initialize
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e145.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e146.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e147.jpg"></inline-graphic>
</inline-formula>
,
<bold>and</bold>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e148.jpg"></inline-graphic>
</inline-formula>
<bold>to zero</bold>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e149.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">1:</td>
<td colspan="4" align="left" rowspan="1">
<bold>Repeat</bold>
at each time step</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<italic>s = (o
<sup>1</sup>
,o
<sup>2</sup>
,…,o
<sup>k</sup>
)</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">3:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<bold>for each</bold>
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e150.jpg"></inline-graphic>
</inline-formula>
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">4:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="2" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e151.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">5:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="2" align="left" rowspan="1">
<bold>for each</bold>
sensor
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e152.jpg"></inline-graphic>
</inline-formula>
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">6:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Calculate
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e153.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e154.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e155.jpg"></inline-graphic>
</inline-formula>
based on either bounds (1), (2), or (3)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">7:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<bold>if
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e156.jpg"></inline-graphic>
</inline-formula>
then
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e157.jpg"></inline-graphic>
</inline-formula>
</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">8:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="2" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e158.jpg"></inline-graphic>
</inline-formula>
MOS(
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e159.jpg"></inline-graphic>
</inline-formula>
) or LUS(
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e160.jpg"></inline-graphic>
</inline-formula>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">9:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">Perform
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e161.jpg"></inline-graphic>
</inline-formula>
, observe reward
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e162.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">10:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e163.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">11:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e164.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">12:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e165.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">13:</td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="3" align="left" rowspan="1">
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e166.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">14:</td>
<td colspan="4" align="left" rowspan="1">
<bold>Until</bold>
the end of the learning</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
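<p>To summarize how these pieces fit together, the decision step of Table 3 can be sketched as follows. This is a high-level sketch under several stated assumptions: the dictionaries <italic>Q</italic>, <italic>N</italic> (joint space) and <italic>Qi</italic>, <italic>Ni</italic> (per-sensor marginals with the current state's experiences removed, as in equations (10) and (11)), and the Hoeffding-style helper <italic>conf_interval</italic>, are illustrative stand-ins rather than the paper's exact bounds or data structures; <italic>intervals_intersect</italic>, <italic>mos</italic>, and <italic>lus</italic> are the helpers sketched earlier.</p>
<preformat>
import math

def conf_interval(q, n, alpha, r_max=1.0):
    """Illustrative Hoeffding-style interval; the paper's bounds (1)-(3) differ in form."""
    if n == 0:
        return (-r_max, r_max)
    half = r_max * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return (q - half, q + half)

def choose_action(state, observations, actions, Q, N, Qi, Ni, alpha, policy=mos):
    """One decision step in the spirit of Table 3 (sketch, not the exact algorithm)."""
    values = {}
    for a in actions:
        ci_joint = conf_interval(Q[state][a], N[state][a], alpha)
        accepted = []
        for i, o in enumerate(observations):
            # Qi/Ni hold the marginal statistics of sensor i for observation o,
            # with the current joint state's experiences removed (eqs. (10)-(11)).
            ci_i = conf_interval(Qi[i][o][a], Ni[i][o][a], alpha)
            if intervals_intersect(ci_joint, ci_i):   # generalization test (12)
                accepted.append(ci_i)
        values[a] = policy(ci_joint, accepted)        # MOS or LUS
    return max(values, key=values.get)
</preformat>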
</sec>
</sec>
<sec id="s3">
<title>Experiments and Results</title>
<p>The task is a modified version of the localization task in the visual and auditory modalities
<xref rid="pone.0103143-Alais1" ref-type="bibr">[2]</xref>
<xref rid="pone.0103143-Battaglia1" ref-type="bibr">[20]</xref>
. The simulation setup is based partly on
<xref rid="pone.0103143-Weisswange1" ref-type="bibr">[10]</xref>
. At each time step, a stimulus is generated randomly in one of the 30 discrete positions and each sensor observes a noisy representation of it. The observation noise for each sensor is modeled by a Gaussian distribution with standard deviation
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e167.jpg"></inline-graphic>
</inline-formula>
; see
<xref ref-type="fig" rid="pone-0103143-g003">Figure 3</xref>
. After observing the stimulus through its sensors, the agent chooses one of the 30 discrete positions as its estimate of the stimulus location (its action) and receives an immediate reinforcement value in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e168.jpg"></inline-graphic>
</inline-formula>
:
<disp-formula id="pone.0103143.e169">
<graphic xlink:href="pone.0103143.e169.jpg" position="anchor" orientation="portrait"></graphic>
<label>(13)</label>
</disp-formula>
</p>
<fig id="pone-0103143-g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Stimulus and observations by the auditory (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e170.jpg"></inline-graphic>
</inline-formula>
) and the visual (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e171.jpg"></inline-graphic>
</inline-formula>
) sensors.</title>
<p>Observations are based on Gaussian noise models. Variances control the reliability of each sensor.</p>
</caption>
<graphic xlink:href="pone.0103143.g003"></graphic>
</fig>
<p>We used
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e172.jpg"></inline-graphic>
</inline-formula>
, which indicates that only actions (estimates) within a radius of three units from the stimulus position receive positive rewards.</p>
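<p>For illustration, the reinforcement in (13) can be read as a simple distance test; the sketch below assumes a binary reward of 1 inside the three-unit radius and 0 outside, since the exact reward values of (13) are not repeated here.</p>
<preformat>
# Illustrative stand-in for the reward in equation (13), assuming a binary reward.

def reward(action_position, stimulus_position, radius=3):
    """Positive reward only if the chosen position is within `radius` of the stimulus."""
    distance = abs(action_position - stimulus_position)
    return 1.0 if radius >= distance else 0.0
</preformat>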
<p>The agent has no prior information about the task, the observation models, or the relation between the sensory space and actions. Therefore, throughout learning, it must learn the appropriate action based only on the sensory inputs and previously received rewards. On the other hand, the optimal Bayesian observer
<xref rid="pone.0103143-Alais1" ref-type="bibr">[2]</xref>
assumes that all of the mentioned information is available and chooses its action according to the following integration rule:
<disp-formula id="pone.0103143.e173">
<graphic xlink:href="pone.0103143.e173.jpg" position="anchor" orientation="portrait"></graphic>
<label>(14)</label>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e174.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e175.jpg"></inline-graphic>
</inline-formula>
are the standard deviations of the Gaussian noise models for the auditory and visual inputs, respectively. Moreover,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e176.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e177.jpg"></inline-graphic>
</inline-formula>
are the representations of the stimulus in the auditory and visual observation spaces. Behavioral studies have shown that adults integrate information from their sensors in a statistically optimal manner which, under the Gaussian observation models, can be formulated by
<xref ref-type="disp-formula" rid="pone.0103143.e173">equation (14</xref>
).</p>
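<p>The reliability-weighted combination in (14) amounts to inverse-variance weighting of the two observations. The sketch below illustrates this; rounding the continuous estimate to the nearest discrete position is an added assumption, made here because the task uses 30 discrete positions.</p>
<preformat>
# Inverse-variance-weighted combination of the auditory and visual observations,
# in the spirit of equation (14).

def bayes_optimal_estimate(x_a, x_v, sigma_a, sigma_v):
    """Reliability-weighted estimate of the stimulus position."""
    w_a = 1.0 / sigma_a ** 2        # reliability of the auditory observation
    w_v = 1.0 / sigma_v ** 2        # reliability of the visual observation
    estimate = (w_a * x_a + w_v * x_v) / (w_a + w_v)
    return round(estimate)          # snap to the nearest discrete position
</preformat>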
<p>In all the following experiments, the proposed method uses the Cartesian product of the observation spaces of all the sensors for its state space. The agent's learning and decision making is based on
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
.</p>
<sec id="s3a">
<title>Experiment 1</title>
<p>In the first experiment we use
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e178.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e179.jpg"></inline-graphic>
</inline-formula>
(see
<xref ref-type="fig" rid="pone-0103143-g003">Figure 3</xref>
). In order to validate our method, we employ three different agents. Two of the agents (the Visual and the Auditory agents) use only the individual sensors, which results in a state-action space of size
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e180.jpg"></inline-graphic>
</inline-formula>
for each. The third one (Visual
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e181.jpg"></inline-graphic>
</inline-formula>
Auditory agent) uses both sensors for its learning and decision making and has a state-action space of size
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e182.jpg"></inline-graphic>
</inline-formula>
. For these three agents, we employ the UCB1 policy
<xref rid="pone.0103143-Auer1" ref-type="bibr">[19]</xref>
for decision making. UCB1 calculates upper bounds on the means of the reward distributions based on the Hoeffding inequality. At each state
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e183.jpg"></inline-graphic>
</inline-formula>
, UCB1 chooses the action that maximizes
<disp-formula id="pone.0103143.e184">
<graphic xlink:href="pone.0103143.e184.jpg" position="anchor" orientation="portrait"></graphic>
<label>(15)</label>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e185.jpg"></inline-graphic>
</inline-formula>
is the average reward obtained from performing action
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e186.jpg"></inline-graphic>
</inline-formula>
in state
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e187.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e188.jpg"></inline-graphic>
</inline-formula>
is the number of times
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e189.jpg"></inline-graphic>
</inline-formula>
has been selected in
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e190.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e191.jpg"></inline-graphic>
</inline-formula>
is the exploration coefficient
<xref rid="pone.0103143-Audibert1" ref-type="bibr">[17]</xref>
. In the original version of UCB1,
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e192.jpg"></inline-graphic>
</inline-formula>
is set to 2. However, this value results in a high exploration rate. We use
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e193.jpg"></inline-graphic>
</inline-formula>
in all the experiments to increase the speed of learning for the rival agents.</p>
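<p>A minimal sketch of the UCB1 selection rule in (15) is given below; the dictionaries <italic>Q</italic> and <italic>N</italic> and the convention of trying unvisited actions first are illustrative assumptions, and the smaller exploration coefficient used in our experiments is not reproduced here.</p>
<preformat>
import math

# Sketch of UCB1 action selection as in equation (15): Q[s][a] is the average
# reward of action a in state s, N[s][a] its visit count, and c the exploration
# coefficient (2 in the original UCB1).

def ucb1_action(s, actions, Q, N, c=2.0):
    n_s = sum(N[s][a] for a in actions)       # total samples in state s
    def score(a):
        if N[s][a] == 0:
            return float("inf")               # force a first visit
        return Q[s][a] + math.sqrt(c * math.log(n_s) / N[s][a])
    return max(actions, key=score)
</preformat>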
<p>It should be noted that when we capitalize the name of a sensor, we are referring to the agent that learns in that sensor's space. For instance, Visual refers to the agent that uses only the visual state space for its learning.</p>
<p>The average reward against the time step, for all the agents and the optimal Bayesian observer, is shown in
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4A</xref>
. For the proposed methods (MOS and LUS), we employed bound (1) with
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e194.jpg"></inline-graphic>
</inline-formula>
. As can be seen in the figure, the proposed methods learn noticeably faster and obtain higher rewards compared to the Visual
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e195.jpg"></inline-graphic>
</inline-formula>
Auditory agent. The Visual and the Auditory agents both have a smaller state space (only one sensor each), which results in fast learning during the initial time steps. However, due to their partial perception, they can never reach the performance of the optimal Bayesian observer.</p>
<fig id="pone-0103143-g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g004</object-id>
<label>Figure 4</label>
<caption>
<title>Performance and behavior of the method in the localization task.</title>
<p>All graphs are averages over 20 independent runs, smoothed with a moving-average window of size 500. (
<bold>A</bold>
) Average reward for all agents. For the proposed methods (MOS and LUS), we used
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
, employing bound (1) with
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e196.jpg"></inline-graphic>
</inline-formula>
for calculating confidence intervals. The rival methods employ the UCB1 policy on the individual sensors and on the joint space. (
<bold>B</bold>
) Average acceptance rate (1–rejection rate) of the individual sensors in the proposed method (MOS). (
<bold>C</bold>
) The average dominancy percentage of each source in decision making (MOS). During the first half of the learning steps, vision is the dominant sensor, while the agent prefers the integrated sensory data during the remaining steps.</p>
</caption>
<graphic xlink:href="pone.0103143.g004"></graphic>
</fig>
<p>To evaluate the proposed generalization test (see
<xref ref-type="fig" rid="pone-0103143-g002">Figure 2</xref>
and Generalization Test) for the proposed method (MOS), the average outcome of the test for the chosen action against the time step is shown in
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4B</xref>
. The value on the vertical axis specifies the acceptance rate of the test, which equals 1 minus the rejection rate. The test fully accepts the individual sensors during the initial steps. This is in line with the individual sensors having generalization power early on, owing to their larger number of samples. Nevertheless, as learning in the joint space improves, the acceptance rate for the individual sensors decreases. This is due to sufficient experience accumulation in the joint space and the presence of perceptual aliasing in the individual sensor spaces. The decline is more noticeable for the auditory sensor, which is the less reliable one.</p>
<p>To investigate the decision making behavior of the proposed method (MOS), the average dominancy percentage of each source of information over time is shown in
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4C</xref>
. In the initial steps of learning, vision is the dominant modality. However, as learning proceeds, there is a growing tendency to rely on the joint space for decision making (sensory integration). Considering
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4A</xref>
and
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4C</xref>
we can conclude that as the average reward received in the joint space increases, the proposed method gradually switches its decision policy from selection to integration. This behavior is comparable to the humans' shift from sensory selection at childhood to sensory integration at adulthood.</p>
<p>Performance criteria for different variations of the proposed method and the Visual
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e197.jpg"></inline-graphic>
</inline-formula>
Auditory agent are illustrated in
<xref ref-type="table" rid="pone-0103143-t004">Table 4</xref>
.</p>
<table-wrap id="pone-0103143-t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.t004</object-id>
<label>Table 4</label>
<caption>
<title>Analyzing the learning speed and the behavior of different methods for Experiment 1 and 2.</title>
</caption>
<alternatives>
<graphic id="pone-0103143-t004-4" xlink:href="pone.0103143.t004"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="2" align="left" rowspan="1">Percentage of accumulated reward Learning Method</td>
<td colspan="4" align="left" rowspan="1">Experiment 1</td>
<td colspan="5" align="left" rowspan="1">Experiment 2</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1"></td>
<td align="left" rowspan="1" colspan="1"># time step</td>
<td colspan="3" align="left" rowspan="1">Percentage of dominance</td>
<td align="left" rowspan="1" colspan="1"># time step</td>
<td colspan="4" align="left" rowspan="1">Percentage of dominance</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">V</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">I</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">V</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">N</td>
<td align="left" rowspan="1" colspan="1">I</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">60%</td>
<td align="left" rowspan="1" colspan="1">Joint Space</td>
<td align="left" rowspan="1" colspan="1">38,113</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">1,141,640</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound(1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e198.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">8,200</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1">32</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">12,455</td>
<td align="left" rowspan="1" colspan="1">62</td>
<td align="left" rowspan="1" colspan="1">27</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e199.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">5,010</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1">32</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">5,557</td>
<td align="left" rowspan="1" colspan="1">61</td>
<td align="left" rowspan="1" colspan="1">32</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e200.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">7,901</td>
<td align="left" rowspan="1" colspan="1">62</td>
<td align="left" rowspan="1" colspan="1">37</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">10,828</td>
<td align="left" rowspan="1" colspan="1">60</td>
<td align="left" rowspan="1" colspan="1">32</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e201.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">5,599</td>
<td align="left" rowspan="1" colspan="1">64</td>
<td align="left" rowspan="1" colspan="1">35</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">8,852</td>
<td align="left" rowspan="1" colspan="1">60</td>
<td align="left" rowspan="1" colspan="1">29</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">75%</td>
<td align="left" rowspan="1" colspan="1">Joint Space</td>
<td align="left" rowspan="1" colspan="1">81,179</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">2,437,811</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound (1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e202.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">17,393</td>
<td align="left" rowspan="1" colspan="1">52</td>
<td align="left" rowspan="1" colspan="1">28</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">33,911</td>
<td align="left" rowspan="1" colspan="1">62</td>
<td align="left" rowspan="1" colspan="1">25</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e203.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">10,341</td>
<td align="left" rowspan="1" colspan="1">58</td>
<td align="left" rowspan="1" colspan="1">29</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">17,289</td>
<td align="left" rowspan="1" colspan="1">57</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e204.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">17,854</td>
<td align="left" rowspan="1" colspan="1">61</td>
<td align="left" rowspan="1" colspan="1">37</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">35,979</td>
<td align="left" rowspan="1" colspan="1">67</td>
<td align="left" rowspan="1" colspan="1">28</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e205.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">14,138</td>
<td align="left" rowspan="1" colspan="1">67</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">35,461</td>
<td align="left" rowspan="1" colspan="1">68</td>
<td align="left" rowspan="1" colspan="1">27</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">90%</td>
<td align="left" rowspan="1" colspan="1">Joint Space</td>
<td align="left" rowspan="1" colspan="1">348,945</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">10,036,225</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">100</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound (1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e206.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">72,689</td>
<td align="left" rowspan="1" colspan="1">40</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">40</td>
<td align="left" rowspan="1" colspan="1">1,148,066</td>
<td align="left" rowspan="1" colspan="1">43</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">35</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (1),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e207.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">43,281</td>
<td align="left" rowspan="1" colspan="1">50</td>
<td align="left" rowspan="1" colspan="1">25</td>
<td align="left" rowspan="1" colspan="1">25</td>
<td align="left" rowspan="1" colspan="1">974,986</td>
<td align="left" rowspan="1" colspan="1">39</td>
<td align="left" rowspan="1" colspan="1">21</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">36</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">MOS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e208.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">96,437</td>
<td align="left" rowspan="1" colspan="1">53</td>
<td align="left" rowspan="1" colspan="1">38</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">1,767,754</td>
<td align="left" rowspan="1" colspan="1">58</td>
<td align="left" rowspan="1" colspan="1">30</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">LUS, bound (2),
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e209.jpg"></inline-graphic>
</inline-formula>
</td>
<td align="left" rowspan="1" colspan="1">94,204</td>
<td align="left" rowspan="1" colspan="1">66</td>
<td align="left" rowspan="1" colspan="1">25</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">1,831,145</td>
<td align="left" rowspan="1" colspan="1">61</td>
<td align="left" rowspan="1" colspan="1">24</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">12</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="nt101">
<label></label>
<p>The performance criterion is the number of time steps needed to reach a certain percentage of the Bayesian optimal observer's accumulated reward. V = visual, A = auditory, N = noise, I = integration.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4A</xref>
there are temporary declines in the average reward of the individual-sensor and joint-space agents. The reason behind these declines is the inherent temporary exploration in UCB1. In UCB1, the policy calculates
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e210.jpg"></inline-graphic>
</inline-formula>
upper confidence bound where
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e211.jpg"></inline-graphic>
</inline-formula>
has an inverse relation with the total number of samples in state
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e212.jpg"></inline-graphic>
</inline-formula>
(the logarithmic term in
<xref ref-type="disp-formula" rid="pone.0103143.e184">equation (15</xref>
)). Therefore, if an action has not been visited in a state for a long time, this term forces the agent to choose that action. For large state-action spaces, it creates temporary exploration phases during learning. This exploration is beneficial in non-stationary environments; however, our environment is stationary, and the exploration produces the observed decline. We reduced the exploration effect by using a small
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e213.jpg"></inline-graphic>
</inline-formula>
in (15). We also tested the individual-sensor and joint-space agents using a constant alpha and different types of confidence intervals, and the significant superiority of the proposed method remained intact.</p>
<sec id="s3a1">
<title>A non-stationary change in the environment</title>
<p>Having a stationary environment is one of the basic assumptions we made. To investigate the effect of an unexpected change in the environment, we decreased the reliability of the visual sensor to the lowest possible value at step
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e214.jpg"></inline-graphic>
</inline-formula>
. The underlying reward distributions for the visual sensor and the joint space changed accordingly. As
<xref ref-type="fig" rid="pone-0103143-g005">Figure 5A</xref>
shows, this change is detected by the proposed test. As a result, the rate of acceptance of the visual sensor noticeably decreases after step
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e215.jpg"></inline-graphic>
</inline-formula>
. However, in decision making, only the MOS method could cope with this disturbance; the LUS method failed to adapt its behavior, since it relies more on the joint space. The percentage of dominance for each source of information in the MOS method is shown in
<xref ref-type="fig" rid="pone-0103143-g005">Figure 5B</xref>
. After time step
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e216.jpg"></inline-graphic>
</inline-formula>
, the agent relies more on the auditory sensor, and only about 13% of the decisions are made according to the visual data. We discuss non-stationary environments further in Discussions and Conclusions.</p>
<fig id="pone-0103143-g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g005</object-id>
<label>Figure 5</label>
<caption>
<title>Performance of the method (MOS) in response to an unexpected change in the environment.</title>
<p>At time step
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e217.jpg"></inline-graphic>
</inline-formula>
the visual sensor fails and its variance changes to the highest possible value. All graphs are averages over 10 independent runs, smoothed with a moving-average window of size 500. (
<bold>A</bold>
) Average acceptance rate (
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e218.jpg"></inline-graphic>
</inline-formula>
) of the individual sensors. (
<bold>B</bold>
) The average dominancy percentage of each source in decision making (MOS). After failure of the visual sensor, the method detects this change and relies on the auditory sensor for decision making.</p>
</caption>
<graphic xlink:href="pone.0103143.g005"></graphic>
</fig>
</sec>
<sec id="s3a2">
<title>Parameter setting</title>
<p>The method (
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
) does not need any tuning and the only open parameter is
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e219.jpg"></inline-graphic>
</inline-formula>
, which is initialized at the beginning of learning. Alpha defines the agent's character; a smaller value of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e220.jpg"></inline-graphic>
</inline-formula>
results in larger confidence intervals, which means a stronger tendency toward exploration than exploitation. Moreover, a small value of alpha makes the test easier for the individual sensors to pass and, as a result, postpones the transition from selection to integration.
<xref ref-type="fig" rid="pone-0103143-g006">Figure 6</xref>
shows these effects in Experiment 1.</p>
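<p>The qualitative effect of alpha can be illustrated with a simple Hoeffding-style interval (an assumption for illustration only; the paper's bounds (1)-(3) are defined in the Method section and differ in form): a smaller alpha yields a wider interval, hence more exploration and an easier-to-pass test, while a larger alpha yields a tighter interval.</p>
<preformat>
import math

# Half-width of an illustrative Hoeffding-style interval as a function of alpha,
# for a fixed sample size; smaller alpha means a wider (more conservative) interval.

def half_width(n, alpha, r_max=1.0):
    return r_max * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

for alpha in (0.05, 0.25, 0.45, 0.80):       # the values used in Figure 6
    print(alpha, round(half_width(n=50, alpha=alpha), 3))
</preformat>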
<fig id="pone-0103143-g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g006</object-id>
<label>Figure 6</label>
<caption>
<title>Impact of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e221.jpg"></inline-graphic>
</inline-formula>
.</title>
<p>We used four different values (0.05, 0.25, 0.45, 0.80) for
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e222.jpg"></inline-graphic>
</inline-formula>
ranging from conservative to liberal in terms of confidence. All graphs are averages over 10 independent runs, smoothed with a moving-average window of size 500. (
<bold>A</bold>
) Average acceptance rate (1–rejection rate) of the individual sensors in the proposed method (MOS). The upper/lower ribbon for each value of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e223.jpg"></inline-graphic>
</inline-formula>
represents visual/auditory sensor. By increasing
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e224.jpg"></inline-graphic>
</inline-formula>
, the test becomes harder for the individual sensors to pass. (
<bold>B</bold>
) The average dominancy percentage of each source in decision making (MOS). For each value of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e225.jpg"></inline-graphic>
</inline-formula>
, the ascending ribbon represents integration and the two descending ribbons represent selection of visual and auditory sensors. Increasing
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e226.jpg"></inline-graphic>
</inline-formula>
results in an earlier crossing of the ascending and descending ribbons, i.e. an earlier switch from selection to integration.</p>
</caption>
<graphic xlink:href="pone.0103143.g006"></graphic>
</fig>
</sec>
</sec>
<sec id="s3b">
<title>Experiment 2</title>
<p>The goal of this experiment is to study the method in the presence of an added unreliable sensor (noise). The new sensor's reading is uniformly distributed noise. In other words, there is no correlation between the position of the stimulus and the sensor's reading. By adding this sensor, the size of the joint state-action space jumps to
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e227.jpg"></inline-graphic>
</inline-formula>
.</p>
<p>The Noise agent cannot learn anything useful, and its average reward curve remains flat throughout its life; see
<xref ref-type="fig" rid="pone-0103143-g007">Figure 7A</xref>
. Furthermore, due to the presence of this unreliable sensor, learning by the joint space agent is drastically degraded compared to the Visual agent. The proposed method (MOS) is able to identify the unreliable source of information and is therefore superior to the joint space agent in terms of both learning speed and average reward. However, during the initial steps of learning, its average reward is slightly lower than that of the Visual agent. This is the cost of having no prior information about the unreliable sensor, which forces the method to explore more during the early steps of learning.</p>
<fig id="pone-0103143-g007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g007</object-id>
<label>Figure 7</label>
<caption>
<title>Performance and behavior of the method in response to an unreliable sensor.</title>
<p>All graphs are averages over 20 independent runs, smoothed with a moving-average window of size 1000. (
<bold>A</bold>
) Average reward for all agents. For the proposed method (MOS), we used
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
, employing bound (1) with
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e228.jpg"></inline-graphic>
</inline-formula>
for calculating confidence intervals. The rival methods employ the UCB1 policy on the individual sensors and on the joint space. (
<bold>B</bold>
) Average acceptance rate (1–rejection rate) of the individual sensors in the proposed method (MOS). (
<bold>C</bold>
) The average dominancy percentage of each source in decision making (MOS). Due to the unreliability of the noise sensor, learning in the integrated states takes longer to mature and, therefore, the dominancy of the visual sensor is prolonged.</p>
</caption>
<graphic xlink:href="pone.0103143.g007"></graphic>
</fig>
<p>The results of the proposed test and the percentage of dominance of each source of information in decision making are shown in
<xref ref-type="fig" rid="pone-0103143-g007">Figure 7B</xref>
and
<xref ref-type="fig" rid="pone-0103143-g007">Figure 7C</xref>
, respectively. The rate of acceptance for all subspaces declines over time, and this decline is faster for the unreliable sensor. Moreover, according to
<xref ref-type="fig" rid="pone-0103143-g007">Figure 7C</xref>
, the unreliable sensor determines the final decision only about 3% of the time, and these selections are mostly explorative decisions. This result is evidence that the proposed method treats a subsection of its state space as unreliable and filters it out during decision making.</p>
<sec id="s3b1">
<title>Comparisons</title>
<p>
<xref ref-type="table" rid="pone-0103143-t004">Table 4</xref>
illustrates learning speed in terms of the number of time steps required for each method to reach a certain percentage of the accumulated reward that the Bayesian optimal decision maker achieves.
<xref ref-type="table" rid="pone-0103143-t004">Table 4</xref>
also shows the percentage of dominance for each source of information. In all variations of the proposed method, the percentage of dominance for sensory integration increases as learning progresses. Likewise, in the second experiment, the dominance of the noise sensor decreases over time. The results indicate that the presence of the unreliable sensor in the joint space makes the method slower in the second experiment. This is because the agent has to rely on its reliable individual sensors until its joint space accumulates a reasonable number of samples to be considered reliable.</p>
<p>We proposed two methods for decision making; namely MOS and LUS, see
<xref ref-type="table" rid="pone-0103143-t001">Table 1</xref>
and
<xref ref-type="table" rid="pone-0103143-t002">Table 2</xref>
. The MOS method chooses the most optimistic source of information, while LUS attends to the source with the lowest uncertainty. Both criteria are plausible choices for decision making, and in our experience both of them, and even some combinations of them, work well in practice. Based on
<xref ref-type="table" rid="pone-0103143-t004">Table 4</xref>
, the LUS method requires fewer time steps compared to the MOS method to reach a certain percentage of performance in both experiments.</p>
</sec>
<sec id="s3b2">
<title>Confidence intervals</title>
<p>Due to the extremely conservative nature of bounds (2) and (3), for the same
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e229.jpg"></inline-graphic>
</inline-formula>
, learning with them is slower than with bound (1) in most cases. On the bright side, these bounds are mathematically valid for all kinds of reward distributions. To compensate for this conservativeness, it is recommended to use larger values of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e230.jpg"></inline-graphic>
</inline-formula>
(smaller confidence coefficients) when employing bounds (2) and (3). Furthermore, as mentioned in the Method section, bound (3) is only appropriate in situations where the variances of the reward distributions are small. However, in most cases, no information is available about the type of the reward distributions or their variances. In these general situations, bound (2) with a moderate value of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e231.jpg"></inline-graphic>
</inline-formula>
is a reasonable choice. For example, in both of the discussed experiments, by using bound (2) and increasing the value of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e232.jpg"></inline-graphic>
</inline-formula>
to 0.4, we achieved similar learning speed and average reward to those illustrated in
<xref ref-type="fig" rid="pone-0103143-g004">Figure 4A</xref>
and
<xref ref-type="fig" rid="pone-0103143-g007">Figure 7A</xref>
. A summary of these results is shown in
<xref ref-type="table" rid="pone-0103143-t004">Table 4</xref>
.</p>
</sec>
<sec id="s3b3">
<title>Extension to the power set of sensors</title>
<p>Throughout this paper, only individual sensors along with their joint space were considered as the sources of information. However, by a slight modification in
<xref ref-type="disp-formula" rid="pone.0103143.e067">equations (4)–(7)</xref>
, we can calculate the necessary marginal values for any combination of sensors. Based on this idea, instead of
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e233.jpg"></inline-graphic>
</inline-formula>
sensors, we can create
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e234.jpg"></inline-graphic>
</inline-formula>
sources of information besides the primary joint space. By employing these sources instead of the individual sensors in line 5 of
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
, a new variation of the proposed method is formed. With this modification to the algorithm, we performed Experiment 2 with the LUS method using bound (1) and
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e235.jpg"></inline-graphic>
</inline-formula>
. The percentage of dominance of each source of information is shown in
<xref ref-type="fig" rid="pone-0103143-g008">Figure 8</xref>
. In the first part of learning, the final decision is mostly based on the reliable individual sensors, and vision is the dominant modality. However, as the agent matures, the most reliable source of information, namely the visual
<inline-formula>
<inline-graphic xlink:href="pone.0103143.e236.jpg"></inline-graphic>
</inline-formula>
auditory subspace, takes the leading role in decision making. This means that the extended method can autonomously identify the reliable subspaces of its state space and filter out the unreliable ones. This modification does not change the amount of required memory. However, the processing complexity becomes exponential in the number of sensors, which is still reasonable for tasks with a few sensors.</p>
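<p>As an illustration of the power-set extension, the candidate sources can be enumerated as all non-empty combinations of sensors; the sketch below is a minimal example, and the sensor names are illustrative.</p>
<preformat>
from itertools import combinations

# Enumerate all non-empty sensor subsets as candidate sources of information
# for the power-set variant; the full set coincides with the primary joint space.

def sensor_subsets(sensors):
    subsets = []
    for r in range(1, len(sensors) + 1):
        subsets.extend(combinations(sensors, r))
    return subsets

print(sensor_subsets(["visual", "auditory", "noise"]))   # 7 subsets for 3 sensors
</preformat>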
<fig id="pone-0103143-g008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0103143.g008</object-id>
<label>Figure 8</label>
<caption>
<title>Dominancy of subspaces over time.</title>
<p>The average dominancy percentage of different combinations of sensors in decision making (LUS). Subspaces including the unreliable source have been filtered out. Furthermore, reliance on the integration of the reliable sensors increases over time.</p>
</caption>
<graphic xlink:href="pone.0103143.g008"></graphic>
</fig>
</sec>
</sec>
</sec>
<sec id="s4">
<title>Discussions and Conclusions</title>
<p>The optimal multisensory integration behavior of adults has been substantially addressed in the literature
<xref rid="pone.0103143-Ernst1" ref-type="bibr">[1]</xref>
,
<xref rid="pone.0103143-Alais1" ref-type="bibr">[2]</xref>
. However, there are fewer studies and experiments regarding the idea of sensory selection in children
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
<xref rid="pone.0103143-Nardini2" ref-type="bibr">[6]</xref>
. This lack of observations is even more pronounced across the complete age spectrum. As a result, there are not enough experimental data available to form a definite hypothesis about the transition from sensory selection to sensory integration.</p>
<p>One hypothesis regarding this transition has been proposed by Gori et al.
<xref rid="pone.0103143-Gori1" ref-type="bibr">[4]</xref>
,
<xref rid="pone.0103143-Gori2" ref-type="bibr">[21]</xref>
. Their hypothesis is that children select the more accurate sense in multisensory tasks for the purpose of cross-sensory calibration between senses. They suggested that this cross-sensory calibration might have an important impact on the maturation of multisensory perception. In this paper, we have illustrated that, even in the absence of the cross-sensory calibration hypothesis, the mere transition from the accurate subspaces to the joint space has its own computational advantages. This smooth transition not only facilitates the maturation of multisensory perception, but is also essential for having a rewarding life.</p>
<p>To show these advantages, we proposed a general multisensory learning method (see
<xref ref-type="sec" rid="s2">Method</xref>
and
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
). The proposed method can autonomously choose different subsets of its state space for decision making based on their generalization properties and reliability. Unlike the Bayesian framework, our method makes no prior assumptions about either the sensors' observation models or the relation between the sensory space and the actions.</p>
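<p>For readers who want a concrete picture of the selection step, the fragment below sketches, in Python, one simple way a confidence-interval test over mean rewards can be used to pick the most informative source (an individual modality or the joint space). It uses a normal approximation and a lower-bound criterion that are our own illustrative assumptions rather than the exact procedure of Table 3.</p>
<preformat>
import math

def confidence_interval(rewards, z=1.96):
    """Approximate 95% confidence interval on the mean reward of one source,
    using a normal approximation (illustrative choice)."""
    n = len(rewards)
    if n == 0:
        return (-math.inf, math.inf)   # no experience yet: fully uncertain
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / max(n - 1, 1)
    half_width = z * math.sqrt(var / n)
    return (mean - half_width, mean + half_width)

def most_informative(sources):
    """sources maps a name ('vision', 'audition', 'joint', ...) to the rewards
    obtained when deciding with that source.  Early on, the small modality
    spaces have tighter intervals; after enough experience the joint space
    takes over, reproducing the selection-to-integration shift."""
    best_name, best_lower = None, -math.inf
    for name, rewards in sources.items():
        lower, _ = confidence_interval(rewards)
        if lower > best_lower:
            best_name, best_lower = name, lower
    return best_name
</preformat>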
<p>It was shown that for an agent who starts its life in a tabula rasa state, the seemingly optimal behavior is to rely on its individual sensors during early life and to switch to the joint space (sensory integration) at later stages. This behavior is compatible with the empirical findings. Experimental data indicate that children do not integrate sensory information and base their judgments on a single sensor, whereas adults use multisensory integration for their decision making
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
<xref rid="pone.0103143-Nardini2" ref-type="bibr">[6]</xref>
. It was also shown that the proposed method is significantly superior to the individual-sensor agents (sensory selection alone) and to the joint-space agent (sensory integration alone) in terms of both learning speed and average reward. Based on these findings, we suggest that selection and integration, which may appear to be two separate decision-making methods, are in fact two sides of the same coin, both serving reward maximization. In addition, the transition from selection to integration is a smooth, developmental phenomenon.</p>
<p>In our framework, integration-based decisions become dominant only after the agent has accumulated enough multisensory experience during the initial stages of its life. There is similar empirical evidence that the maturation of integrative decisions is related to early-life experiences (see
<xref rid="pone.0103143-Wallace1" ref-type="bibr">[22]</xref>
,
<xref rid="pone.0103143-Burr1" ref-type="bibr">[3]</xref>
). Moreover, in
<xref rid="pone.0103143-Weisswange1" ref-type="bibr">[10]</xref>
the authors showed that, using the reward-dependent framework, the problem of causal inference in multisensory perception
<xref rid="pone.0103143-Krding1" ref-type="bibr">[23]</xref>
could also be solved in an interactive fashion. To show this, they used an artificial neural network to estimate the average reward statistics in the joint sensory space and then applied a softmax policy over these averages for decision making. With some simplifications, their agent is essentially equivalent to the joint-space agent used in our work. The main focus of Weisswange et al.
<xref rid="pone.0103143-Weisswange1" ref-type="bibr">[10]</xref>
is on the ability of the learning agent to reach the performance of the Bayesian optimal observer. In our work, on the other hand, we have investigated the role of subspace selection in the efficiency of interactive learning. Our results show that our method can reach the performance of the Bayesian optimal observer as well. On top of that, our method explains the switch from selection to integration in terms of reward maximization. These studies, together with our results, indicate that within the reward-dependent framework we can model (at least at the behavioral level) most age-related sensory integration phenomena without making unnecessary mathematical assumptions about the sensory system and the task.</p>
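<p>As a rough illustration of such a joint-space agent, the sketch below pairs tabular average-reward estimates with a softmax action choice. The table stands in for the neural-network estimator of Weisswange et al. [10]; the temperature value and all identifiers are hypothetical simplifications of ours.</p>
<preformat>
import math
import random
from collections import defaultdict

# Average reward per (joint state, action); a table is used here in place of
# the neural-network function approximator of [10].
q = defaultdict(lambda: defaultdict(float))
counts = defaultdict(lambda: defaultdict(int))

def softmax_action(state, actions, temperature=0.5):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = [q[state][a] / temperature for a in actions]
    m = max(prefs)
    weights = [math.exp(p - m) for p in prefs]   # shift for numerical stability
    threshold = random.random() * sum(weights)
    acc = 0.0
    for a, w in zip(actions, weights):
        acc += w
        if acc >= threshold:
            return a
    return actions[-1]

def update(state, action, reward):
    """Incremental average of the rewards received for this (state, action)."""
    counts[state][action] += 1
    n = counts[state][action]
    q[state][action] += (reward - q[state][action]) / n
</preformat>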
<p>Experiment 2 showed that the algorithm also remains viable when a completely unreliable source of information is present in the joint space. Even in this extreme scenario, our method outperforms its competitors, at the cost of a slight decrease in learning speed during the initial steps. This decrease is unavoidable for any interactive learning method that explores different sources of information.</p>
<p>We assumed that the environment is stationary; i.e. the reward distributions are time-invariant, or in other words, the sensory models are fixed throughout learning. These assumptions are widely used in the learning literature. Nevertheless, interactive learning methods can inherently track non-stationary situations, albeit with a lag because they are experience-based. We discuss this point further below. In
<xref ref-type="fig" rid="pone-0103143-g005">Figure 5</xref>
it is shown that the algorithm (using MOS) tracks the sudden change in the environment, called unexpected uncertainty
<xref rid="pone.0103143-Dayan1" ref-type="bibr">[24]</xref>
, and adapts itself. Nevertheless, there are methods that deal with unexpected uncertainty directly. For example, one solution is to recalculate the required statistics after detecting unusual behavior from the environment. This can easily be done by storing the received rewards in a moving window (a short-term memory) and computing the necessary statistics over that window
<xref rid="pone.0103143-Narain1" ref-type="bibr">[25]</xref>
.</p>
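<p>A minimal way to realize such a short-term memory, shown here only as an illustration of the idea in [25], is a fixed-length window of recent rewards per state-action pair; the window size below is an arbitrary assumption of ours.</p>
<preformat>
from collections import defaultdict, deque

WINDOW = 50   # arbitrary short-term memory length, not taken from the paper

# Only the most recent WINDOW rewards per (state, action) are kept, so the
# statistics automatically follow a sudden change in the environment.
recent = defaultdict(lambda: deque(maxlen=WINDOW))

def observe(state, action, reward):
    recent[(state, action)].append(reward)

def windowed_mean(state, action):
    rewards = recent[(state, action)]
    return sum(rewards) / len(rewards) if rewards else 0.0
</preformat>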
<p>In this work, for simplicity, we used tables to store the required statistics, which naturally results in a discretized state space. Nevertheless, our approach can be generalized to continuous spaces by using function approximation to estimate the required statistics in
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
. We believe that, for demonstrating the subspace selection behavior of the proposed method on the task at hand, a simple discrete state space strikes a suitable balance between complexity and simplicity. In future work we will develop and test a continuous version of our algorithm on more complex and practical tasks.</p>
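<p>As a sketch of what such a generalization could look like, the fragment below replaces the table with a kernel-weighted average of past rewards over a continuous (here scalar) observation. This is only one possible choice of function approximator, made for illustration, with a hypothetical bandwidth parameter.</p>
<preformat>
import math

# Memory of (continuous observation, action, reward) triples.
experience = []

def record(obs, action, reward):
    experience.append((obs, action, reward))

def estimated_mean_reward(obs, action, bandwidth=0.1):
    """Gaussian-kernel estimate of the mean reward of 'action' near 'obs';
    a continuous-space stand-in for the tabular statistics of Table 3."""
    num, den = 0.0, 0.0
    for o, a, r in experience:
        if a == action:
            w = math.exp(-((o - obs) ** 2) / (2.0 * bandwidth ** 2))
            num += w * r
            den += w
    return num / den if den > 0.0 else 0.0
</preformat>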
<p>In summary, the proposed algorithm is a dynamic subspace selection method for decision making in interactive learning frameworks. Our method evades the curse of dimensionality by exploiting the inherent perceptual aliasing in subspaces. This results in fast learning as well as an efficient, self-governing transition from sensory selection to integration, a transition that is essential for having a rewarding life. In addition, the proposed algorithm (
<xref ref-type="table" rid="pone-0103143-t003">Table 3</xref>
) is easy to implement. These properties make our method a suitable candidate for lifetime learning in artificial agents with a large number of sensors. An important direction for our research is therefore to extend the current single-step algorithm to a general multi-step learning and decision-making algorithm (reinforcement learning). Within the value-based decision-making framework proposed in
<xref rid="pone.0103143-Rangel1" ref-type="bibr">[9]</xref>
, the main contribution of our algorithm falls in the representation phase, where, given a set of sensory inputs, the goal is to find the most rewarding state representation.</p>
</sec>
</body>
<back>
<ack>
<p>The author Pedram Daee would like to thank Amin Niazi and Habib Zafarian for their time and comments.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0103143-Ernst1">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ernst</surname>
<given-names>MO</given-names>
</name>
,
<name>
<surname>Banks</surname>
<given-names>MS</given-names>
</name>
(
<year>2002</year>
)
<article-title>Humans integrate visual and haptic information in a statistically optimal fashion</article-title>
.
<source>Nature</source>
<volume>415</volume>
:
<fpage>429</fpage>
<lpage>433</lpage>
.
<pub-id pub-id-type="pmid">11807554</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Alais1">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Alais</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Burr</surname>
<given-names>D</given-names>
</name>
(
<year>2004</year>
)
<article-title>The Ventriloquist Effect Results from Near-Optimal Bimodal Integration</article-title>
.
<source>Current Biology</source>
<volume>14</volume>
:
<fpage>257</fpage>
<lpage>262</lpage>
.
<pub-id pub-id-type="pmid">14761661</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Burr1">
<label>3</label>
<mixed-citation publication-type="book">Burr D, Gori M (2012) Multisensory Integration Develops Late in Humans. In: Murray MM, Wallace MT, editors. The Neural Bases of Multisensory Processes. Boca Raton (FL): CRC Press.</mixed-citation>
</ref>
<ref id="pone.0103143-Gori1">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gori</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Del Viva</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Sandini</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Burr</surname>
<given-names>DC</given-names>
</name>
(
<year>2008</year>
)
<article-title>Young Children Do Not Integrate Visual and Haptic Form Information</article-title>
.
<source>Current Biology</source>
<volume>18</volume>
:
<fpage>694</fpage>
<lpage>698</lpage>
.
<pub-id pub-id-type="pmid">18450446</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Nardini1">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nardini</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Jones</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Bedford</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Braddick</surname>
<given-names>O</given-names>
</name>
(
<year>2008</year>
)
<article-title>Development of Cue Integration in Human Navigation</article-title>
.
<source>Current Biology</source>
<volume>18</volume>
:
<fpage>689</fpage>
<lpage>693</lpage>
.
<pub-id pub-id-type="pmid">18450447</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Nardini2">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nardini</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Bedford</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Mareschal</surname>
<given-names>D</given-names>
</name>
(
<year>2010</year>
)
<article-title>Fusion of visual cues is not mandatory in children</article-title>
.
<source>PNAS</source>
<volume>107</volume>
:
<fpage>17041</fpage>
<lpage>17046</lpage>
.
<pub-id pub-id-type="pmid">20837526</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Ernst2">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ernst</surname>
<given-names>MO</given-names>
</name>
(
<year>2008</year>
)
<article-title>Multisensory integration: a late bloomer</article-title>
.
<source>Current Biology</source>
<volume>18</volume>
:
<fpage>R519</fpage>
<lpage>521</lpage>
.
<pub-id pub-id-type="pmid">18579094</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Sutton1">
<label>8</label>
<mixed-citation publication-type="book">Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. Cambridge, UK: MIT Press.</mixed-citation>
</ref>
<ref id="pone.0103143-Rangel1">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rangel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Camerer</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Montague</surname>
<given-names>PR</given-names>
</name>
(
<year>2008</year>
)
<article-title>A framework for studying the neurobiology of value-based decision making</article-title>
.
<source>Nature Reviews Neuroscience</source>
<volume>9</volume>
:
<fpage>545</fpage>
<lpage>556</lpage>
.
<pub-id pub-id-type="pmid">18545266</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Weisswange1">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Weisswange</surname>
<given-names>TH</given-names>
</name>
,
<name>
<surname>Rothkopf</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Rodemann</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Triesch</surname>
<given-names>J</given-names>
</name>
(
<year>2011</year>
)
<article-title>Bayesian Cue Integration as a Developmental Outcome of Reward Mediated Learning</article-title>
.
<source>PLoS ONE</source>
<volume>6(7)</volume>
:
<fpage>e21575</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0021575">10.1371/journal.pone.0021575</ext-link>
</comment>
<pub-id pub-id-type="pmid">21750717</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Firouzi1">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Firouzi</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Ahmadabadi</surname>
<given-names>MN</given-names>
</name>
,
<name>
<surname>Araabi</surname>
<given-names>BN</given-names>
</name>
,
<name>
<surname>Amizadeh</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Mirian</surname>
<given-names>MS</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Interactive Learning in Continuous Multimodal Space: A Bayesian Approach to Action-Based Soft Partitioning and Learning</article-title>
.
<source>Autonomous Mental Development, IEEE Transactions on</source>
<volume>4</volume>
:
<fpage>124</fpage>
<lpage>138</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Mirian1">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mirian</surname>
<given-names>MS</given-names>
</name>
,
<name>
<surname>Ahmadabadi</surname>
<given-names>MN</given-names>
</name>
,
<name>
<surname>Araabi</surname>
<given-names>BN</given-names>
</name>
,
<name>
<surname>Siegwart</surname>
<given-names>RR</given-names>
</name>
(
<year>2010</year>
)
<article-title>Learning Active Fusion of Multiple Experts' Decisions: An Attention-Based Approach</article-title>
.
<source>Neural Computation</source>
<volume>23</volume>
:
<fpage>558</fpage>
<lpage>591</lpage>
.
<pub-id pub-id-type="pmid">21105824</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Whitehead1">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Whitehead</surname>
<given-names>SD</given-names>
</name>
,
<name>
<surname>Ballard</surname>
<given-names>DH</given-names>
</name>
(
<year>1991</year>
)
<article-title>Learning to Perceive and Act by Trial and Error</article-title>
.
<source>Machine Learning</source>
<volume>7</volume>
:
<fpage>45</fpage>
<lpage>83</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Mccallum1">
<label>14</label>
<mixed-citation publication-type="other">Mccallum RA (1995) Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State. In Proceedings of the Twelfth International Conference on Machine Learning: 387–395.</mixed-citation>
</ref>
<ref id="pone.0103143-Mccallum2">
<label>15</label>
<mixed-citation publication-type="other">Mccallum RA (1993) Overcoming Incomplete Perception with Utile Distinction Memory. In Proceedings of the Tenth International Conference on Machine Learning: 190–196.</mixed-citation>
</ref>
<ref id="pone.0103143-Casella1">
<label>16</label>
<mixed-citation publication-type="other">Casella G, Berger RL (1990) Statistical inference. Belmont, CA: Duxbury Press.</mixed-citation>
</ref>
<ref id="pone.0103143-Audibert1">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Audibert</surname>
<given-names>J-Y</given-names>
</name>
,
<name>
<surname>Munos</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Szepesvári</surname>
<given-names>C</given-names>
</name>
(
<year>2009</year>
)
<article-title>Exploration-exploitation tradeoff using variance estimates in multi-armed bandits</article-title>
.
<source>Theoretical Computer Science</source>
<volume>410</volume>
:
<fpage>1876</fpage>
<lpage>1902</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Lai1">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lai</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Robbins</surname>
<given-names>H</given-names>
</name>
(
<year>1985</year>
)
<article-title>Asymptotically efficient adaptive allocation rules</article-title>
.
<source>Advances in Applied Mathematics</source>
<volume>6</volume>
:
<fpage>4</fpage>
<lpage>22</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Auer1">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Auer</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Cesa-Bianchi</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Fischer</surname>
<given-names>P</given-names>
</name>
(
<year>2002</year>
)
<article-title>Finite-time Analysis of the Multiarmed Bandit Problem</article-title>
.
<source>Machine Learning</source>
<volume>47</volume>
:
<fpage>235</fpage>
<lpage>256</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Battaglia1">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Battaglia</surname>
<given-names>PW</given-names>
</name>
,
<name>
<surname>Jacobs</surname>
<given-names>RA</given-names>
</name>
,
<name>
<surname>Aslin</surname>
<given-names>RN</given-names>
</name>
(
<year>2003</year>
)
<article-title>Bayesian integration of visual and auditory signals for spatial localization</article-title>
.
<source>Journal of the Optical Society of America A, Optics, image science, and vision</source>
<volume>20</volume>
:
<fpage>1391</fpage>
<lpage>1397</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Gori2">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gori</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Sandini</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Burr</surname>
<given-names>D</given-names>
</name>
(
<year>2012</year>
)
<article-title>Development of Visuo-Auditory Integration in Space and Time</article-title>
.
<source>Frontiers in Integrative Neuroscience</source>
<volume>6</volume>
:
<fpage>77</fpage>
.
<pub-id pub-id-type="pmid">23060759</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Wallace1">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wallace</surname>
<given-names>MT</given-names>
</name>
,
<name>
<surname>Stein</surname>
<given-names>BE</given-names>
</name>
(
<year>2007</year>
)
<article-title>Early experience determines how the senses will interact</article-title>
.
<source>Journal of Neurophysiology</source>
<volume>97</volume>
:
<fpage>921</fpage>
<lpage>926</lpage>
.
<pub-id pub-id-type="pmid">16914616</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Krding1">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Körding</surname>
<given-names>KP</given-names>
</name>
,
<name>
<surname>Beierholm</surname>
<given-names>U</given-names>
</name>
,
<name>
<surname>Ma</surname>
<given-names>WJ</given-names>
</name>
,
<name>
<surname>Quartz</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Tenenbaum</surname>
<given-names>JB</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Causal Inference in Multisensory Perception</article-title>
.
<source>PLoS ONE</source>
<volume>2</volume>
:
<fpage>e943</fpage>
.
<pub-id pub-id-type="pmid">17895984</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0103143-Dayan1">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Dayan</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>AJ</given-names>
</name>
(
<year>2003</year>
)
<article-title>Uncertainty and learning</article-title>
.
<source>IETE Journal of Research</source>
<volume>49</volume>
(
<issue>2/3</issue>
)
<fpage>171</fpage>
<lpage>182</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0103143-Narain1">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Narain</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>van Beers</surname>
<given-names>RJ</given-names>
</name>
,
<name>
<surname>Smeets</surname>
<given-names>JBJ</given-names>
</name>
,
<name>
<surname>Brenner</surname>
<given-names>E</given-names>
</name>
(
<year>2013</year>
)
<article-title>Sensorimotor priors in nonstationary environments</article-title>
.
<source>J Neurophysiol</source>
<volume>109</volume>
:
<fpage>1259</fpage>
<lpage>67</lpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1152/jn.00605.2012">10.1152/jn.00605.2012</ext-link>
</comment>
<pub-id pub-id-type="pmid">23235999</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list>
<country>
<li>Iran</li>
</country>
</list>
<tree>
<country name="Iran">
<noRegion>
<name sortKey="Daee, Pedram" sort="Daee, Pedram" uniqKey="Daee P" first="Pedram" last="Daee">Pedram Daee</name>
</noRegion>
<name sortKey="Ahmadabadi, Majid Nili" sort="Ahmadabadi, Majid Nili" uniqKey="Ahmadabadi M" first="Majid Nili" last="Ahmadabadi">Majid Nili Ahmadabadi</name>
<name sortKey="Ahmadabadi, Majid Nili" sort="Ahmadabadi, Majid Nili" uniqKey="Ahmadabadi M" first="Majid Nili" last="Ahmadabadi">Majid Nili Ahmadabadi</name>
<name sortKey="Mirian, Maryam S" sort="Mirian, Maryam S" uniqKey="Mirian M" first="Maryam S." last="Mirian">Maryam S. Mirian</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/HapticV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003163 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 003163 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    HapticV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:4110011
   |texte=   Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:25058591" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 

Wicri

This area was generated with Dilib version V0.6.23.
Data generation: Mon Jun 13 01:09:46 2016. Site generation: Wed Mar 6 09:54:07 2024