Serveur d'exploration sur les dispositifs haptiques

Warning: this site is under development!
Warning: this site is generated by automated means from raw corpora.
The information has therefore not been validated.

Structure learning and the Occam's razor principle: a new view of human function acquisition

Internal identifier: 003374 (Ncbi/Merge); previous: 003373; next: 003375

Structure learning and the Occam's razor principle: a new view of human function acquisition

Authors: Devika Narain [Germany, Netherlands]; Jeroen B. J. Smeets [Netherlands]; Pascal Mamassian [France]; Eli Brenner [Netherlands]; Robert J. Van Beers [Netherlands]

Source :

RBID : PMC:4179744

Abstract

We often encounter pairs of variables in the world whose mutual relationship can be described by a function. After training, human responses closely correspond to these functional relationships. Here we study how humans predict unobserved segments of a function that they have been trained on, and we compare how human predictions differ from those made by various function-learning models in the literature. Participants' performance was best predicted by the polynomial functions that generated the observations. Further, participants were able to explicitly report the correct generating function in most cases in a post-experiment survey. This suggests that humans can abstract functions. To understand how they do so, we modeled human learning using a hierarchical Bayesian framework organized at two levels of abstraction: function learning and parameter learning, and used it to understand the time course of participants' learning as we surreptitiously changed the generating function over time. This Bayesian model selection framework allowed us to analyze the time course of function learning and parameter learning in relative isolation. We found that participants acquired new functions as they changed, and even when parameter learning was not completely accurate, the probability that the correct function was learned remained high. Most importantly, we found that humans selected the simplest-fitting function with the highest probability and that they acquired simpler functions faster than more complex ones. Both aspects of this behavior, extent and rate of selection, present evidence that human function learning obeys the Occam's razor principle.


Url:
DOI: 10.3389/fncom.2014.00121
PubMed: 25324770
PubMed Central: 4179744

Links to previous steps (curation, corpus, ...)


Links to Exploration step

PMC:4179744

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Structure learning and the Occam's razor principle: a new view of human function acquisition</title>
<author>
<name sortKey="Narain, Devika" sort="Narain, Devika" uniqKey="Narain D" first="Devika" last="Narain">Devika Narain</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Max Planck Institute for Biological Cybernetics</institution>
<country>Tuebingen, Germany</country>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Max Planck Institute for Intelligent Systems</institution>
<country>Tuebingen, Germany</country>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Smeets, Jeroen B J" sort="Smeets, Jeroen B J" uniqKey="Smeets J" first="Jeroen B. J." last="Smeets">Jeroen B. J. Smeets</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mamassian, Pascal" sort="Mamassian, Pascal" uniqKey="Mamassian P" first="Pascal" last="Mamassian">Pascal Mamassian</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4">
<institution>Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Ecole Normale Supérieure</institution>
<country>Paris, France</country>
</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brenner, Eli" sort="Brenner, Eli" uniqKey="Brenner E" first="Eli" last="Brenner">Eli Brenner</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Van Beers, Robert J" sort="Van Beers, Robert J" uniqKey="Van Beers R" first="Robert J." last="Van Beers">Robert J. Van Beers</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25324770</idno>
<idno type="pmc">4179744</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4179744</idno>
<idno type="RBID">PMC:4179744</idno>
<idno type="doi">10.3389/fncom.2014.00121</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">001D11</idno>
<idno type="wicri:Area/Pmc/Curation">001D11</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000933</idno>
<idno type="wicri:Area/Ncbi/Merge">003374</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Structure learning and the Occam's razor principle: a new view of human function acquisition</title>
<author>
<name sortKey="Narain, Devika" sort="Narain, Devika" uniqKey="Narain D" first="Devika" last="Narain">Devika Narain</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Max Planck Institute for Biological Cybernetics</institution>
<country>Tuebingen, Germany</country>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Max Planck Institute for Intelligent Systems</institution>
<country>Tuebingen, Germany</country>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Smeets, Jeroen B J" sort="Smeets, Jeroen B J" uniqKey="Smeets J" first="Jeroen B. J." last="Smeets">Jeroen B. J. Smeets</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mamassian, Pascal" sort="Mamassian, Pascal" uniqKey="Mamassian P" first="Pascal" last="Mamassian">Pascal Mamassian</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4">
<institution>Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Ecole Normale Supérieure</institution>
<country>Paris, France</country>
</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brenner, Eli" sort="Brenner, Eli" uniqKey="Brenner E" first="Eli" last="Brenner">Eli Brenner</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Van Beers, Robert J" sort="Van Beers, Robert J" uniqKey="Van Beers R" first="Robert J." last="Van Beers">Robert J. Van Beers</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</nlm:aff>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea></wicri:regionArea>
<wicri:regionArea># see nlm:aff region in country</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Frontiers in Computational Neuroscience</title>
<idno type="eISSN">1662-5188</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>We often encounter pairs of variables in the world whose mutual relationship can be described by a function. After training, human responses closely correspond to these functional relationships. Here we study how humans predict unobserved segments of a function that they have been trained on, and we compare how human predictions differ from those made by various function-learning models in the literature. Participants' performance was best predicted by the polynomial functions that generated the observations. Further, participants were able to explicitly report the correct generating function in most cases in a post-experiment survey. This suggests that humans can abstract functions. To understand how they do so, we modeled human learning using a hierarchical Bayesian framework organized at two levels of abstraction: function learning and parameter learning, and used it to understand the time course of participants' learning as we surreptitiously changed the generating function over time. This Bayesian model selection framework allowed us to analyze the time course of function learning and parameter learning in relative isolation. We found that participants acquired new functions as they changed, and even when parameter learning was not completely accurate, the probability that the correct function was learned remained high. Most importantly, we found that humans selected the simplest-fitting function with the highest probability and that they acquired simpler functions faster than more complex ones. Both aspects of this behavior, extent and rate of selection, present evidence that human function learning obeys the Occam's razor principle.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Acu A, D E" uniqKey="Acu A D">D. E. Acuña</name>
</author>
<author>
<name sortKey="Schrater, P" uniqKey="Schrater P">P. Schrater</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alais, D" uniqKey="Alais D">D. Alais</name>
</author>
<author>
<name sortKey="Burr, D" uniqKey="Burr D">D. Burr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bedford, F L" uniqKey="Bedford F">F. L. Bedford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Braun, D A" uniqKey="Braun D">D. A. Braun</name>
</author>
<author>
<name sortKey="Aertsen, A" uniqKey="Aertsen A">A. Aertsen</name>
</author>
<author>
<name sortKey="Wolpert, D M" uniqKey="Wolpert D">D. M. Wolpert</name>
</author>
<author>
<name sortKey="Mehring, C" uniqKey="Mehring C">C. Mehring</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Braun, D A" uniqKey="Braun D">D. A. Braun</name>
</author>
<author>
<name sortKey="Mehring, C" uniqKey="Mehring C">C. Mehring</name>
</author>
<author>
<name sortKey="Wolpert, D M" uniqKey="Wolpert D">D. M. Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Braun, D A" uniqKey="Braun D">D. A. Braun</name>
</author>
<author>
<name sortKey="Waldert, S" uniqKey="Waldert S">S. Waldert</name>
</author>
<author>
<name sortKey="Aertsen, A" uniqKey="Aertsen A">A. Aertsen</name>
</author>
<author>
<name sortKey="Wolpert, D M" uniqKey="Wolpert D">D. M. Wolpert</name>
</author>
<author>
<name sortKey="Mehring, C" uniqKey="Mehring C">C. Mehring</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brehmer, B" uniqKey="Brehmer B">B. Brehmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brenner, E" uniqKey="Brenner E">E. Brenner</name>
</author>
<author>
<name sortKey="Smeets, J B J" uniqKey="Smeets J">J. B. J. Smeets</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burnham, K P" uniqKey="Burnham K">K. P. Burnham</name>
</author>
<author>
<name sortKey="Anderson, D R" uniqKey="Anderson D">D. R. Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Busemeyer, J R" uniqKey="Busemeyer J">J. R. Busemeyer</name>
</author>
<author>
<name sortKey="Byun, E" uniqKey="Byun E">E. Byun</name>
</author>
<author>
<name sortKey="Delosh, E L" uniqKey="Delosh E">E. L. DeLosh</name>
</author>
<author>
<name sortKey="Mcdaniel, M A" uniqKey="Mcdaniel M">M. A. McDaniel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carroll, J D" uniqKey="Carroll J">J. D. Carroll</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deane, D H" uniqKey="Deane D">D. H. Deane</name>
</author>
<author>
<name sortKey="Hammond, K R" uniqKey="Hammond K">K. R. Hammond</name>
</author>
<author>
<name sortKey="Summers, D A" uniqKey="Summers D">D. A. Summers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Delosh, E L" uniqKey="Delosh E">E. L. DeLosh</name>
</author>
<author>
<name sortKey="Busemeyer, J R" uniqKey="Busemeyer J">J. R. Busemeyer</name>
</author>
<author>
<name sortKey="Mcdaniel, M A" uniqKey="Mcdaniel M">M. A. McDaniel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, M O" uniqKey="Ernst M">M. O. Ernst</name>
</author>
<author>
<name sortKey="Banks, M S" uniqKey="Banks M">M. S. Banks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, M O" uniqKey="Ernst M">M. O. Ernst</name>
</author>
<author>
<name sortKey="Van Dam, L C J" uniqKey="Van Dam L">L. C. J. van Dam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Faisal, A A" uniqKey="Faisal A">A. A. Faisal</name>
</author>
<author>
<name sortKey="Selen, L P" uniqKey="Selen L">L. P. Selen</name>
</author>
<author>
<name sortKey="Wolpert, D M" uniqKey="Wolpert D">D. M. Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fulvio, J" uniqKey="Fulvio J">J. Fulvio</name>
</author>
<author>
<name sortKey="Green, C S" uniqKey="Green C">C. S. Green</name>
</author>
<author>
<name sortKey="Schrater, P" uniqKey="Schrater P">P. Schrater</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Genewein, T" uniqKey="Genewein T">T. Genewein</name>
</author>
<author>
<name sortKey="Braun, D A" uniqKey="Braun D">D. A. Braun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gershman, S J" uniqKey="Gershman S">S. J. Gershman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Griffiths, T L" uniqKey="Griffiths T">T. L. Griffiths</name>
</author>
<author>
<name sortKey="Chater, N" uniqKey="Chater N">N. Chater</name>
</author>
<author>
<name sortKey="Kemp, C" uniqKey="Kemp C">C. Kemp</name>
</author>
<author>
<name sortKey="Perfors, A" uniqKey="Perfors A">A. Perfors</name>
</author>
<author>
<name sortKey="Tenenbaum, J B" uniqKey="Tenenbaum J">J. B. Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Griffiths, T L" uniqKey="Griffiths T">T. L. Griffiths</name>
</author>
<author>
<name sortKey="Williams, J G" uniqKey="Williams J">J. G. Williams</name>
</author>
<author>
<name sortKey="Kalish, M L" uniqKey="Kalish M">M. L. Kalish</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jazayeri, M" uniqKey="Jazayeri M">M. Jazayeri</name>
</author>
<author>
<name sortKey="Shadlen, M N" uniqKey="Shadlen M">M. N. Shadlen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kalish, M L" uniqKey="Kalish M">M. L. Kalish</name>
</author>
<author>
<name sortKey="Griffiths, T L" uniqKey="Griffiths T">T. L. Griffiths</name>
</author>
<author>
<name sortKey="Lewandowsky, S" uniqKey="Lewandowsky S">S. Lewandowsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kalish, M L" uniqKey="Kalish M">M. L. Kalish</name>
</author>
<author>
<name sortKey="Lewandowsky, S" uniqKey="Lewandowsky S">S. Lewandowsky</name>
</author>
<author>
<name sortKey="Kruschke, J K" uniqKey="Kruschke J">J. K. Kruschke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kass, R E" uniqKey="Kass R">R. E. Kass</name>
</author>
<author>
<name sortKey="Raftery, A E" uniqKey="Raftery A">A. E. Raftery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kemp, C" uniqKey="Kemp C">C. Kemp</name>
</author>
<author>
<name sortKey="Tenenbaum, J B" uniqKey="Tenenbaum J">J. B. Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kemp, C" uniqKey="Kemp C">C. Kemp</name>
</author>
<author>
<name sortKey="Tenenbaum, J B" uniqKey="Tenenbaum J">J. B. Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koh, K" uniqKey="Koh K">K. Koh</name>
</author>
<author>
<name sortKey="Meyer, D E" uniqKey="Meyer D">D. E. Meyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kording, K P" uniqKey="Kording K">K. P. Körding</name>
</author>
<author>
<name sortKey="Beierholm, U" uniqKey="Beierholm U">U. Beierholm</name>
</author>
<author>
<name sortKey="Ma, W J" uniqKey="Ma W">W. J. Ma</name>
</author>
<author>
<name sortKey="Quartz, S" uniqKey="Quartz S">S. Quartz</name>
</author>
<author>
<name sortKey="Tenenbaum, J B" uniqKey="Tenenbaum J">J. B. Tenenbaum</name>
</author>
<author>
<name sortKey="Shams, L" uniqKey="Shams L">L. Shams</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maij, F" uniqKey="Maij F">F. Maij</name>
</author>
<author>
<name sortKey="Brenner, E" uniqKey="Brenner E">E. Brenner</name>
</author>
<author>
<name sortKey="Smeets, J B J" uniqKey="Smeets J">J. B. J. Smeets</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcdaniel, M A" uniqKey="Mcdaniel M">M. A. McDaniel</name>
</author>
<author>
<name sortKey="Busemeyer, J R" uniqKey="Busemeyer J">J. R. Busemeyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcdaniel, M A" uniqKey="Mcdaniel M">M. A. McDaniel</name>
</author>
<author>
<name sortKey="Dimperio, E" uniqKey="Dimperio E">E. Dimperio</name>
</author>
<author>
<name sortKey="Griego, J A" uniqKey="Griego J">J. A. Griego</name>
</author>
<author>
<name sortKey="Busemeyer, J R" uniqKey="Busemeyer J">J. R. Busemeyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narain, D" uniqKey="Narain D">D. Narain</name>
</author>
<author>
<name sortKey="Mamassian, P" uniqKey="Mamassian P">P. Mamassian</name>
</author>
<author>
<name sortKey="Van Beers, R J" uniqKey="Van Beers R">R. J. van Beers</name>
</author>
<author>
<name sortKey="Smeets, J B J" uniqKey="Smeets J">J. B. J. Smeets</name>
</author>
<author>
<name sortKey="Brenner, E" uniqKey="Brenner E">E. Brenner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narain, D" uniqKey="Narain D">D. Narain</name>
</author>
<author>
<name sortKey="Van Beers, R J" uniqKey="Van Beers R">R. J. van Beers</name>
</author>
<author>
<name sortKey="Smeets, J B J" uniqKey="Smeets J">J. B. J. Smeets</name>
</author>
<author>
<name sortKey="Brenner, E" uniqKey="Brenner E">E. Brenner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raftery, A E" uniqKey="Raftery A">A. E. Raftery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tenenbaum, J B" uniqKey="Tenenbaum J">J. B. Tenenbaum</name>
</author>
<author>
<name sortKey="Kemp, C" uniqKey="Kemp C">C. Kemp</name>
</author>
<author>
<name sortKey="Griffiths, T L" uniqKey="Griffiths T">T. L. Griffiths</name>
</author>
<author>
<name sortKey="Goodman, N D" uniqKey="Goodman N">N. D. Goodman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnham, E J A" uniqKey="Turnham E">E. J. A. Turnham</name>
</author>
<author>
<name sortKey="Braun, D A" uniqKey="Braun D">D. A. Braun</name>
</author>
<author>
<name sortKey="Wolpert, D M" uniqKey="Wolpert D">D. M. Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Beers, R J" uniqKey="Van Beers R">R. J. van Beers</name>
</author>
<author>
<name sortKey="Sittig, A C" uniqKey="Sittig A">A. C. Sittig</name>
</author>
<author>
<name sortKey="Denier Van Der Gon, J J" uniqKey="Denier Van Der Gon J">J. J. Denier van der Gon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wasserman, L" uniqKey="Wasserman L">L. Wasserman</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Front Comput Neurosci</journal-id>
<journal-id journal-id-type="iso-abbrev">Front Comput Neurosci</journal-id>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
</journal-title-group>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25324770</article-id>
<article-id pub-id-type="pmc">4179744</article-id>
<article-id pub-id-type="doi">10.3389/fncom.2014.00121</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Structure learning and the Occam's razor principle: a new view of human function acquisition</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Narain</surname>
<given-names>Devika</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:type="simple" xlink:href="http://community.frontiersin.org/people/u/168816"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Smeets</surname>
<given-names>Jeroen B. J.</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<uri xlink:type="simple" xlink:href="http://community.frontiersin.org/people/u/154506"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mamassian</surname>
<given-names>Pascal</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<uri xlink:type="simple" xlink:href="http://community.frontiersin.org/people/u/120255"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Brenner</surname>
<given-names>Eli</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<uri xlink:type="simple" xlink:href="http://community.frontiersin.org/people/u/63919"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>van Beers</surname>
<given-names>Robert J.</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Max Planck Institute for Biological Cybernetics</institution>
<country>Tuebingen, Germany</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Max Planck Institute for Intelligent Systems</institution>
<country>Tuebingen, Germany</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>Faculty of Human Movement Sciences, MOVE Research Institute Amsterdam, VU University</institution>
<country>Amsterdam, Netherlands</country>
</aff>
<aff id="aff4">
<sup>4</sup>
<institution>Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Ecole Normale Supérieure</institution>
<country>Paris, France</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Martin Giese, University Clinic Tübingen, Germany</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Le Wang, Boston University, USA; Dominik M. Endres, Phillipps-University Marburg, Germany</p>
</fn>
<corresp id="fn001">*Correspondence: Devika Narain, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Spemannstrasse 38, Tuebingen 72076, Germany e-mail:
<email xlink:type="simple">devika.narain@tuebingen.mpg.de</email>
</corresp>
<fn fn-type="other" id="fn002">
<p>This article was submitted to the journal Frontiers in Computational Neuroscience.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>9</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<volume>8</volume>
<elocation-id>121</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>6</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>9</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2014 Narain, Smeets, Mamassian, Brenner and van Beers.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>We often encounter pairs of variables in the world whose mutual relationship can be described by a function. After training, human responses closely correspond to these functional relationships. Here we study how humans predict unobserved segments of a function that they have been trained on, and we compare how human predictions differ from those made by various function-learning models in the literature. Participants' performance was best predicted by the polynomial functions that generated the observations. Further, participants were able to explicitly report the correct generating function in most cases in a post-experiment survey. This suggests that humans can abstract functions. To understand how they do so, we modeled human learning using a hierarchical Bayesian framework organized at two levels of abstraction: function learning and parameter learning, and used it to understand the time course of participants' learning as we surreptitiously changed the generating function over time. This Bayesian model selection framework allowed us to analyze the time course of function learning and parameter learning in relative isolation. We found that participants acquired new functions as they changed, and even when parameter learning was not completely accurate, the probability that the correct function was learned remained high. Most importantly, we found that humans selected the simplest-fitting function with the highest probability and that they acquired simpler functions faster than more complex ones. Both aspects of this behavior, extent and rate of selection, present evidence that human function learning obeys the Occam's razor principle.</p>
</abstract>
<kwd-group>
<kwd>structure learning</kwd>
<kwd>function learning</kwd>
<kwd>Bayesian model selection</kwd>
<kwd>Occam's razor</kwd>
<kwd>sensorimotor learning</kwd>
</kwd-group>
<counts>
<fig-count count="4"></fig-count>
<table-count count="1"></table-count>
<equation-count count="8"></equation-count>
<ref-count count="39"></ref-count>
<page-count count="13"></page-count>
<word-count count="9555"></word-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>Introduction</title>
<p>Identifying relationships among environmental variables is often crucial to accurately predicting the value of one variable from information about another. For instance, we routinely judge whether to cross a road after a quick glance at an oncoming car because we make predictions consistent with the functional relationship between distance, velocity, and time. When playing ball sports, we decide how best to intercept a moving ball based on predictions from limited visual information about its trajectory. When trained, humans can reproduce these functional relationships fairly accurately. How humans learn to reproduce functions accurately is, however, widely contested.</p>
<p>Some key issues underlying the study of function learning are abstraction, rule-based learning, and parsimony. Abstraction refers to the ability to observe low-level information and infer an overarching rule that helps to better classify and predict these observations. Rule-based learning, in the context of function learning, refers to whether humans make principled assumptions while interpolating or extrapolating functions. Parsimony refers to the preference for a learning method that is effective in producing reliable predictions yet requires minimal resources.</p>
<p>Some theories suggest that functions are abstracted as integrated entities in a manner similar to polynomial regression (Carroll,
<xref rid="B11" ref-type="bibr">1963</xref>
; Brehmer,
<xref rid="B7" ref-type="bibr">1974</xref>
; Koh and Meyer,
<xref rid="B27" ref-type="bibr">1991</xref>
) while other theories assume that no abstraction is necessary to explain human function learning. The proponents of the latter view propose that localized learning of a function through multiple independent elements (Kalish et al.,
<xref rid="B23" ref-type="bibr">2004</xref>
), Gaussian Processes (Griffiths et al.,
<xref rid="B21" ref-type="bibr">2008</xref>
) or paired associations between input and output based on trial and error learning (Busemeyer et al.,
<xref rid="B10" ref-type="bibr">1997</xref>
; DeLosh et al.,
<xref rid="B13" ref-type="bibr">1997</xref>
; McDaniel et al.,
<xref rid="B29" ref-type="bibr">2009</xref>
) can explain responses produced by humans. The second issue of contention is whether function learning is rule-based (irrespective of whether the function is abstracted) or whether it is a consequence of simple associations that emerge through trial-and-error learning. For instance, some algorithms assume that local linear basis functions can partition any given function until it is reasonably approximated; the use of a linear basis function constitutes a rule-based assumption. Thirdly, the various algorithms proposed for function learning use different numbers of parameters and therefore differ in algorithmic complexity, while at the same time the generating functions presented to participants also differ in complexity because they have different numbers of parameters (see McDaniel et al.,
<xref rid="B29" ref-type="bibr">2009</xref>
). Under these circumstances, it becomes difficult to compare models in an unbiased manner and, at the same time, to detect systematic differences in how humans treat functions of different complexity. Without resolving such issues, the study of online learning of changing functional relationships will remain a challenge, and it will be nearly impossible to tease apart whether a given effect is a consequence of inherent biases in the algorithms or derives from human behavior during online function learning.</p>
<p>In this work, we propose to study function learning under a new framework that provides a unifying perspective on all three of the above-mentioned issues:
<italic>structure learning</italic>
(Braun et al.,
<xref rid="B4" ref-type="bibr">2009</xref>
,
<xref rid="B5" ref-type="bibr">2010a</xref>
; Kemp and Tenenbaum,
<xref rid="B26" ref-type="bibr">2009</xref>
; Tenenbaum et al.,
<xref rid="B33" ref-type="bibr">2011</xref>
). Structure learning supports the view that (1) functions are abstracted, while at the same time (2) allowing room for both associative and rule-based accounts of learning. (3) Assuming a structural hierarchy between functions and parameters allows us to study function learning online while internally compensating for algorithmic complexity, thereby revealing how humans treat function complexity as they adjust to online changes in the presented function.</p>
<p>Structure learning has recently emerged as an important theory to explain human learning in cognitive science (Kemp and Tenenbaum,
<xref rid="B25" ref-type="bibr">2008</xref>
,
<xref rid="B26" ref-type="bibr">2009</xref>
; Griffiths et al.,
<xref rid="B20" ref-type="bibr">2010</xref>
; Tenenbaum et al.,
<xref rid="B33" ref-type="bibr">2011</xref>
), perceptual learning (Körding et al.,
<xref rid="B37" ref-type="bibr">2007</xref>
; Braun et al.,
<xref rid="B6" ref-type="bibr">2010b</xref>
; Turnham et al.,
<xref rid="B34" ref-type="bibr">2011</xref>
; Narain et al.,
<xref rid="B30" ref-type="bibr">2013a</xref>
), and sensorimotor learning (Braun et al.,
<xref rid="B4" ref-type="bibr">2009</xref>
,
<xref rid="B6" ref-type="bibr">2010b</xref>
; Acuña and Schrater,
<xref rid="B1" ref-type="bibr">2010</xref>
). Its main principle is abstraction along a hierarchy of variables. It contends that the rapidity of learning and the extent of generalization in humans can be explained by the abstraction of lower-dimensional manifolds. For example, in the case of function learning, a limited set of discrete function hypotheses may exist in a low-dimensional subspace (e.g., linear, quadratic, or cubic), whereas the parameter space of each of these lies at a higher dimension (two dimensions for the linear function, three for the quadratic). One advantage of such structure learning is that if evidence for a certain function is substantial, this information constrains the search for parameters from an infinite search space to a finite subset within these higher dimensions, and therefore facilitates the discovery of the true parameters. Thus far, function-learning research has not attempted to separate these two levels of abstraction. Therefore, when participants did not reproduce the exact function, it was concluded that they could not learn the function. Although function learning and parameter learning may depend on each other, it should be possible to abstract the correct function without a completely accurate understanding of the parameters of that function.</p>
<p>Here we develop a new, intuitive paradigm for function learning that introduces uncertainty into the learning process, thereby making multiple function hypotheses possible. Participants were given a brief spatial cue and were asked to shoot at a transient target that would appear after an unknown time interval. Unbeknownst to the participants, the cue location and target time were related according to various continuous functions. In our first experiment, we tested performance against the predictions of associative learning algorithms (ALM: associative learning model, and EXAM: extrapolation-association model; Busemeyer et al.,
<xref rid="B10" ref-type="bibr">1997</xref>
; DeLosh et al.,
<xref rid="B13" ref-type="bibr">1997</xref>
), on partition-based algorithms [POLE: Population of linear experts; (Kalish et al.,
<xref rid="B23" ref-type="bibr">2004</xref>
)], polynomial regression (Carroll,
<xref rid="B11" ref-type="bibr">1963</xref>
; Koh and Meyer,
<xref rid="B27" ref-type="bibr">1991</xref>
), and Gaussian Processes (Griffiths et al.,
<xref rid="B21" ref-type="bibr">2008</xref>
). Most importantly, we did not merely test the mean of the predictions, but also took the variance of the predictions of each of these algorithms into account. This method allowed us to assess how much of the variability in participants' responses could be explained by each algorithm.</p>
<p>The results of this study revealed that participants may be abstracting the functions that we presented. Motivated by this finding, we assumed a hierarchy between functions and parameters and used Bayesian model selection (BMS) to separate the model-learning and parameter-learning levels of analysis. BMS is a method to compute relative posterior probabilities among models by comparing the likelihood that each model produced the observed data (Raftery,
<xref rid="B32" ref-type="bibr">1995</xref>
; Wasserman,
<xref rid="B36" ref-type="bibr">2000</xref>
; Burnham and Anderson,
<xref rid="B9" ref-type="bibr">2002</xref>
). The most important attribute of BMS is that it requires each parameter to be integrated out of the likelihood of each model. This gives rise to the marginal likelihood (or evidence) for that model, which is the key term in determining the model posteriors. When multiple models can fit the data equally well, models with more parameters spread their marginal likelihood over a broader range of possible data, so the evidence they assign to the observed data is lower than that of simpler models. Therefore, BMS has an inbuilt parsimony mechanism that penalizes model hypotheses with more parameters. At the same time, we implement BMS without any free parameters and ensure that all model hypotheses are equiprobable
<italic>a priori</italic>
. This ensures that the probabilities generated by BMS are largely driven by the data of the participants. This enables us to focus on the results of participants' behavior and to determine whether human learning is governed by any perceptual rules of parsimony based on function complexity. We ensure that other aspects of our analysis do not bias our findings by performing a control analysis where we quantify what to expect from a simulated participant with full knowledge of the structure in the stimuli.</p>
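To make the model-selection computation concrete, the following minimal sketch (an editor's illustration, not the authors' code) estimates the marginal likelihood of a few polynomial function hypotheses by integrating a Gaussian likelihood over broad uniform parameter priors with simple Monte Carlo, then combines the resulting evidences under equal model priors; the prior ranges, noise level, and simulated data are assumptions.

```python
# Minimal sketch of Bayesian model selection over polynomial function hypotheses.
# Illustrative only: prior ranges, noise level, and data are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(theta, x, y, sigma=100.0):
    """Gaussian log likelihood of responses y (ms) under a polynomial with
    coefficients theta = [c0, c1, ...] (constant term first)."""
    pred = np.polyval(theta[::-1], x)
    return np.sum(-0.5 * ((y - pred) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

def log_marginal_likelihood(degree, x, y, n_samples=20000):
    """Integrate the likelihood over a broad uniform prior by Monte Carlo.
    More parameters spread the prior mass more thinly, which penalizes
    needlessly complex models (the built-in Occam factor)."""
    lows = np.array([0.0] + [-10.0] * degree)      # assumed prior ranges
    highs = np.array([3000.0] + [10.0] * degree)
    thetas = rng.uniform(lows, highs, size=(n_samples, degree + 1))
    logls = np.array([log_likelihood(t, x, y) for t in thetas])
    return np.logaddexp.reduce(logls) - np.log(n_samples)   # log prior-averaged likelihood

# Simulated responses from a linear generating function (not participant data).
x = rng.uniform(-100, 100, 60)
y = -5 * x + 1250 + rng.normal(0, 100, x.size)

log_evidence = np.array([log_marginal_likelihood(d, x, y) for d in (0, 1, 2)])
posterior = np.exp(log_evidence - np.logaddexp.reduce(log_evidence))  # equal model priors
print(dict(zip(["constant", "linear", "quadratic"], posterior.round(3))))
```

Because the prior mass of a more complex model is spread over a larger parameter volume, its estimated evidence is automatically lower whenever the extra parameters are not needed, which is the parsimony mechanism described above.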
<p>When we analyzed our participants' data using BMS, we found that even when other functions were viable candidates, and even if parameter learning was not completely accurate, the probability of the simplest function that could account for the data was the highest. Furthermore, this simpler function was acquired faster than more complex functions. Such parsimonious selection and facilitation is reminiscent of the theoretical principle called Occam's razor, which states that when different models of varying complexity can account for the data equally well, the simplest one should be selected.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Experiment 1</title>
<sec>
<title>Design and procedure</title>
<p>Seven naïve paid participants gave written informed consent to perform a computerized experiment. This experiment was part of a program that has been approved by the ethical committee of the Faculty of Human Movement Sciences, VU University, and adheres to the principles expressed in the Declaration of Helsinki. Before starting the experiment, all participants were shown an instruction video that familiarized them with the protocol and showed an example of a trial. They were then invited to perform five practice trials on the setup (display: 597 × 336 mm, 83 Hz) with stimuli that were not used in the experiment.</p>
<p>In the study, participants attempted to strike a transient target by means of an animated bullet, launched on the screen with a key press (Figure
<xref ref-type="fig" rid="F1">1A</xref>
). Within a single trial, the animated bullet moved upwards and the target flashed at one fixed location. The lateral position of the target changed across trials. Before the start of each trial, participants saw a starting rectangle indicating the lateral coordinate of the future target, and pressed a key to initiate the trial. Upon initiation, the screen refreshed to a plain black background, whereupon a cue flashed briefly (35 ms) at the exact location of the future target, serving as a second spatial cue of the lateral target position. Participants had to anticipate the target by firing an animated shot that would intercept the target, which flashed for 150 ms after an unknown interval. If the shot was fired after the target appeared, it was certain to miss, since the bullet took 300 ms to travel on each trial.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Experiment 1</bold>
.
<bold>(A)</bold>
Timeline of a single trial. Participants initiated a trial, were presented with a cue about the spatial location of a future target and they pressed a key to launch an animated bullet to catch it en route.
<bold>(B)</bold>
Generating functions in three experimental sessions. The vertical extent of the curves indicates the time during which the target was visible (150 ms). Orange colors represent the test regions; other colors represent the training regions.
<bold>(C)</bold>
Smoothed data-averages (Gaussian kernel radius 10 mm) from all participants overlaid upon pooled responses.
<bold>(D)</bold>
A single participant's data for the quadratic (purple) and cubic (green) sessions along with the set of models/heuristics that were fit individually to the training data of each participant and whose prediction distributions were used to calculate likelihoods on the basis of the test data (curves indicate mean, shaded regions represent standard deviation).
<bold>(E)</bold>
Differences between the negative log likelihoods (summed over all participants) of the generating function and the remaining heuristics and models. Positive bars indicate worse performance than the generating function, whereas negative bars indicate better performance. Numbers atop each bar quantify how well the model performs with respect to the generating-function model, i.e., 2 ln(K), where K is the Bayes factor (see Materials and Methods for details).</p>
</caption>
<graphic xlink:href="fncom-08-00121-g0001"></graphic>
</fig>
<p>Unbeknownst to the participants, there was a functional relationship between the time at which the target appeared after the start of the trial and the lateral position of the cue (identical to the lateral position of the target). In order to distinguish the presented functions from functions inferred from participant responses, we shall refer to the former as ‘generating functions’. Three such generating functions were presented, each in a different session: a linear, a quadratic, and a cubic function spanning 200 mm of space and a temporal range of 660–2000 ms. Within a session, the lateral positions of the stimuli were serially uncorrelated and uniformly distributed. In the equations, x represents the lateral position of the stimulus in mm, where
<italic>x</italic>
= 0 represents the center of the display and of all distributions. f(x) represents various functions specifying the onset time of the stimuli (in ms) relative to trial initiation:</p>
<disp-formula id="E1">
<mml:math id="M1">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mtext>Linear</mml:mtext>
<mml:mo>:</mml:mo>
<mml:mtext></mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1250</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>Quadratic</mml:mtext>
<mml:mo>:</mml:mo>
<mml:mtext></mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>0.1</mml:mn>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1700</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>Cubic </mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>where a</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>1500</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>:</mml:mo>
<mml:mtext></mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>a</mml:mi>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mi>a</mml:mi>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1500</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
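For concreteness, the generating functions above translate directly into code. The sketch below (an illustration, not the authors' stimulus code) evaluates them for cue positions drawn uniformly over the 200 mm range; taking that range to be [−100, 100] mm is an assumption based on the statement that x = 0 is the center of the display.

```python
# Experiment 1 generating functions: x is the lateral cue position in mm
# (x = 0 at the display center); the return value is the target onset time
# in ms relative to trial initiation.
import numpy as np

a = 1 / 1500  # cubic coefficient given in the text

def f_linear(x):
    return -5 * x + 1250

def f_quadratic(x):
    return -0.1 * x**2 + 0.5 * x + 1700

def f_cubic(x):
    return a * x**3 + a * x**2 - 3 * a * x + 1500

# Cue positions were serially uncorrelated and uniformly distributed;
# [-100, 100] mm is assumed here for the 200 mm span.
rng = np.random.default_rng(1)
x = rng.uniform(-100, 100, size=5)
print(np.round(f_quadratic(x)))  # example onset times for a few quadratic-session trials
```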
<p>There were two types of trials: those in which the target was shown (training trials) and those in which it was never shown (test trials). In the training trials, the full extent of the error was observable because the bullet crossed the screen before the trial ended. Participants were given points and auditory and visual signals if they scored a hit on the training trials. On test trials the bullet was visible, but the target never appeared and no feedback was given. Training and test trials were drawn from separate regions (Figure
<xref ref-type="fig" rid="F1">1B</xref>
). Three sessions, separated by 5-min pauses, were each dedicated to one generating function. Different pre-trial screen colors were used to help participants recall these sessions when they were questioned later. Participants were asked to treat each block as a new experiment. The session order was randomized across participants. Each session consisted of 170 trials in total: the first 50 were training trials, and thereafter 60 test trials were interleaved with 60 further training trials.</p>
<p>At the conclusion of the experiment, participants were asked if they used a certain “strategy” to maximize their points. If they indicated that they recognized a relationship between the target location and timing, they were presented with a sheet containing 16 graphs (Supplementary Information Figure
<xref ref-type="supplementary-material" rid="SM1">SI-1</xref>
), consisting of functions of various shapes and forms, and asked to pick the generating functions in the correct order in the three sessions (also distinguished by different colors) in which they were presented.</p>
</sec>
<sec>
<title>Data analysis</title>
<p>Of the seven participants we tested, we excluded from the analysis the data of one participant who achieved an average hit rate of less than 10% in all three sessions and responded at roughly the same value for the duration of each session, thereby demonstrating no knowledge of the nature of the generating function. The lowest average hit rate among the remaining six participants was 24%, while the mean hit rate was about 38%.</p>
<p>To the training responses of the remaining six participants, we fitted several models and heuristics inspired by various function-learning accounts in the literature. The parameters of all of the following models were fitted in the same way, using only the training-region data, through maximum likelihood estimation. (1) An
<italic>interpolation</italic>
heuristic was fitted (details of the fitting follow in the next paragraph) with separate line segments for each training region, and the predictions were interpolated linearly between these. In the case of extrapolation regions (linear and cubic functions), this algorithm produced flat flanks extending at a constant value obtained from the nearest peripheral observations of the training range. This heuristic was developed to account for the behavior of algorithms like the associative learning model (ALM: Busemeyer et al.,
<xref rid="B10" ref-type="bibr">1997</xref>
) and for findings in studies where participants used linear approximations for interpolation but were not successful in extrapolating (Ernst and van Dam,
<xref rid="B15" ref-type="bibr">2010</xref>
). We also used (2) an
<italic>extrapolation</italic>
heuristic that also fitted line segments to each training region but extrapolated its predictions. In the flanked (central) test region, when the intersection point of the two fitted line segments lay within the test region we extrapolated these segments, and when it did not, we interpolated linearly between them. This is a partitioning algorithm based on the Population of Linear Experts (POLE: Kalish et al.,
<xref rid="B23" ref-type="bibr">2004</xref>
). The associative ALM model was further developed into EXAM, which adds linear extrapolation (DeLosh et al.,
<xref rid="B13" ref-type="bibr">1997</xref>
), and therefore uses different mechanisms to explain interpolation and extrapolation behavior. EXAM's predictions are thus covered by a combination of the interpolation and extrapolation heuristics (discussed later).</p>
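The interpolation heuristic just described can be summarized in a short sketch (an editor's illustration under assumed region boundaries and data, not the authors' implementation): one line segment is fitted per training region, predictions in the central test region are interpolated linearly between the segments' inner ends, and predictions beyond the training range are held flat at the nearest peripheral value.

```python
# Sketch of the interpolation heuristic: per-region line segments, linear
# interpolation between them in the central test region, flat flanks outside
# the training range. Data and region boundaries are placeholders.
import numpy as np

def interpolation_heuristic(x_left, y_left, x_right, y_right):
    """x_left/y_left and x_right/y_right are the two training regions."""
    sL, bL = np.polyfit(x_left, y_left, 1)      # slope, intercept of left segment
    sR, bR = np.polyfit(x_right, y_right, 1)    # slope, intercept of right segment
    xL_hi, xR_lo = x_left.max(), x_right.min()  # inner edges of the training regions
    yL_hi, yR_lo = sL * xL_hi + bL, sR * xR_lo + bR

    def predict(x):
        x = np.asarray(x, dtype=float)
        y = np.where(x <= xL_hi, sL * x + bL, sR * x + bR)
        mid = (x > xL_hi) & (x < xR_lo)         # central test region: interpolate
        y = np.where(mid, np.interp(x, [xL_hi, xR_lo], [yL_hi, yR_lo]), y)
        y = np.where(x < x_left.min(), sL * x_left.min() + bL, y)    # flat left flank
        y = np.where(x > x_right.max(), sR * x_right.max() + bR, y)  # flat right flank
        return y

    return predict
```

The extrapolation heuristic differs only in the central region, where the two segments are extended toward their intersection point when that point lies inside the test region.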
<p>To account for the function abstraction view either through polynomial regression or log-polynomial regression (Carroll,
<xref rid="B11" ref-type="bibr">1963</xref>
; Brehmer,
<xref rid="B7" ref-type="bibr">1974</xref>
; Koh and Meyer,
<xref rid="B27" ref-type="bibr">1991</xref>
), we also used the (3)
<italic>generating function</italic>
and (4)
<italic>various polynomials of lower and higher degrees</italic>
. Finally, we used (5) Gaussian processes regression with a squared-exponential kernel function (free parameters: scale and precision) to test a novel approach to function learning (Griffiths et al.,
<xref rid="B21" ref-type="bibr">2008</xref>
). We could have used different basis functions for each generating function like Griffiths et al. (
<xref rid="B21" ref-type="bibr">2008</xref>
); however, the use of various polynomial basis functions implies the abstraction of different functions and would therefore be tantamount to using the structure learning framework. We therefore constrained our Gaussian processes regression analysis, in a non-parametric and abstraction-free spirit, to the most widely used Gaussian processes kernel, the squared-exponential kernel.</p>
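As a rough illustration of this abstraction-free alternative, a Gaussian process regression with a squared-exponential kernel can be sketched as follows; the length scale, signal variance, and noise variance shown are arbitrary assumptions, not the values fitted in the paper.

```python
# Minimal Gaussian process regression sketch with a squared-exponential kernel
# (free parameters: length scale and signal variance). Hyperparameter values
# are assumptions for illustration only.
import numpy as np

def sq_exp_kernel(xa, xb, length=40.0, signal_var=200.0**2):
    return signal_var * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length**2)

def gp_predict(x_train, y_train, x_test, noise_var=80.0**2):
    y0 = y_train.mean()                                       # simple constant mean
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(x_train.size)
    Ks = sq_exp_kernel(x_train, x_test)
    Kss = sq_exp_kernel(x_test, x_test)
    mean = Ks.T @ np.linalg.solve(K, y_train - y0) + y0
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)                                 # predictive mean and variance
```

As stated above, the kernel's free parameters were fitted to the training-region responses in the same way as the parameters of the other models.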
<p>We assumed, based on previous experiments (Narain et al.,
<xref rid="B30" ref-type="bibr">2013a</xref>
) that participants would have adequately learned the task within the first fifty training trials, and we therefore removed these trials from the analysis. The remaining 60 training trials, which were interleaved with the same number of test trials, were used to fit the parameters of each function by maximum likelihood estimation. Using the parameter values fitted on the training regions, the mean and variance of the predictive distributions were generated for the stimuli presented in the test regions. The predictive distributions for continuous functions were assumed to be Gaussian, and their mean and variance were obtained by evaluating the fitted model and propagating the uncertainty in its parameter estimates. For example, in the linear case, where
<inline-formula>
<mml:math id="M9">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, is the estimate of the constant (bias) parameter, and
<inline-formula>
<mml:math id="M10">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
that of the slope, while
<italic>x</italic>
<sup>*</sup>
represents the observed location of the cue for which a timing will be predicted, then the variance of that prediction is given by:</p>
<disp-formula id="E2">
<mml:math id="M2">
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mo>*</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mo>*</mml:mo>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mo>*</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
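As a worked example of this variance propagation (simulated data and an assumed noise level, not the participants' responses), the sketch below fits the linear model by ordinary least squares, which coincides with maximum likelihood under Gaussian noise, and evaluates the predictive variance at a new cue location x* from the parameter covariance matrix.

```python
# Predictive variance of a fitted linear model at a new cue location x*,
# following var(t0 + t1*x*) = var(t0) + var(t1)*x*^2 + 2*x*cov(t0, t1).
# Simulated data; the noise level is an assumption.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-100, 100, 60)                        # training-region cue positions
y = -5 * x + 1250 + rng.normal(0, 80, x.size)         # simulated response times (ms)

X = np.column_stack([np.ones_like(x), x])             # design matrix [1, x]
theta, res, *_ = np.linalg.lstsq(X, y, rcond=None)    # [t0, t1], ML estimate under Gaussian noise
sigma2 = res[0] / (x.size - 2)                        # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)                 # covariance of the parameter estimates

x_star = 120.0                                        # a location in a test region
mean = theta[0] + theta[1] * x_star
var = cov[0, 0] + cov[1, 1] * x_star**2 + 2 * x_star * cov[0, 1]
print(round(mean), round(np.sqrt(var), 1))            # predictive mean and standard deviation
```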
<p>In the case of heuristics, different variances in two segments gave rise to discontinuities in the predicted variance. The Gaussian Processes provide their own predictive distribution. Given these predictive distributions, we then obtained the likelihoods of these heuristics and models given the responses in the test regions.</p>
</sec>
<sec>
<title>Bayes factor</title>
<p>The Bayes Factor (K) was calculated from the marginal likelihoods, obtained by integrating the likelihood over a large range of parameter values for each model and heuristic. We then took the ratio of the marginal likelihood of each model and heuristic to that of the generating function. The values reported in Figure
<xref ref-type="fig" rid="F1">1E</xref>
are 2 ln(K), which can be interpreted as in Table
<xref ref-type="table" rid="T1">1</xref>
(Kass and Raftery,
<xref rid="B24" ref-type="bibr">1995</xref>
).</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>
<bold>A scale to interpret the measure 2 ln(K), where K is the Bayes Factor, a measure used in Figure
<xref ref-type="fig" rid="F1">1E</xref>
</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<bold>
<italic>2 ln(K)</italic>
</bold>
</th>
<th align="left" rowspan="1" colspan="1">
<bold>Strength of evidence</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">0–2</td>
<td align="left" rowspan="1" colspan="1">Not worth a mention</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2–6</td>
<td align="left" rowspan="1" colspan="1">Positive</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">6–10</td>
<td align="left" rowspan="1" colspan="1">Strong</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">>10</td>
<td align="left" rowspan="1" colspan="1">Very strong</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>Unlike hypothesis testing, there is no single criterion but graded levels of evidence to support one of two hypotheses</italic>
.</p>
</table-wrap-foot>
</table-wrap>
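<p>To make the use of this scale concrete, the following sketch computes 2 ln(K) from two log marginal likelihoods and maps the value onto the categories of Table 1; the numerical values are placeholders, not marginal likelihoods from our data.</p>
<preformat>
def two_log_bayes_factor(log_ml_generating, log_ml_candidate):
    """2 ln(K), where K is the ratio of the marginal likelihood of the
    generating function to that of a candidate model or heuristic."""
    return 2.0 * (log_ml_generating - log_ml_candidate)

def interpret(two_log_k):
    """Map 2 ln(K) onto the evidence categories of Table 1."""
    if two_log_k > 10.0:
        return "very strong"
    if two_log_k > 6.0:
        return "strong"
    if two_log_k > 2.0:
        return "positive"
    return "not worth a mention"

# Placeholder log marginal likelihoods (summed over trials and participants).
log_ml_generating = -812.4
log_ml_candidate = -857.9
k2 = two_log_bayes_factor(log_ml_generating, log_ml_candidate)
print(k2, interpret(k2))
</preformat>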
<p>While Experiment 1 was designed to test whether human participants can learn the structure of an abstract spatio-temporal relationship at all, it tells us little about how fast they can learn this relationship, how resilient they are while switching to a new relationship, and whether they acquire different models at different rates. We addressed these issues in a second experiment, and describe methods for Experiment 2 below.</p>
</sec>
</sec>
<sec>
<title>Experiment 2</title>
<sec>
<title>Design and procedure</title>
<p>Thirty-one naïve paid participants provided written informed consent and performed the same task as in Experiment 1. The data from three participants who met the poor hit-rate exclusion criterion of Experiment 1 were excluded from the analysis. Unlike in Experiment 1, feedback about errors was given on each trial, and the full domain of the generating function was sampled uniformly and independently over the presented range. The generating functions used were constant, linear and quadratic, spanning the same ranges as in Experiment 1 (in the equations below, x represents the lateral stimulus position in mm, and the functions f(x) give the stimulus onset time relative to trial initiation, in ms).</p>
<disp-formula id="E3">
<mml:math id="M3">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mtext>Constant</mml:mtext>
<mml:mo>:</mml:mo>
<mml:mtext></mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>1350</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>Linear: </mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mn>7</mml:mn>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1350</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>Quadratic: </mml:mtext>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>0.15</mml:mn>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mn>0.1</mml:mn>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>2100</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>During the course of the experiment, the generating function was switched twice: once after 100 trials and again after 250 trials. Participants were divided into four groups (after exclusions, there were seven participants in each of the first two groups, eight in the third group and six in the last group), each experiencing a different order of switches (Figures
<xref ref-type="fig" rid="F2">2</xref>
,
<xref ref-type="fig" rid="F3">3</xref>
).</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Function learning in Experiment 2</bold>
.
<bold>(A)</bold>
The three generating functions used in this experiment were presented to the four groups in different serial orders.
<bold>(B)</bold>
The average posterior probabilities (foreground bars) for the most likely function corresponding to participants' responses within a moving window of 50 trials are shown over the course of the experiment. The generating function used on a given trial is indicated by the background color. The four panels represent averages of the four groups with different presentation orders of the generating functions.</p>
</caption>
<graphic xlink:href="fncom-08-00121-g0002"></graphic>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Parameter learning in Experiment 2</bold>
.
<bold>(A–D)</bold>
The four panels represent the four different groups of participants. Each panel consists of three sub-parts, each indicating the parameters for the terms of the model when the corresponding generating functions were presented (note that the parameters have different scales and dimensions). Dashed lines represent the values of the generating function. All other lines represent the maximum likelihood estimate (MLE) of the parameters in a moving window of 50 trials. Thin lines represent individual participants and thick lines represent averages. The lines pertain to the parameters of the presented generating function and therefore there may be discontinuities in the lines at the switches. All participants' data are plotted irrespective of whether or not the generating function was the best description of the participants' responses.</p>
</caption>
<graphic xlink:href="fncom-08-00121-g0003"></graphic>
</fig>
</sec>
<sec>
<title>Data analysis and simulation</title>
<p>We performed both the model-level and the parameter-level analyses in moving windows of 50 trials (451 window frames for the 500 trials in total). This window size was chosen as a trade-off between the reduced statistical power of smaller windows and the loss of temporal resolution of larger ones. Rates of acquisition were calculated at selection-probability thresholds of 0.33, 0.5, 0.66, and 0.99, based on the posterior probabilities obtained from the
<italic>Model level of analysis</italic>
.</p>
</sec>
<sec>
<title>Control analysis</title>
<p>To check whether the pattern of results was an inadvertent consequence of our analysis, we simulated the responses of a participant who responded noisily but had perfect knowledge of the switches between models, of the generating functions, and of their corresponding parameters. Because response noise plays an important role in such analyses, we added different levels of zero-mean white Gaussian noise (sd: 10, 30, 50, 100, 150, 200, 400 ms) to the simulated participant's responses using Monte Carlo methods. We then performed a moving-window analysis identical to that used for the actual participants' data in order to obtain the posterior probability of each model in each window frame. We found that, across noise levels, the posteriors and acquisition times were almost identical as long as the maximum likelihood estimate of the noise level (rather than the actual noise level) was recalculated in every window (451 frames for 500 trials) for each run (500 Monte Carlo runs in total). In Figure
<xref ref-type="fig" rid="F4">4</xref>
, we use a simulated dataset with standard deviation 10 ms, which was the uncertainty caused by the refresh rate of the monitor.</p>
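<p>A minimal sketch of this control simulation, assuming a hypothetical stimulus range, one presentation order, and a single noise level; it generates responses from a simulated participant with perfect knowledge of the generating functions and the switch points, to which the same moving-window analysis can then be applied.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(0)

# Generating functions of Experiment 2 (x in mm, f(x) in ms).
def f_constant(x):
    return np.full_like(x, 1350.0)

def f_linear(x):
    return 7.0 * x + 1350.0

def f_quadratic(x):
    return -0.15 * x**2 + 0.1 * x + 2100.0

# One presentation order: switches after trial 100 and after trial 250,
# 500 trials in total.
schedule = [(f_constant, 100), (f_linear, 150), (f_quadratic, 250)]
noise_sd = 10.0                       # one of the tested noise levels (ms)

x_all, y_all = [], []
for f, n_trials in schedule:
    x = rng.uniform(-60.0, 60.0, n_trials)            # hypothetical stimulus range
    y = f(x) + rng.normal(0.0, noise_sd, n_trials)    # perfect knowledge + noise
    x_all.append(x)
    y_all.append(y)
x_all, y_all = np.concatenate(x_all), np.concatenate(y_all)
# x_all, y_all can now be fed to the same 50-trial moving-window analysis
# that was applied to the participants' data.
</preformat>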
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Comparisons of function acquisition times in Experiment 2</bold>
. Comparison of
<bold>(A)</bold>
participants' data and
<bold>(B)</bold>
average responses from simulated participants with noisy responses (see Materials and Methods for various noise values) for different switches of generating functions. The ordinate represents acquisition time, i.e., the number of trials taken to achieve a selection probability of 0.33. Average acquisition times of each model for
<bold>(C)</bold>
participants' data and
<bold>(D)</bold>
for responses from a simulated participant.
<bold>(E)</bold>
Average function acquisition times for participants' data as a function of the selection threshold at which the acquisition time is determined. Error bars represent standard error across participants.</p>
</caption>
<graphic xlink:href="fncom-08-00121-g0004"></graphic>
</fig>
</sec>
<sec>
<title>Bayesian model selection</title>
<p>We used the Bayesian Model Selection (BMS) framework to obtain the posterior probability of each model within a window frame. We computed the marginal likelihood of each model by analytically integrating out the parameters from the lower level of the hierarchy. Using Bayes' rule and the prior distributions described below, we then obtained the posterior probabilities of the models given the data.</p>
<p>The data that we obtained from participants were first cast as a multivariate Gaussian likelihood (Equation 1). Here, Λ is a diagonal matrix in which the precision (1/variance) of each element is specified by the observation noise
<italic>l</italic>
. Since we did not have direct access to the value of
<italic>l</italic>
, we numerically marginalized over this parameter (range 0–5000 ms), assuming a uniform prior over the entire range. Consequently, our implementation of the BMS algorithm had no free parameters.</p>
<list list-type="simple">
<list-item>
<p>(1) The Likelihood of the models and parameters given the data for Experiment 2:
<disp-formula id="E4">
<mml:math id="M4">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi>Λ</mml:mi>
<mml:msup>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>π</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo></mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo></mml:mo>
<mml:mi>F</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mi>Λ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo></mml:mo>
<mml:mi>F</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>n</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Number of observations</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>m</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Number of parameters in </mml:mtext>
<mml:msup>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> model</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mi>th</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mtext> model</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>where </mml:mtext>
<mml:mi>j</mml:mi>
<mml:mtext> = 1</mml:mtext>
<mml:mo>…</mml:mo>
<mml:mtext>3 models under</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow></mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>consideration: constant, linear and quadratic</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<disp-formula id="E5">
<mml:math id="M5">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi>ℝ</mml:mi>
<mml:mi>n</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Stimuli (lateral position)</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi>ℝ</mml:mi>
<mml:mi>n</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Observation vector (responses)</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>F</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Generating function</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mi>Λ</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Precision matrix for multivariate Gaussian</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow></mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Likelihood</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Parameter space for each model</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
<p>We defined priors over each of the parameters of the three generating functions (Equation 2). For simplicity, the Gaussian priors over the parameters were assumed to be independent of each other, centered at a mean value far from the true parameter value, and given a standard deviation an order of magnitude larger than that mean. Thus, BMS did not have correct
<italic>a priori</italic>
information about the true means of the parameters. It is important to note that under these circumstances the role of the prior in BMS is negligible in comparison to that of the likelihood: because of its inflated variance, the mean value of the prior hardly affects the outcome.</p>
<list list-type="simple">
<list-item>
<p>(2) Prior over parameters for Experiment 2:
<disp-formula id="E6">
<mml:math id="M6">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mi>ω</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>π</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo></mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:msub>
<mml:mi>ω</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msup>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>ω</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>Precision of the </mml:mtext>
<mml:msup>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> parameter distribution </mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>1/variance</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow></mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>of the </mml:mtext>
<mml:msup>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> model</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>Mean of </mml:mtext>
<mml:msup>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> parameter distribution </mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> parameter vector of </mml:mtext>
<mml:msup>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mtext>th</mml:mtext>
</mml:mrow>
</mml:msup>
<mml:mtext> model where </mml:mtext>
<mml:mi>j</mml:mi>
<mml:mtext>=1</mml:mtext>
<mml:mo>…</mml:mo>
<mml:mtext>3</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow></mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>models under consideration: constant, linear and</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow></mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>quadratic</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
<p>For the posterior probability of models described in Equation (3), we first calculated the marginal likelihood by integrating the product of the likelihood from Equation (1) with the parameter priors from Equation (2).</p>
<list list-type="simple">
<list-item>
<p>(3) Marginal Likelihood for Experiment 2:
<disp-formula id="E7">
<mml:math id="M7">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>⋯</mml:mo>
</mml:mrow>
</mml:mstyle>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:msub>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:mstyle>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>                                         </mml:mtext>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>⋯</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
<p>After each parameter was integrated out analytically, we used the marginal likelihood to compute the posterior probability of each model given the data (Equation 4). The prior probability of each model was assumed to be uniform (equiprobable), so that no model was more likely than the others
<italic>a priori</italic>
; a numerical sketch of this model-selection step is given after Equation (4).</p>
<list list-type="simple">
<list-item>
<p>(4) Posterior Distribution of models in Experiment 2:
<disp-formula id="E8">
<mml:math id="M8">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mn>3</mml:mn>
</mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
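<p>A compact numerical sketch of this model-selection step, under simplifying assumptions: polynomial models of degree 0–2, independent broad Gaussian priors on their coefficients (illustrative values), a fixed observation noise instead of the numerical marginalization over <italic>l</italic> described above, and a uniform prior over models. For such linear-in-parameters models the marginal likelihood of Equation (3) has the closed form used below, and Equation (4) reduces to a normalization over the three models. The stimulus range and responses are made up for illustration.</p>
<preformat>
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(x, y, degree, sigma, prior_mean, prior_var):
    """Log p(Data | Model) for a polynomial model of the given degree with
    independent Gaussian priors on its coefficients and Gaussian observation
    noise of standard deviation sigma (closed form of Equation 3)."""
    Phi = np.vander(x, degree + 1, increasing=True)    # columns: 1, x, x^2, ...
    mean = Phi @ prior_mean
    cov = Phi @ np.diag(prior_var) @ Phi.T + sigma**2 * np.eye(x.size)
    return multivariate_normal.logpdf(y, mean=mean, cov=cov)

def model_posterior(x, y, sigma=100.0):
    """Posterior over the constant, linear and quadratic models (Equation 4)
    with a uniform prior over the three models."""
    priors = {                                         # illustrative broad priors
        0: (np.zeros(1), np.array([1e8])),
        1: (np.zeros(2), np.array([1e8, 1e4])),
        2: (np.zeros(3), np.array([1e8, 1e4, 1e2])),
    }
    log_ml = np.array([log_marginal_likelihood(x, y, d, sigma, *priors[d])
                       for d in (0, 1, 2)])
    w = np.exp(log_ml - log_ml.max())                  # stabilized normalization
    return w / w.sum()

# Example: one 50-trial window in which the linear function generated the data.
rng = np.random.default_rng(1)
x = rng.uniform(-60.0, 60.0, 50)                       # hypothetical stimulus range
y = 7.0 * x + 1350.0 + rng.normal(0.0, 100.0, 50)
print(model_posterior(x, y))      # posterior mass should favor the linear model
</preformat>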
</sec>
<sec>
<title>Parameter estimation</title>
<p>In addition to analyzing the model posteriors in Experiment 2, we also analyzed the learning of the parameters over the course of the experiment. For each window frame we obtained maximum likelihood estimates of each parameter of each model, given the observation noise that maximized the marginal likelihood obtained through Bayesian model selection. The parameter search spaces used to determine the likelihood distributions (two orders of magnitude larger than the actual parameter magnitudes) were as follows; constant term: −10,000:10,000 ms, linear term: −100:100 ms.mm
<sup>−1</sup>
, quadratic term: −10:10 ms.mm
<sup>−2</sup>
. The maximum likelihood estimates were calculated in each frame of a 50-trial moving window and thus tracked over the course of the experiment. The resulting curves were smoothed with a univariate Gaussian kernel with a radius of 5 data points.</p>
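<p>A sketch of this parameter-level step for the linear model, using coarse grids over the stated search ranges and made-up window data; the kernel width used for smoothing is a stand-in for the five-point radius mentioned above.</p>
<preformat>
import numpy as np
from scipy.ndimage import gaussian_filter1d

def mle_linear(x, y, sigma):
    """Grid-search maximum likelihood estimate of the constant and linear terms
    within the stated search ranges (coarse grids for brevity)."""
    c_grid = np.linspace(-10000.0, 10000.0, 201)       # constant term (ms)
    s_grid = np.linspace(-100.0, 100.0, 201)           # linear term (ms/mm)
    C, S = np.meshgrid(c_grid, s_grid, indexing="ij")
    pred = C[..., None] + S[..., None] * x             # shape (201, 201, n)
    log_lik = -0.5 * np.sum((y - pred) ** 2, axis=-1) / sigma**2
    i, j = np.unravel_index(np.argmax(log_lik), log_lik.shape)
    return c_grid[i], s_grid[j]

# One 50-trial window of made-up responses to the linear generating function.
rng = np.random.default_rng(2)
x = rng.uniform(-60.0, 60.0, 50)
y = 7.0 * x + 1350.0 + rng.normal(0.0, 100.0, 50)
print(mle_linear(x, y, sigma=100.0))

# The per-window estimates were then smoothed over window frames; here an
# illustrative trajectory of slope estimates is smoothed with a Gaussian kernel.
trajectory = rng.normal(7.0, 0.5, 451)
smoothed = gaussian_filter1d(trajectory, sigma=5)
</preformat>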
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Experiment I</title>
<p>After being shown a brief spatial cue, participants were asked to respond by firing an animated bullet, aiming to hit a transient target en route. Unbeknownst to them, the location of the cue and the timing of the target constituted a functional relationship (linear, quadratic, or cubic; Figure
<xref ref-type="fig" rid="F1">1B</xref>
) and the purpose of the experiment was to train them on certain regions of this function and test transfer to other regions. The participants' responses (lines in Figure
<xref ref-type="fig" rid="F1">1C</xref>
) seem to approximate most of the presented functions. In Figure
<xref ref-type="fig" rid="F1">1D</xref>
we see an example of some models and heuristics of the quadratic and cubic sessions being fit to the responses from the training regions obtained from a representative participant. To assess model performance, the likelihoods of the various models were calculated given the
<italic>test</italic>
stimuli and participants' test responses, based on maximum likelihood estimates of the parameters from the training regions. The best model or heuristic would maximize the likelihood of its predictive distribution given the participants' test responses, i.e., it would minimize the negative log likelihood. None of the models or heuristics enjoys an advantage from having more parameters, since they were fitted on regions different from those on which they were tested; this procedure follows a principle similar to cross-validation. In all three sessions, the minimum negative log likelihood (summed over trials and participants) was obtained for the generating function, indicating that it was the best model in all cases, although this does not necessarily imply that it was an unequivocal winner. To understand the relative performance of the models, we subtract the summed negative log likelihood of the generating function from that of each other model (Figure
<xref ref-type="fig" rid="F1">1E</xref>
).</p>
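<p>The comparison described above can be summarized in a few lines; this sketch sums the negative log likelihood of test responses under a model's Gaussian predictive distribution and reports the difference relative to the generating function, as plotted in Figure 1E. The responses, predictive means and standard deviations are placeholders, not fitted values.</p>
<preformat>
import numpy as np
from scipy.stats import norm

def summed_nll(responses, pred_mean, pred_sd):
    """Negative log likelihood of test responses under a model's Gaussian
    predictive distribution, summed over trials."""
    return -np.sum(norm.logpdf(responses, loc=pred_mean, scale=pred_sd))

# Placeholder test stimuli, responses, and two candidate predictive distributions.
rng = np.random.default_rng(3)
x_test = np.linspace(20.0, 60.0, 40)
responses = 7.0 * x_test + 1350.0 + rng.normal(0.0, 80.0, x_test.size)
pred_generating = (7.0 * x_test + 1350.0, 90.0)        # (mean, sd) per trial
pred_heuristic = (np.full(x_test.size, 1500.0), 90.0)  # e.g., a flat prediction

nll_generating = summed_nll(responses, *pred_generating)
nll_heuristic = summed_nll(responses, *pred_heuristic)
print(nll_heuristic - nll_generating)  # positive values favor the generating function
</preformat>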
<p>Another measure that we use to quantify these differences is the Bayes factor (reported atop bars in Figure
<xref ref-type="fig" rid="F1">1E</xref>
). We calculate the ratio of evidence (Marginal likelihood) between the generating function and each other model or heuristic to obtain the Bayes factor (details in Materials and Methods). We report twice the log of the Bayes factor for each heuristic and model in Figure
<xref ref-type="fig" rid="F1">1E</xref>
where any value greater than 2 indicates positive evidence for rejecting that hypothesis in favor of the generating function (Table
<xref ref-type="table" rid="T1">1</xref>
, also see Kass and Raftery,
<xref rid="B24" ref-type="bibr">1995</xref>
). We find that the differences between predictions from the
<italic>higher degree</italic>
polynomial regression and those from regressing with the generating function are negligible for the linear and quadratic sessions. For all three sessions, participants' performance is best predicted by regression with polynomial functions of degree equal to or higher than the generating function (Figure
<xref ref-type="fig" rid="F1">1E</xref>
). For all sessions, we find that the
<italic>extrapolation</italic>
and
<italic>interpolation</italic>
predictions, predictions from lower-degree polynomials, and predictions of Gaussian process regression do not perform as well as the generating function. Based on these results, one can conclude that our participants' performance is consistent with the use of either the veridical function or a higher-degree polynomial, suggesting that function abstraction may have taken place.</p>
<p>We had another source of information to determine whether participants had abstracted the generating function. After the experiment, participants were asked whether they had been able to formulate a reliable strategy and, if they indicated that there was a relationship between the lateral target position and its timing, they were asked to identify the nature of these relationships, in the order in which they had been presented, from a panel of 16 cartoons (Figure
<xref ref-type="supplementary-material" rid="SM1">S1</xref>
, Supplementary Information). All six participants identified the existence of a relationship in the stimuli and identified the linear and quadratic relationships in the correct order. For the cubic session, four of them picked a positively sloped line and two chose a constant line as the generating function. The selection of a positively sloped line for the cubic session also corresponds with the averaged responses of participants for that session in Figure
<xref ref-type="fig" rid="F1">1C</xref>
. On the other hand, in Figure
<xref ref-type="fig" rid="F1">1E</xref>
, we find that a cubic model best predicts the responses of the participants in the cubic session. It must be noted, however, that the fitted cubic functions do not resemble the true generating function in the stimuli. The participants' selection of a straight line in the questionnaire after the experiment may suggest that the information we provided about the cubic function was too noisy for them to be certain of its true shape, so that a line best described the observations. Alternatively, they might have been using a model that we did not test. We speculate that participants' inability to reproduce the actual cubic function may be due to the very limited extent to which the cubic generating function was exposed in the training regions. This seems quite plausible since we know from other studies that observations of target times and positions can be noisy (van Beers et al.,
<xref rid="B35" ref-type="bibr">1999</xref>
; Ernst and Banks,
<xref rid="B14" ref-type="bibr">2002</xref>
; Alais and Burr,
<xref rid="B2" ref-type="bibr">2004</xref>
; Brenner and Smeets,
<xref rid="B8" ref-type="bibr">2007</xref>
; Faisal et al.,
<xref rid="B16" ref-type="bibr">2008</xref>
; Maij et al.,
<xref rid="B28" ref-type="bibr">2009</xref>
; Jazayeri and Shadlen,
<xref rid="B21a" ref-type="bibr">2010</xref>
; Narain et al.,
<xref rid="B31" ref-type="bibr">2013b</xref>
).</p>
</sec>
<sec>
<title>Experiment II</title>
<p>We found in Experiment 1 that polynomial regression, often with the generating function itself, best explained the participants' responses in most cases. Furthermore, upon post-experiment questioning, participants correctly identified the shapes of the linear and quadratic functions. This shows that the performance of our participants was most consistent with an abstracted function, especially in the linear and quadratic cases. The structure-learning framework, when applied to this problem, posits a hierarchy between the function and its parameters, whereby the abstraction of a function constrains the learning of its parameters. To study whether function learning and parameter learning can coexist, we performed an experiment (Experiment 2) in which we surreptitiously changed the function over the course of the experiment.</p>
<p>We analyzed the time course of function-learning and parameter-learning independently for our participants. We determined analytical solutions for the marginal likelihoods specific to our generating functions using the Bayesian Model Selection framework (Raftery,
<xref rid="B32" ref-type="bibr">1995</xref>
; Wasserman,
<xref rid="B36" ref-type="bibr">2000</xref>
; Burnham and Anderson,
<xref rid="B9" ref-type="bibr">2002</xref>
). We thus obtained posterior probabilities of each function, which were independent of the values of their parameters. This allowed us to determine the time course of function acquisition. These posterior probabilities were directly inferred from the participants' responses and indicated the probability of functions that were most likely to be used by the participants in generating their responses. Therefore, we took these posterior probabilities to reflect the functions that participants abstracted. By computing these posteriors over a moving window, we were able to obtain an insight into how participants relearned functions after we administered a sudden switch. Figure
<xref ref-type="fig" rid="F2">2B</xref>
shows that participants were able to adapt to the changes in the generating function. The background colors in Figure
<xref ref-type="fig" rid="F2">2B</xref>
indicate the generating function, with the average posterior probabilities shown in the foreground. In order to show the variability of participants within each group, we calculated, for each participant and each of the three models, the mean posterior value over the trials on which that model determined the stimulus (including the first 100 trials, where learning may have been incomplete). We then averaged this mean posterior for each model across participants in each group (Supplementary Information Figure
<xref ref-type="supplementary-material" rid="SM1">SI-2</xref>
). Note that these represent the extent of model selection and not the rate of acquisition shown in subsequent figures.</p>
<p>Figure
<xref ref-type="fig" rid="F3">3</xref>
shows the results of the parameter-level analysis, which, in combination with Figure
<xref ref-type="fig" rid="F2">2</xref>
, provides us with the complete picture of our participants' learning behavior. While the parameter estimates are calculated for each function, it is worth noting that higher order functions can fully mimic simpler functions whereas the reverse is not true. In Figure
<xref ref-type="fig" rid="F3">3</xref>
, the dashed lines represent the true parameter values for the three models and the background colors indicate the generating functions that were presented over the course of the experiment. We see that the averages across participants (thick lines) begin to converge toward the dashed lines after the switches occur, indicating that the parameters selected by the participants were close to the veridical values. Note that the parameters differ in dimension and affect the shape of the function in different ways; for instance, small fluctuations in the quadratic term lead to large changes in the convexity or concavity of the quadratic function. Most importantly, we see that the participants' parameter values do not always reach the veridical values, even while the function selection probabilities remain high. Bayesian model selection can only assign a high posterior probability to a certain function when the responses and noise level indicate it to be a clear winner. Therefore, function abstraction is possible even when parameter learning is not fully accurate.</p>
<p>Furthermore, since Bayesian model selection calculates posterior probabilities for all models simultaneously and inherently compensates for their complexity, it allows us to compare all three functions in parallel. We find that the posteriors calculated from participants' responses correspond to high selection probabilities for the simplest model that is viable for the data (Figure
<xref ref-type="fig" rid="F2">2B</xref>
). For example, when the linear model is selected, the quadratic model could fit the responses equally well by setting its quadratic term close to zero. In fact, had there been serious non-linearities in participants' responses, the posteriors would have been highest for the quadratic function. We find in Figure
<xref ref-type="fig" rid="F2">2B</xref>
that high selection probabilities coincide remarkably well with the presented function. This is independently confirmed by the parameter values in Figure
<xref ref-type="fig" rid="F3">3</xref>
. This selection of the simplest viable model when other functions were equally plausible is consistent with the Occam's razor principle. One could argue, however, that these results merely reflect the fact that Bayesian model selection has an internal penalty on its marginal likelihoods that controls for complexity. To address this issue, in the following paragraphs we discuss an independent measure to test whether human function learning obeys Occam's razor.</p>
<p>Figure
<xref ref-type="fig" rid="F2">2B</xref>
suggests that the time taken by participants to learn different functions varies. As we reasoned above, if we can establish that the selection of simpler viable models is also facilitated over the selection of more complex models, this would further support the view that learning is consistent with the Occam's razor principle. If we observe the acquisition time for the quadratic function in Group 2 (Figure
<xref ref-type="fig" rid="F2">2B</xref>
), and compare it to the opposite transition, the switch from the quadratic to the constant function in Group 4, we find the latter to be more rapid. To quantify these acquisition times, we organized the various pairs of switches on the basis of function complexity (i.e., constant to linear vs. linear to constant, and so forth) and calculated, for each participant, the time taken for the posterior probability of the correct model to reach a certain threshold (
<italic>p</italic>
= 0.33; chance level for model selection).</p>
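<p>A sketch of how an acquisition time is read off a posterior trajectory, given a hypothetical array of per-window posterior probabilities for the newly presented function; the acquisition time is the number of window frames after the switch before the posterior first reaches the chosen threshold.</p>
<preformat>
import numpy as np

def acquisition_time(posterior_after_switch, threshold=0.33):
    """Number of window frames after a switch until the posterior probability
    of the newly presented function first reaches the threshold."""
    above = np.flatnonzero(posterior_after_switch >= threshold)
    return int(above[0]) if above.size else None       # None: never acquired

# Hypothetical posterior trajectory of the correct model after a switch.
trajectory = np.concatenate([np.linspace(0.05, 0.30, 20),
                             np.linspace(0.30, 0.95, 40)])
print(acquisition_time(trajectory, threshold=0.33))
</preformat>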
<p>The results in Figure
<xref ref-type="fig" rid="F4">4A</xref>
show that participants acquired simpler models faster than complex ones. To interpret these results, and to ensure that the pattern observed in the data was not merely an artifact of the analysis or algorithm, we generated responses from a simulated participant who had perfect knowledge of the location of the switches and of the nature of the models and parameters. We added various levels of noise to the simulated participant's responses (not shown in Figure
<xref ref-type="fig" rid="F4">4</xref>
, details in Materials and Methods) and then subjected these responses to the same windowed analysis as that used for our participants in Figure
<xref ref-type="fig" rid="F2">2B</xref>
. We considered the acquisition times obtained from the simulated participant as a baseline to interpret actual participants' data. In Figure
<xref ref-type="fig" rid="F4">4B</xref>
, we observe that the baseline is biased toward selecting a complex model more rapidly than a simple one. This bias is introduced by our use of the moving-window analysis. When the analysis window moves over a transition point between two models, it contains a mixture of responses generated by the two models. Of the two, the model with more degrees of freedom is likely to distinguish itself more prominently than the simpler model and is therefore likely to be selected faster by the algorithm. For example, 15 independent samples from a quadratic are likely to provide strong evidence of non-linearity even when the 15 remaining samples suggest a noisy linear trend. The moving-window analysis therefore invariably creates an asymmetry in the acquisition times such that more complex models are selected faster. Note that the asymmetry found in the human participants' data was exactly the opposite (Figure
<xref ref-type="fig" rid="F4">4A</xref>
).</p>
<p>For the participants, the acquisition of complex models was slower than that of simpler ones, which can be seen in the averaged plots in Figure
<xref ref-type="fig" rid="F4">4C</xref>
. Treating the model acquisition times from the simulated participant responses as a baseline criterion, we performed two-tailed
<italic>t</italic>
-tests (since the effect could be in either direction) with the differences obtained from the participants' data (Figures
<xref ref-type="fig" rid="F4">4A,B</xref>
). We found all differences to be significantly different from those obtained from the baseline [LC-CL
<italic>t</italic>
<sub>(5)</sub>
= −13.04,
<italic>p</italic>
< 0.01; QL-LQ
<italic>t</italic>
<sub>(6)</sub>
= −9.39,
<italic>p</italic>
< 0.001; QC-CQ
<italic>t</italic>
<sub>(5)</sub>
= −7.80,
<italic>p</italic>
< 0.001]. In all cases, the acquisition times for transitions to complex models were longer than those for transitions to simpler models. To determine whether the pattern of results from the participants' data depended on the particular
<italic>selection probability</italic>
, i.e., the posterior probability at which we compute the acquisition time, we repeated the above analysis for different threshold criteria (Figure
<xref ref-type="fig" rid="F4">4E</xref>
). We found that the facilitation of transitions to simpler functions over transitions to more complex ones holds for the different selection probabilities. Therefore, not only is the simplest viable function selected to the greatest extent by the participants, its acquisition is also faster than that of more complex models. Together, these results suggest that function acquisition obeys the Occam's razor principle.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>In this paper, we propose structure learning as a candidate framework to explain human function learning. We make our case by tackling three issues central to function learning. (1) Can humans abstract functions? (2) Are rule-based learning and associative learning mutually exclusive? (3) Do humans employ principles of parsimony, such as Occam's razor, while learning various functions?</p>
<sec>
<title>Abstraction</title>
<p>Early accounts of function learning claim that humans are capable of abstracting continuous functions (Carroll,
<xref rid="B11" ref-type="bibr">1963</xref>
; Bedford,
<xref rid="B3" ref-type="bibr">1989</xref>
; Koh and Meyer,
<xref rid="B27" ref-type="bibr">1991</xref>
). Recent accounts have shifted the focus from testing performance in humans to the trial-by-trial learning of the generating function presented (Kalish et al.,
<xref rid="B22" ref-type="bibr">2007</xref>
; McDaniel et al.,
<xref rid="B29" ref-type="bibr">2009</xref>
). In Experiment 1, we generated predictions compatible with five different function-learning algorithms from the literature. When we tested the participants' responses in the test regions, we found that polynomial regression with the veridical model, or with a higher order, best explained the data. We assumed that all these algorithms consistently applied a single principle during interpolation and extrapolation of the function; therefore, we could not directly test an important function learning algorithm, EXAM, which assumes associative learning for interpolation but linear extrapolation. However, the predictions of EXAM would be consistent with the extrapolation heuristic for the linear and cubic functions, and, for the quadratic function, they would be most consistent with the interpolation heuristic (see McDaniel et al.,
<xref rid="B29" ref-type="bibr">2009</xref>
). For our data, neither of these heuristics performed better than the generating functions.</p>
<p>Intuitively, one would expect human behavior to reflect some continuity of prediction across the entire domain of the functions that humans experience and predict in. For example, one could expect the variance of human predictions to increase gradually as the location of extrapolation moves further away from the last observed point, but one would not expect an abrupt jump in the prediction variance just outside the experienced range of stimuli. Such discontinuities sometimes occur when implementing POLE and EXAM. To address this problem, Gaussian processes may prove a good alternative (Griffiths et al.,
<xref rid="B21" ref-type="bibr">2008</xref>
). Gaussian process regression combines elements of rule-based and instance-based learning, embodies a unifying principle, and suffers no discontinuities in its predictions. It is also compatible with neural network implementations and therefore enjoys a plausible description at a biological level. What we achieve through Bayesian model selection could be achieved by assuming different polynomial Gaussian process kernels; however, this would also require the explicit assumption that the function is abstracted, and therefore amounts to the same argument that we make through BMS. Gaussian processes with a generic kernel (squared exponential) could not, however, explain the performance of our participants convincingly, suggesting that there may be some value in assuming abstracted basis functions.</p>
<p>It has been thought that polynomial and heuristic regression models are indistinguishable because they make predictions that cannot be teased apart. However, with adequate statistical power, the use of likelihoods, and quantification of the results with the Bayes factor, one can sensitively discriminate the predictions of these models. As is the case with all multiple-model comparisons, there can be no certainty about whether participants were using a model that was not included in the hypothesis set. However, within the hypothesis set considered, which in our opinion spans all major algorithms in function-learning research, polynomial regression, especially with the veridical function, emerges as the best candidate. These results, in combination with the results of the questionnaires after the experiments, imply that either the veridical function or a function of higher order was abstracted by the participants.</p>
</sec>
<sec>
<title>Rule-based learning vs. associative learning</title>
<p>In the structure-learning account, rule-based learning and associative learning can in fact coexist, as we demonstrate in Experiment 2, where high posterior selection probabilities occurred even when the parameter values were not fully learned. One possibility to reconcile the two views is that associative learning takes place until a function is abstracted, after which the function constrains the search for parameters in a rule-based manner. This suggests a new interpretation of results in previous function-learning studies: the inability to recreate a presented function exactly does not imply a lack of understanding of that function.</p>
<p>One interesting insight that has emerged from recent data is that the manner in which a function is exposed to participants may influence whether they abstract a rule or whether they merely learn an associative map. Fulvio et al. (
<xref rid="B17" ref-type="bibr">2013</xref>
) propose that sparse sampling leads to associative mapping whereas dense sampling may lead to the abstraction of a function. Narain et al. (
<xref rid="B30" ref-type="bibr">2013a</xref>
) found that participants could not learn a function when it was exposed to them locally, whereas they rapidly learned it when the sampling was uncorrelated and dense. Ernst and van Dam (
<xref rid="B15" ref-type="bibr">2010</xref>
) found a lack of linear extrapolation behavior in a shape-association task; however, they used sparse sampling. All these results suggest that the density and serial correlation of sampling may play a role in whether learning is rule-based or associative.</p>
</sec>
<sec>
<title>Occam's razor</title>
<p>It has long been reported in the literature that there is a primacy of linear functions in human function learning (Carroll,
<xref rid="B11" ref-type="bibr">1963</xref>
; Deane et al.,
<xref rid="B12" ref-type="bibr">1972</xref>
; Brehmer,
<xref rid="B7" ref-type="bibr">1974</xref>
; Bedford,
<xref rid="B3" ref-type="bibr">1989</xref>
). In other words, humans learn linear functions faster than non-monotonic non-linear functions. It has also been shown that across different blocks, more complex functions take longer to be learned (McDaniel and Busemeyer,
<xref rid="B38" ref-type="bibr">2005</xref>
).</p>
<p>In Experiment 2, we studied the acquisition of functions of different complexity as they changed surreptitiously over time. To our knowledge, this is the first function-learning study performed in a non-stationary environment. Our use of Bayesian model selection allows us to perform a parallel comparison among the three functions without any fear of biased outcomes due to overfitting by more complex functions. The results demonstrate that even when other models could explain the data equally well (e.g., in the constant case, a linear and a quadratic model can mimic a constant model), the selection probabilities were highest for the simplest model (Figure
<xref ref-type="fig" rid="F2">2</xref>
). Please note that this result stands out especially because we implement BMS without any free parameters and use an equal
<italic>a priori</italic>
selection probability for each model.</p>
<p>The data in Experiment 2 were analyzed over a sliding window of 50 trials. This window size was selected because it traded off reliable posterior calculations against the loss of temporal resolution. This does not take into account the possibility that participants may have switched models multiple times during a transition. Unfortunately, at smaller window sizes estimating the posterior becomes unreliable due to noise (Supplementary Information Figure
<xref ref-type="supplementary-material" rid="SM1">SI-3</xref>
), although the overall pattern remains the same as for the fifty-trial window. We may not be able to pinpoint the exact switch point, but with our current analysis, and by testing multiple thresholds, we can generate a consistent and reliable measure of the acquisition rate for each model and participant.</p>
<p>In addition to selecting the simplest, most-viable function, the simpler functions also enjoyed faster acquisition rates for different switch combinations (Figure
<xref ref-type="fig" rid="F4">4</xref>
). Therefore, simpler functions have an advantage both in the extent and in the rate of selection. Both of these independent measures suggest that the learning of simpler models is preferred over that of equally viable complex models. The Occam's razor principle states that when multiple models are equally capable of explaining a dataset, the simplest one should be selected. Recent studies have revealed that humans exhibit Occam's razor-like parsimony in perceptual and sensorimotor learning tasks (Gershman,
<xref rid="B19" ref-type="bibr">2013</xref>
; Genewein and Braun,
<xref rid="B18" ref-type="bibr">2014</xref>
). If Occam's razor is a pervasive principle in human perception and model learning, it may explain the previous findings on the primacy of linear functions over non-linear functions, since linear models have low parametric complexity. Such a unifying principle could help us better understand how humans abstract and learn functions among variables in the world.</p>
<p>In summary, we propose that structure learning may serve as a unifying framework for function learning. We believe this to be a viable framework given the results of Experiment 1, where function abstraction seems most likely. The separation of function learning and parameter learning makes room for abstraction and rule-based learning as well as for associative accounts. Further, it helps us to understand human perceptual principles free of algorithm-induced biases and in a massively parallel manner. In this study, it has led to the finding that human function learning obeys the Occam's razor principle.</p>
</sec>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>We would like to thank Daniel Braun, Loes van Dam, and Jacqueline Fulvio for insightful discussions. Funding for this research was provided by the European Community's Seventh Framework Programme FP7/2007-2013 (grant agreement number 214728-2) and by a VICI grant (453-08-004) from the Netherlands Organization for Scientific Research.</p>
</ack>
<sec sec-type="supplementary-material" id="s5">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at:
<ext-link ext-link-type="uri" xlink:href="http://www.frontiersin.org/journal/10.3389/fncom.2014.00121/abstract">http://www.frontiersin.org/journal/10.3389/fncom.2014.00121/abstract</ext-link>
</p>
<supplementary-material content-type="local-data" id="SM1">
<media xlink:href="DataSheet1.PDF">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Acuña</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Schrater</surname>
<given-names>P.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Structure learning in human sequential decision-making</article-title>
.
<source>PLoS Comput. Biol</source>
.
<volume>6</volume>
:
<fpage>e1001003</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1001003</pub-id>
<pub-id pub-id-type="pmid">21151963</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alais</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Burr</surname>
<given-names>D.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>The ventriloquist effect results from near-optimal bimodal integration</article-title>
.
<source>Curr. Biol</source>
.
<volume>14</volume>
,
<fpage>257</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="doi">10.1016/j.cub.2004.01.029</pub-id>
<pub-id pub-id-type="pmid">14761661</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bedford</surname>
<given-names>F. L.</given-names>
</name>
</person-group>
(
<year>1989</year>
).
<article-title>Constraints on learning new mappings between perceptual dimensions</article-title>
.
<source>J. Exp. Psychol. Hum. Percept. Perform</source>
.
<volume>15</volume>
,
<fpage>232</fpage>
<lpage>248</lpage>
<pub-id pub-id-type="doi">10.1037/0096-1523.15.2.232</pub-id>
<pub-id pub-id-type="pmid">7720360</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Braun</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Aertsen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wolpert</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Mehring</surname>
<given-names>C.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Motor task variation induces structural learning</article-title>
.
<source>Curr. Biol</source>
.
<volume>19</volume>
,
<fpage>352</fpage>
<lpage>357</lpage>
<pub-id pub-id-type="doi">10.1016/j.cub.2009.01.036</pub-id>
<pub-id pub-id-type="pmid">19217296</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Braun</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Mehring</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wolpert</surname>
<given-names>D. M.</given-names>
</name>
</person-group>
(
<year>2010a</year>
).
<article-title>Structure learning in action</article-title>
.
<source>Behav. Brain Res</source>
.
<volume>206</volume>
,
<fpage>157</fpage>
<lpage>165</lpage>
<pub-id pub-id-type="doi">10.1016/j.bbr.2009.08.031</pub-id>
<pub-id pub-id-type="pmid">19720086</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Braun</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Waldert</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Aertsen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wolpert</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Mehring</surname>
<given-names>C.</given-names>
</name>
</person-group>
(
<year>2010b</year>
).
<article-title>Structure learning in a sensorimotor association task</article-title>
.
<source>PLoS ONE</source>
<volume>5</volume>
:
<fpage>e8973</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0008973</pub-id>
<pub-id pub-id-type="pmid">20126409</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brehmer</surname>
<given-names>B.</given-names>
</name>
</person-group>
(
<year>1974</year>
).
<article-title>Hypotheses about relations between scaled variables in the learning of probabilistic inference tasks</article-title>
.
<source>Organ. Behav. Hum. Perform</source>
.
<volume>11</volume>
,
<fpage>1</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="doi">10.1016/0030-5073(74)90002-6</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brenner</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Smeets</surname>
<given-names>J. B. J.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<article-title>Flexibility in intercepting moving objects</article-title>
.
<source>J. Vis</source>
.
<volume>7</volume>
:
<fpage>14</fpage>
<pub-id pub-id-type="doi">10.1167/7.5.14</pub-id>
<pub-id pub-id-type="pmid">18217854</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Burnham</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>D. R.</given-names>
</name>
</person-group>
(
<year>2002</year>
).
<source>Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach</source>
.
<publisher-loc>New York, NY</publisher-loc>
:
<publisher-name>Springer</publisher-name>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Busemeyer</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>Byun</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>DeLosh</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>McDaniel</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
(
<year>1997</year>
).
<article-title>Learning functional relations based on experience with input-output pairs by humans and artificial neural networks</article-title>
, in
<source>Knowledge, Concepts and Categories: Studies in Cognition</source>
, eds
<person-group person-group-type="editor">
<name>
<surname>Lamberts</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shanks</surname>
<given-names>D. R.</given-names>
</name>
</person-group>
(
<publisher-loc>Cambridge, MA</publisher-loc>
:
<publisher-name>MIT Press</publisher-name>
),
<fpage>408</fpage>
<lpage>437</lpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carroll</surname>
<given-names>J. D.</given-names>
</name>
</person-group>
(
<year>1963</year>
).
<source>Functional Learning: The Learning of Continuous Functional Maps Relating Stimulus and Response Continua</source>
. Technical Report. Educational Testing Service RB 63-26. Princeton, NJ.</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deane</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Hammond</surname>
<given-names>K. R.</given-names>
</name>
<name>
<surname>Summers</surname>
<given-names>D. A.</given-names>
</name>
</person-group>
(
<year>1972</year>
).
<article-title>Acquisition and application of knowledge in complex inference tasks</article-title>
.
<source>J. Exp. Psychol</source>
.
<volume>92</volume>
,
<fpage>20</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="doi">10.1037/h0032162</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeLosh</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Busemeyer</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>McDaniel</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
(
<year>1997</year>
).
<article-title>Extrapolation: the sine qua non for abstraction in function learning</article-title>
.
<source>J. Exp. Psychol. Learn. Mem. Cogn</source>
.
<volume>23</volume>
,
<fpage>968</fpage>
<pub-id pub-id-type="doi">10.1037/0278-7393.23.4.968</pub-id>
<pub-id pub-id-type="pmid">9231439</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ernst</surname>
<given-names>M. O.</given-names>
</name>
<name>
<surname>Banks</surname>
<given-names>M. S.</given-names>
</name>
</person-group>
(
<year>2002</year>
).
<article-title>Humans integrate visual and haptic information in a statistically optimal fashion</article-title>
.
<source>Nature</source>
<volume>415</volume>
,
<fpage>429</fpage>
<pub-id pub-id-type="doi">10.1038/415429a</pub-id>
<pub-id pub-id-type="pmid">11807554</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ernst</surname>
<given-names>M. O.</given-names>
</name>
<name>
<surname>van Dam</surname>
<given-names>L. C. J.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Mapping shape to visuomotor mapping: generalization to novel shapes</article-title>
.
<source>J. Vis</source>
.
<volume>10</volume>
,
<fpage>1077</fpage>
<pub-id pub-id-type="doi">10.1167/10.7.1077</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Faisal</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Selen</surname>
<given-names>L. P.</given-names>
</name>
<name>
<surname>Wolpert</surname>
<given-names>D. M.</given-names>
</name>
</person-group>
(
<year>2008</year>
).
<article-title>Noise in the nervous system</article-title>
.
<source>Nat. Rev. Neurosci</source>
.
<volume>9</volume>
,
<fpage>292</fpage>
<lpage>303</lpage>
<pub-id pub-id-type="doi">10.1038/nrn2258</pub-id>
<pub-id pub-id-type="pmid">18319728</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fulvio</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>C. S.</given-names>
</name>
<name>
<surname>Schrater</surname>
<given-names>P.</given-names>
</name>
</person-group>
(
<year>2013</year>
).
<article-title>Specificity in learning: blame the paradigm</article-title>
.
<source>J. Vis</source>
.
<volume>13</volume>
:
<fpage>246</fpage>
<pub-id pub-id-type="doi">10.1167/13.9.246</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Genewein</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Braun</surname>
<given-names>D. A.</given-names>
</name>
</person-group>
(
<year>2014</year>
).
<article-title>Occam's razor in sensorimotor learning</article-title>
.
<source>Proc. R. Soc. Lond. B Biol. Sci</source>
.
<volume>281</volume>
:
<fpage>20132952</fpage>
<pub-id pub-id-type="doi">10.1098/rspb.2013.2952</pub-id>
<pub-id pub-id-type="pmid">24671968</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gershman</surname>
<given-names>S. J.</given-names>
</name>
</person-group>
(
<year>2013</year>
).
<article-title>Perceptual estimation obeys Occam's razor</article-title>
.
<source>Front. Psychol</source>
.
<volume>4</volume>
:
<fpage>623</fpage>
<pub-id pub-id-type="doi">10.3389/fpsyg.2013.00623</pub-id>
<pub-id pub-id-type="pmid">24137136</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths</surname>
<given-names>T. L.</given-names>
</name>
<name>
<surname>Chater</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kemp</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Perfors</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tenenbaum</surname>
<given-names>J. B.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Probabilistic models of cognition: exploring representations and inductive biases</article-title>
.
<source>Trends Cogn. Sci</source>
.
<volume>14</volume>
,
<fpage>357</fpage>
<lpage>364</lpage>
<pub-id pub-id-type="doi">10.1016/j.tics.2010.05.004</pub-id>
<pub-id pub-id-type="pmid">20576465</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths</surname>
<given-names>T. L.</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>J. G.</given-names>
</name>
<name>
<surname>Kalish</surname>
<given-names>M. L.</given-names>
</name>
</person-group>
(
<year>2008</year>
).
<article-title>Modeling human function learning with Gaussian processes</article-title>
.
<source>Adv. Neural Inf. Process Syst</source>
.
<volume>21</volume>
,
<fpage>553</fpage>
<lpage>560</lpage>
</mixed-citation>
</ref>
<ref id="B21a">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jazayeri</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Shadlen</surname>
<given-names>M. N.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>Temporal context calibrates interval timing</article-title>
.
<source>Nat. Neurosci</source>
.
<volume>13</volume>
,
<fpage>1020</fpage>
<lpage>1026</lpage>
<pub-id pub-id-type="doi">10.1038/nn.2590</pub-id>
<pub-id pub-id-type="pmid">20581842</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kalish</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Griffiths</surname>
<given-names>T. L.</given-names>
</name>
<name>
<surname>Lewandowsky</surname>
<given-names>S.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<article-title>Iterated learning: intergenerational knowledge transmission reveals inductive biases</article-title>
.
<source>Psychon. Bull. Rev</source>
.
<volume>14</volume>
,
<fpage>288</fpage>
<lpage>294</lpage>
<pub-id pub-id-type="doi">10.3758/BF03194066</pub-id>
<pub-id pub-id-type="pmid">17694915</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kalish</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Lewandowsky</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kruschke</surname>
<given-names>J. K.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>Population of linear experts: knowledge partitioning and function learning</article-title>
.
<source>Psychol. Rev</source>
.
<volume>111</volume>
,
<fpage>1072</fpage>
<lpage>1099</lpage>
<pub-id pub-id-type="doi">10.1037/0033-295X.111.4.1072</pub-id>
<pub-id pub-id-type="pmid">15482074</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kass</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Raftery</surname>
<given-names>A. E.</given-names>
</name>
</person-group>
(
<year>1995</year>
).
<article-title>Bayes factors</article-title>
.
<source>J. Am. Stat. Assoc</source>
.
<volume>90</volume>
,
<fpage>773</fpage>
<lpage>795</lpage>
<pub-id pub-id-type="doi">10.1080/01621459.1995.10476572</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kemp</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tenenbaum</surname>
<given-names>J. B.</given-names>
</name>
</person-group>
(
<year>2008</year>
).
<article-title>The discovery of structural form</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A</source>
.
<volume>105</volume>
,
<fpage>10687</fpage>
<lpage>10692</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0802631105</pub-id>
<pub-id pub-id-type="pmid">18669663</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kemp</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tenenbaum</surname>
<given-names>J. B.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Structured statistical models of inductive reasoning</article-title>
.
<source>Psychol. Rev</source>
.
<volume>116</volume>
,
<fpage>20</fpage>
<pub-id pub-id-type="doi">10.1037/a0014282</pub-id>
<pub-id pub-id-type="pmid">19159147</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koh</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>D. E.</given-names>
</name>
</person-group>
(
<year>1991</year>
).
<article-title>Function learning: induction of continuous stimulus-response relations</article-title>
.
<source>J. Exp. Psychol. Learn. Mem. Cogn</source>
.
<volume>17</volume>
:
<fpage>811</fpage>
<pub-id pub-id-type="doi">10.1037/0278-7393.17.5.811</pub-id>
<pub-id pub-id-type="pmid">1834766</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Körding</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Beierholm</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Quartz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tenenbaum</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Shams</surname>
<given-names>L.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<article-title>Causal inference in multisensory perception</article-title>
.
<source>PLoS ONE</source>
<volume>2</volume>
:
<fpage>e943</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0000943</pub-id>
<pub-id pub-id-type="pmid">17895984</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maij</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Smeets</surname>
<given-names>J. B. J.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Temporal information can influence spatial localization</article-title>
.
<source>J. Neurophysiol</source>
.
<volume>102</volume>
,
<fpage>490</fpage>
<lpage>495</lpage>
<pub-id pub-id-type="doi">10.1152/jn.91253.2008</pub-id>
<pub-id pub-id-type="pmid">19439670</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McDaniel</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Busemeyer</surname>
<given-names>J. R.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>The conceptual basis of function learning and extrapolation: comparison of rule-based and associative-based models</article-title>
.
<source>Psychon. Bull. Rev</source>
.
<volume>12</volume>
,
<fpage>24</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="doi">10.3758/BF03196347</pub-id>
<pub-id pub-id-type="pmid">15948282</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McDaniel</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Dimperio</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Griego</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Busemeyer</surname>
<given-names>J. R.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Predicting transfer performance: a comparison of competing function learning models</article-title>
.
<source>J. Exp. Psychol. Learn. Mem. Cogn</source>
.
<volume>35</volume>
,
<fpage>173</fpage>
<lpage>195</lpage>
<pub-id pub-id-type="doi">10.1037/a0013982</pub-id>
<pub-id pub-id-type="pmid">19210089</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Narain</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mamassian</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>van Beers</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Smeets</surname>
<given-names>J. B. J.</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>E.</given-names>
</name>
</person-group>
(
<year>2013a</year>
).
<article-title>How the statistics of sequential presentation influence the learning of structure</article-title>
.
<source>PLoS ONE</source>
<volume>8</volume>
:
<fpage>e62276</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0062276</pub-id>
<pub-id pub-id-type="pmid">23638022</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Narain</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>van Beers</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Smeets</surname>
<given-names>J. B. J.</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>E.</given-names>
</name>
</person-group>
(
<year>2013b</year>
).
<article-title>Sensorimotor priors in nonstationary environments</article-title>
.
<source>J. Neurophysiol</source>
.
<volume>109</volume>
,
<fpage>1259</fpage>
<lpage>1267</lpage>
<pub-id pub-id-type="doi">10.1152/jn.00605.2012</pub-id>
<pub-id pub-id-type="pmid">23235999</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raftery</surname>
<given-names>A. E.</given-names>
</name>
</person-group>
(
<year>1995</year>
).
<article-title>Bayesian model selection in social research</article-title>
.
<source>Sociol. Methodol</source>
.
<volume>25</volume>
,
<fpage>111</fpage>
<lpage>163</lpage>
<pub-id pub-id-type="doi">10.2307/271063</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tenenbaum</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Kemp</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Griffiths</surname>
<given-names>T. L.</given-names>
</name>
<name>
<surname>Goodman</surname>
<given-names>N. D.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>How to grow a mind: statistics, structure, and abstraction</article-title>
.
<source>Science</source>
<volume>331</volume>
,
<fpage>1279</fpage>
<lpage>1285</lpage>
<pub-id pub-id-type="doi">10.1126/science.1192788</pub-id>
<pub-id pub-id-type="pmid">21393536</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnham</surname>
<given-names>E. J. A.</given-names>
</name>
<name>
<surname>Braun</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Wolpert</surname>
<given-names>D. M.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>Inferring visuomotor priors for sensorimotor learning</article-title>
.
<source>PLoS Comput. Biol</source>
.
<volume>7</volume>
:
<fpage>e1001112</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1001112</pub-id>
<pub-id pub-id-type="pmid">21483475</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Beers</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Sittig</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Denier van der Gon</surname>
<given-names>J. J.</given-names>
</name>
</person-group>
(
<year>1999</year>
).
<article-title>Integration of proprioceptive and visual position-information: an experimentally supported model</article-title>
.
<source>J. Neurophysiol</source>
.
<volume>81</volume>
,
<fpage>1355</fpage>
<lpage>1364</lpage>
<pub-id pub-id-type="pmid">10085361</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wasserman</surname>
<given-names>L.</given-names>
</name>
</person-group>
(
<year>2000</year>
).
<article-title>Bayesian model selection and model averaging</article-title>
.
<source>J. Math. Psychol</source>
.
<volume>44</volume>
,
<fpage>92</fpage>
<lpage>107</lpage>
<pub-id pub-id-type="doi">10.1006/jmps.1999.1278</pub-id>
<pub-id pub-id-type="pmid">10733859</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list>
<country>
<li>Allemagne</li>
<li>France</li>
<li>Pays-Bas</li>
</country>
</list>
<tree>
<country name="Allemagne">
<noRegion>
<name sortKey="Narain, Devika" sort="Narain, Devika" uniqKey="Narain D" first="Devika" last="Narain">Devika Narain</name>
</noRegion>
<name sortKey="Narain, Devika" sort="Narain, Devika" uniqKey="Narain D" first="Devika" last="Narain">Devika Narain</name>
</country>
<country name="Pays-Bas">
<noRegion>
<name sortKey="Narain, Devika" sort="Narain, Devika" uniqKey="Narain D" first="Devika" last="Narain">Devika Narain</name>
</noRegion>
<name sortKey="Brenner, Eli" sort="Brenner, Eli" uniqKey="Brenner E" first="Eli" last="Brenner">Eli Brenner</name>
<name sortKey="Smeets, Jeroen B J" sort="Smeets, Jeroen B J" uniqKey="Smeets J" first="Jeroen B. J." last="Smeets">Jeroen B. J. Smeets</name>
<name sortKey="Van Beers, Robert J" sort="Van Beers, Robert J" uniqKey="Van Beers R" first="Robert J." last="Van Beers">Robert J. Van Beers</name>
</country>
<country name="France">
<noRegion>
<name sortKey="Mamassian, Pascal" sort="Mamassian, Pascal" uniqKey="Mamassian P" first="Pascal" last="Mamassian">Pascal Mamassian</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/HapticV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003374 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 003374 | SxmlIndent | more
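
The indented record can also be written to a file and searched with standard Unix tools. The sketch below only adds a redirection and a grep to the command shown above; the output file name record_003374.xml is illustrative, and the grep pattern assumes the record is serialized with one element per line, as in the listing above.

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 003374 | SxmlIndent > record_003374.xml
grep "<article-title>" record_003374.xml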

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    HapticV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:4179744
   |texte=   Structure learning and the Occam's razor principle: a new view of human function acquisition
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:25324770" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 
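
To inspect the generated page before importing it into the wiki, the same pipeline can be redirected to a file. This is only a sketch, under the assumption that NlmPubMed2Wicri writes the page to standard output; the file name page_003374.wiki is illustrative.

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:25324770" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 > page_003374.wiki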

Wicri

This area was generated with Dilib version V0.6.23.
Data generation: Mon Jun 13 01:09:46 2016. Site generation: Wed Mar 6 09:54:07 2024