Exploration Server on Haptic Devices

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Fast and Accurate Learning When Making Discrete Numerical Estimates

Internal identifier: 000538 (Pmc/Curation); previous: 000537; next: 000539

Authors: Adam N. Sanborn [United Kingdom]; Ulrik R. Beierholm [United Kingdom]

Source:

RBID: PMC:4829178

Abstract

Many everyday estimation tasks have an inherently discrete nature, whether the task is counting objects (e.g., a number of paint buckets) or estimating discretized continuous variables (e.g., the number of paint buckets needed to paint a room). While Bayesian inference is often used for modeling estimates made along continuous scales, discrete numerical estimates have not received as much attention, despite their common everyday occurrence. Using two tasks, a numerosity task and an area estimation task, we invoke Bayesian decision theory to characterize how people learn discrete numerical distributions and make numerical estimates. Across three experiments with novel stimulus distributions we found that participants fell between two common decision functions for converting their uncertain representation into a response: drawing a sample from their posterior distribution and taking the maximum of their posterior distribution. While this was consistent with the decision function found in previous work using continuous estimation tasks, surprisingly the prior distributions learned by participants in our experiments were much more adaptive: When making continuous estimates, participants have required thousands of trials to learn bimodal priors, but in our tasks participants learned discrete bimodal and even discrete quadrimodal priors within a few hundred trials. This makes discrete numerical estimation tasks good testbeds for investigating how people learn and make estimates.


URL:
DOI: 10.1371/journal.pcbi.1004859
PubMed: 27070155
PubMed Central: 4829178

Links to previous steps (curation, corpus...)


Links to Exploration step

PMC:4829178

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Fast and Accurate Learning When Making Discrete Numerical Estimates</title>
<author>
<name sortKey="Sanborn, Adam N" sort="Sanborn, Adam N" uniqKey="Sanborn A" first="Adam N." last="Sanborn">Adam N. Sanborn</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Psychology, University of Warwick, Coventry, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Psychology, University of Warwick, Coventry</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Beierholm, Ulrik R" sort="Beierholm, Ulrik R" uniqKey="Beierholm U" first="Ulrik R." last="Beierholm">Ulrik R. Beierholm</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department of Psychology, Durham University, Durham, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Psychology, Durham University, Durham</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27070155</idno>
<idno type="pmc">4829178</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4829178</idno>
<idno type="RBID">PMC:4829178</idno>
<idno type="doi">10.1371/journal.pcbi.1004859</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000538</idno>
<idno type="wicri:Area/Pmc/Curation">000538</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Fast and Accurate Learning When Making Discrete Numerical Estimates</title>
<author>
<name sortKey="Sanborn, Adam N" sort="Sanborn, Adam N" uniqKey="Sanborn A" first="Adam N." last="Sanborn">Adam N. Sanborn</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Psychology, University of Warwick, Coventry, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Psychology, University of Warwick, Coventry</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Beierholm, Ulrik R" sort="Beierholm, Ulrik R" uniqKey="Beierholm U" first="Ulrik R." last="Beierholm">Ulrik R. Beierholm</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department of Psychology, Durham University, Durham, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Psychology, Durham University, Durham</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham, United Kingdom</addr-line>
</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Computational Biology</title>
<idno type="ISSN">1553-734X</idno>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Many everyday estimation tasks have an inherently discrete nature, whether the task is counting objects (e.g., a number of paint buckets) or estimating discretized continuous variables (e.g., the number of paint buckets needed to paint a room). While Bayesian inference is often used for modeling estimates made along continuous scales, discrete numerical estimates have not received as much attention, despite their common everyday occurrence. Using two tasks, a numerosity task and an area estimation task, we invoke Bayesian decision theory to characterize how people learn discrete numerical distributions and make numerical estimates. Across three experiments with novel stimulus distributions we found that participants fell between two common decision functions for converting their uncertain representation into a response: drawing a sample from their posterior distribution and taking the maximum of their posterior distribution. While this was consistent with the decision function found in previous work using continuous estimation tasks, surprisingly the prior distributions learned by participants in our experiments were much more adaptive: When making continuous estimates, participants have required thousands of trials to learn bimodal priors, but in our tasks participants learned discrete bimodal and even discrete quadrimodal priors within a few hundred trials. This makes discrete numerical estimation tasks good testbeds for investigating how people learn and make estimates.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernoulli, D" uniqKey="Bernoulli D">D Bernoulli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Savage, Lj" uniqKey="Savage L">LJ Savage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Neumann, Lj" uniqKey="Von Neumann L">LJ von Neumann</name>
</author>
<author>
<name sortKey="Morgenstern, O" uniqKey="Morgenstern O">O Morgenstern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Emlen, Jm" uniqKey="Emlen J">JM Emlen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macarthur, Rh" uniqKey="Macarthur R">RH MacArthur</name>
</author>
<author>
<name sortKey="Pianka, Er" uniqKey="Pianka E">ER Pianka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anderson, Jr" uniqKey="Anderson J">JR Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anderson, Jr" uniqKey="Anderson J">JR Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, Mo" uniqKey="Ernst M">MO Ernst</name>
</author>
<author>
<name sortKey="Banks, Ms" uniqKey="Banks M">MS Banks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Geisler, Ws" uniqKey="Geisler W">WS Geisler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kording, K" uniqKey="Kording K">K Körding</name>
</author>
<author>
<name sortKey="Wolpert, Dm" uniqKey="Wolpert D">DM Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oaksford, M" uniqKey="Oaksford M">M Oaksford</name>
</author>
<author>
<name sortKey="Chater, N" uniqKey="Chater N">N Chater</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, M" uniqKey="Jones M">M Jones</name>
</author>
<author>
<name sortKey="Love, Bc" uniqKey="Love B">BC Love</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bowers, Js" uniqKey="Bowers J">JS Bowers</name>
</author>
<author>
<name sortKey="Davis, Cj" uniqKey="Davis C">CJ Davis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Acerbi, L" uniqKey="Acerbi L">L Acerbi</name>
</author>
<author>
<name sortKey="Ma, Wj" uniqKey="Ma W">WJ Ma</name>
</author>
<author>
<name sortKey="Vijayakumar, S" uniqKey="Vijayakumar S">S Vijayakumar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Acerbi, L" uniqKey="Acerbi L">L Acerbi</name>
</author>
<author>
<name sortKey="Vijayakumar, S" uniqKey="Vijayakumar S">S Vijayakumar</name>
</author>
<author>
<name sortKey="Wolpert, Dm" uniqKey="Wolpert D">DM Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Griffiths, Tl" uniqKey="Griffiths T">TL Griffiths</name>
</author>
<author>
<name sortKey="Chater, N" uniqKey="Chater N">N Chater</name>
</author>
<author>
<name sortKey="Norris, D" uniqKey="Norris D">D Norris</name>
</author>
<author>
<name sortKey="Pouget, A" uniqKey="Pouget A">A Pouget</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lindskog, M" uniqKey="Lindskog M">M Lindskog</name>
</author>
<author>
<name sortKey="Winman, A" uniqKey="Winman A">A Winman</name>
</author>
<author>
<name sortKey="Juslin, P" uniqKey="Juslin P">P Juslin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Acerbi, L" uniqKey="Acerbi L">L Acerbi</name>
</author>
<author>
<name sortKey="Wolpert, Dm" uniqKey="Wolpert D">DM Wolpert</name>
</author>
<author>
<name sortKey="Vijayakumar, S" uniqKey="Vijayakumar S">S Vijayakumar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Battaglia, Pw" uniqKey="Battaglia P">PW Battaglia</name>
</author>
<author>
<name sortKey="Kersten, D" uniqKey="Kersten D">D Kersten</name>
</author>
<author>
<name sortKey="Schrater, Pr" uniqKey="Schrater P">PR Schrater</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berniker, M" uniqKey="Berniker M">M Berniker</name>
</author>
<author>
<name sortKey="Voss, M" uniqKey="Voss M">M Voss</name>
</author>
<author>
<name sortKey="Kording, K" uniqKey="Kording K">K Kording</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chalk, M" uniqKey="Chalk M">M Chalk</name>
</author>
<author>
<name sortKey="Seitz, Ar" uniqKey="Seitz A">AR Seitz</name>
</author>
<author>
<name sortKey="Series, P" uniqKey="Series P">P Seriès</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jazayeri, M" uniqKey="Jazayeri M">M Jazayeri</name>
</author>
<author>
<name sortKey="Shadlen, Mn" uniqKey="Shadlen M">MN Shadlen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kording, Kp" uniqKey="Kording K">KP Körding</name>
</author>
<author>
<name sortKey="Wolpert, Dm" uniqKey="Wolpert D">DM Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wozny, Dr" uniqKey="Wozny D">DR Wozny</name>
</author>
<author>
<name sortKey="Beierholm, Ur" uniqKey="Beierholm U">UR Beierholm</name>
</author>
<author>
<name sortKey="Shams, L" uniqKey="Shams L">L Shams</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lehmann, El" uniqKey="Lehmann E">EL Lehmann</name>
</author>
<author>
<name sortKey="Casella, G" uniqKey="Casella G">G Casella</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tassinari, H" uniqKey="Tassinari H">H Tassinari</name>
</author>
<author>
<name sortKey="Hudson, Te" uniqKey="Hudson T">TE Hudson</name>
</author>
<author>
<name sortKey="Landy, Ms" uniqKey="Landy M">MS Landy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bishop, Cm" uniqKey="Bishop C">CM Bishop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashby, Fg" uniqKey="Ashby F">FG Ashby</name>
</author>
<author>
<name sortKey="Alfonso Reese, La" uniqKey="Alfonso Reese L">LA Alfonso-Reese</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="J Kel, F" uniqKey="J Kel F">F Jäkel</name>
</author>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B Schölkopf</name>
</author>
<author>
<name sortKey="Wichmann, Fa" uniqKey="Wichmann F">Fa Wichmann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosseel, Y" uniqKey="Rosseel Y">Y Rosseel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vanpaemel, W" uniqKey="Vanpaemel W">W Vanpaemel</name>
</author>
<author>
<name sortKey="Storms, G" uniqKey="Storms G">G Storms</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fretwell, Sd" uniqKey="Fretwell S">SD Fretwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schulze, C" uniqKey="Schulze C">C Schulze</name>
</author>
<author>
<name sortKey="Van Ravenzwaaij, D" uniqKey="Van Ravenzwaaij D">D van Ravenzwaaij</name>
</author>
<author>
<name sortKey="Newell, Br" uniqKey="Newell B">BR Newell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Griffiths, Tl" uniqKey="Griffiths T">TL Griffiths</name>
</author>
<author>
<name sortKey="Lieder, F" uniqKey="Lieder F">F Lieder</name>
</author>
<author>
<name sortKey="Goodman, Nd" uniqKey="Goodman N">ND Goodman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vul, E" uniqKey="Vul E">E Vul</name>
</author>
<author>
<name sortKey="Goodman, N" uniqKey="Goodman N">N Goodman</name>
</author>
<author>
<name sortKey="Griffiths, Tl" uniqKey="Griffiths T">TL Griffiths</name>
</author>
<author>
<name sortKey="Tenenbaum, Jb" uniqKey="Tenenbaum J">JB Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burgess, A" uniqKey="Burgess A">A Burgess</name>
</author>
<author>
<name sortKey="Barlow, H" uniqKey="Barlow H">H Barlow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kanitscheider, I" uniqKey="Kanitscheider I">I Kanitscheider</name>
</author>
<author>
<name sortKey="Brown, A" uniqKey="Brown A">A Brown</name>
</author>
<author>
<name sortKey="Pouget, A" uniqKey="Pouget A">A Pouget</name>
</author>
<author>
<name sortKey="Churchland, Ak" uniqKey="Churchland A">AK Churchland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krueger, Le" uniqKey="Krueger L">LE Krueger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Izard, V" uniqKey="Izard V">V Izard</name>
</author>
<author>
<name sortKey="Dehaene, S" uniqKey="Dehaene S">S Dehaene</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piazza, M" uniqKey="Piazza M">M Piazza</name>
</author>
<author>
<name sortKey="Izard, V" uniqKey="Izard V">V Izard</name>
</author>
<author>
<name sortKey="Pinel, P" uniqKey="Pinel P">P Pinel</name>
</author>
<author>
<name sortKey="Le Bihan, D" uniqKey="Le Bihan D">D Le Bihan</name>
</author>
<author>
<name sortKey="Dehaene, S" uniqKey="Dehaene S">S Dehaene</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goldstein, Dg" uniqKey="Goldstein D">DG Goldstein</name>
</author>
<author>
<name sortKey="Rothschild, D" uniqKey="Rothschild D">D Rothschild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nassar, Mr" uniqKey="Nassar M">MR Nassar</name>
</author>
<author>
<name sortKey="Wilson, Rc" uniqKey="Wilson R">RC Wilson</name>
</author>
<author>
<name sortKey="Heasly, B" uniqKey="Heasly B">B Heasly</name>
</author>
<author>
<name sortKey="Gold, Ji" uniqKey="Gold J">JI Gold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pollack, I" uniqKey="Pollack I">I Pollack</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Petrov, Aa" uniqKey="Petrov A">AA Petrov</name>
</author>
<author>
<name sortKey="Anderson, Jr" uniqKey="Anderson J">JR Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Oeffelen, Mp" uniqKey="Van Oeffelen M">MP Van Oeffelen</name>
</author>
<author>
<name sortKey="Vos, Pg" uniqKey="Vos P">PG Vos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kass, Re" uniqKey="Kass R">RE Kass</name>
</author>
<author>
<name sortKey="Raftery, Ae" uniqKey="Raftery A">AE Raftery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alter, Al" uniqKey="Alter A">AL Alter</name>
</author>
<author>
<name sortKey="Oppenheimer, Dm" uniqKey="Oppenheimer D">DM Oppenheimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mckinley, Sc" uniqKey="Mckinley S">SC McKinley</name>
</author>
<author>
<name sortKey="Nosofsky, Rm" uniqKey="Nosofsky R">RM Nosofsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, L" uniqKey="Shi L">L Shi</name>
</author>
<author>
<name sortKey="Griffiths, Tl" uniqKey="Griffiths T">TL Griffiths</name>
</author>
<author>
<name sortKey="Feldman, Nh" uniqKey="Feldman N">NH Feldman</name>
</author>
<author>
<name sortKey="Sanborn, An" uniqKey="Sanborn A">AN Sanborn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gershman, Sj" uniqKey="Gershman S">SJ Gershman</name>
</author>
<author>
<name sortKey="Niv, Y" uniqKey="Niv Y">Y Niv</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ortega, Pa" uniqKey="Ortega P">PA Ortega</name>
</author>
<author>
<name sortKey="Braun, Da" uniqKey="Braun D">DA Braun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Acuna, De" uniqKey="Acuna D">DE Acuna</name>
</author>
<author>
<name sortKey="Berniker, M" uniqKey="Berniker M">M Berniker</name>
</author>
<author>
<name sortKey="Fernandes, Hl" uniqKey="Fernandes H">HL Fernandes</name>
</author>
<author>
<name sortKey="Kording, Kp" uniqKey="Kording K">KP Kording</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kording, K" uniqKey="Kording K">K Körding</name>
</author>
<author>
<name sortKey="Beierholm, U" uniqKey="Beierholm U">U Beierholm</name>
</author>
<author>
<name sortKey="Ma, W" uniqKey="Ma W">W Ma</name>
</author>
<author>
<name sortKey="Quartz, S" uniqKey="Quartz S">S Quartz</name>
</author>
<author>
<name sortKey="Tenenbaum, J" uniqKey="Tenenbaum J">J Tenenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashby, Fg" uniqKey="Ashby F">FG Ashby</name>
</author>
<author>
<name sortKey="Maddox, Wt" uniqKey="Maddox W">WT Maddox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luce, Rd" uniqKey="Luce R">RD Luce</name>
</author>
<author>
<name sortKey="Luce, Rd" uniqKey="Luce R">RD Luce</name>
</author>
<author>
<name sortKey="Bush, Rr" uniqKey="Bush R">RR Bush</name>
</author>
<author>
<name sortKey="Galanter, E" uniqKey="Galanter E">E Galanter</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS Comput. Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLoS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27070155</article-id>
<article-id pub-id-type="pmc">4829178</article-id>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-15-01594</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1004859</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Cognitive Science</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Learning and Memory</subject>
<subj-group>
<subject>Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Cognitive Science</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
<subj-group>
<subject>Human Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
<subj-group>
<subject>Human Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Cognitive Psychology</subject>
<subj-group>
<subject>Learning</subject>
<subj-group>
<subject>Human Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Learning and Memory</subject>
<subj-group>
<subject>Learning</subject>
<subj-group>
<subject>Human Learning</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Decision Theory</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Statistics (Mathematics)</subject>
<subj-group>
<subject>Decision Theory</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Probability Theory</subject>
<subj-group>
<subject>Probability Distribution</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Operator Theory</subject>
<subj-group>
<subject>Kernel Functions</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Cognitive Science</subject>
<subj-group>
<subject>Cognition</subject>
<subj-group>
<subject>Decision Making</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Statistics (Mathematics)</subject>
<subj-group>
<subject>Statistical Noise</subject>
<subj-group>
<subject>Gaussian Noise</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Neuroscience</subject>
<subj-group>
<subject>Sensory Perception</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Sensory Perception</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Psychology</subject>
<subj-group>
<subject>Sensory Perception</subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Fast and Accurate Learning When Making Discrete Numerical Estimates</article-title>
<alt-title alt-title-type="running-head">Fast and Accurate Learning When Making Discrete Numerical Estimates</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Sanborn</surname>
<given-names>Adam N.</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Beierholm</surname>
<given-names>Ulrik R.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Department of Psychology, University of Warwick, Coventry, United Kingdom</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Department of Psychology, Durham University, Durham, United Kingdom</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>Centre for Computational Neuroscience and Cognitive Robotics, University of Birmingham, United Kingdom</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Schrater</surname>
<given-names>Paul</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>University of Minnesota, UNITED STATES</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: ANS URB. Performed the experiments: ANS. Analyzed the data: ANS URB. Contributed reagents/materials/analysis tools: ANS URB. Wrote the paper: ANS URB.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>a.n.sanborn@warwick.ac.uk</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<month>4</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>4</month>
<year>2016</year>
</pub-date>
<volume>12</volume>
<issue>4</issue>
<elocation-id>e1004859</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>9</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>9</day>
<month>3</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>© 2016 Sanborn, Beierholm</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Sanborn, Beierholm</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pcbi.1004859.pdf"></self-uri>
<abstract>
<p>Many everyday estimation tasks have an inherently discrete nature, whether the task is counting objects (e.g., a number of paint buckets) or estimating discretized continuous variables (e.g., the number of paint buckets needed to paint a room). While Bayesian inference is often used for modeling estimates made along continuous scales, discrete numerical estimates have not received as much attention, despite their common everyday occurrence. Using two tasks, a numerosity task and an area estimation task, we invoke Bayesian decision theory to characterize how people learn discrete numerical distributions and make numerical estimates. Across three experiments with novel stimulus distributions we found that participants fell between two common decision functions for converting their uncertain representation into a response: drawing a sample from their posterior distribution and taking the maximum of their posterior distribution. While this was consistent with the decision function found in previous work using continuous estimation tasks, surprisingly the prior distributions learned by participants in our experiments were much more adaptive: When making continuous estimates, participants have required thousands of trials to learn bimodal priors, but in our tasks participants learned discrete bimodal and even discrete quadrimodal priors within a few hundred trials. This makes discrete numerical estimation tasks good testbeds for investigating how people learn and make estimates.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author Summary</title>
<p>Studies of human perception and decision making have traditionally focused on scenarios where participants have to make estimates about continuous variables. However, discrete variables are also common in our environment, potentially requiring different theoretical models. We describe ways to model such scenarios within the statistical framework of Bayesian inference and explain how aspects of such models can be teased apart experimentally. Using two experimental setups, a numerosity task and an area estimation task, we show that human participants do indeed rely on combinations of specific model components. Specifically, we show that human learning in discrete tasks can be surprisingly fast and that participants can use the learned information in a way that is either optimal or near-optimal.</p>
</abstract>
<funding-group>
<funding-statement>ANS was funded by the UK Economic and Social Research Council (
<ext-link ext-link-type="uri" xlink:href="http://www.esrc.ac.uk/">http://www.esrc.ac.uk/</ext-link>
), from grant number ES/K004948/1. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="6"></fig-count>
<table-count count="0"></table-count>
<page-count count="28"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data are within the paper and its Supporting Information files.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All relevant data are within the paper and its Supporting Information files.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>People are often asked questions that require discrete numerical estimates. When judging from a glance “How many people are in the room?” or “How many dots are on a screen?” the quantity to estimate is discrete and any sensible answer must be a whole number. Discrete numerical estimates are also often required when the underlying quantity is continuous, very commonly when buying items. For example, painters often quickly assess a wall, whose area is a continuous quantity, and then buy a discrete number of paint cans; likewise at parties hosts may have to assess their guests’ hunger and order a discrete number of pizzas.</p>
<p>To understand how discrete numerical estimates are made, a formal framework is needed. Perhaps the most prevalent formal framework for characterizing decision making is Bayesian decision theory, which has provided a normative standard against which to measure behavior in economics [
<xref rid="pcbi.1004859.ref001" ref-type="bibr">1</xref>
<xref rid="pcbi.1004859.ref003" ref-type="bibr">3</xref>
], biology [
<xref rid="pcbi.1004859.ref004" ref-type="bibr">4</xref>
,
<xref rid="pcbi.1004859.ref005" ref-type="bibr">5</xref>
] and for a wide variety of tasks in psychology such as categorization, memory, multi-sensory and sensorimotor integration, and reasoning [
<xref rid="pcbi.1004859.ref006" ref-type="bibr">6</xref>
<xref rid="pcbi.1004859.ref011" ref-type="bibr">11</xref>
]. Bayesian decision theory prescribes how to combine prior beliefs about states of the world, the likelihood that a state of the world generated an observation, and the decision function—the function which converts an uncertain representation into the response that maximizes expected reward. The interaction between these components is fixed in Bayesian decision theory, but the prior, likelihood, and decision function can each take many forms. This freedom allows Bayesian decision theory to correspond to a wide range of decision making behavior [
<xref rid="pcbi.1004859.ref012" ref-type="bibr">12</xref>
,
<xref rid="pcbi.1004859.ref013" ref-type="bibr">13</xref>
], and in the right experimental design each component can be identified [
<xref rid="pcbi.1004859.ref014" ref-type="bibr">14</xref>
<xref rid="pcbi.1004859.ref017" ref-type="bibr">17</xref>
]. Identifying the prior characterizes how people represent past experience, identifying the likelihood indicates how people represent the evidential value of a new observation, and identifying the decision function characterizes how people convert uncertain beliefs into an estimate.</p>
<p>Despite the prevalence of real-world decisions requiring discrete numerical estimates, not much work has been done to characterize them using Bayesian decision theory to determine the priors, likelihoods, and decision functions that people use (cf. [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
]). This stands in contrast to investigations of continuous estimates, such as pointing to a location, which have received much more attention [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
,
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
<xref rid="pcbi.1004859.ref025" ref-type="bibr">25</xref>
]. In this paper we address this gap by characterizing how people make discrete numerical estimates. We first describe Bayesian decision theory and how continuous estimates have been characterized. Next we review previous research into discrete numerical estimation, finding it to be sparse but suggestive of differences in how continuous and discrete numerical estimates are made. We then present three new experiments that investigate discrete numerical estimates using bimodal training distributions, which allowed us to identify both the prior and decision function used. Our first experiment uses a numerosity task in which both the ground truth and response are discrete. In our second experiment, we generalize our results to a rectangle area estimation task in which the ground truth is continuous, but participants must give discrete estimates in whole square centimeters. Participants in the first two experiments learn bimodal priors quite quickly, so the third experiment investigates learning of a more complex quadrimodal training distribution, which allows us to better distinguish how participants build their priors from experience. Finally we discuss our results, comparing continuous and discrete numerical estimation, exploring the implications for how people learn priors, and discussing how people convert uncertain beliefs into a numerical estimate.</p>
<sec id="sec002">
<title>Characterizing continuous estimates</title>
<p>To characterize how people make continuous estimates, first we outline Bayesian decision theory, which prescribes how to maximize expected rewards. Bayesian decision theory is composed of three components that each need to be specified: the prior probability, the likelihood, and the decision function. There are a variety of distributions and functions that can be used for each component, but how the components are combined is fixed by the laws of probability [
<xref rid="pcbi.1004859.ref026" ref-type="bibr">26</xref>
].</p>
<p>The decision maker begins with a prior
<italic>P</italic>
(
<italic>S</italic>
), which gives the prior probability of each state of the world,
<italic>S</italic>
. For simplicity, we assume that the states of the world all are arranged along a single dimension and each state has a one-to-one mapping to a response. On each trial, the decision maker observes some data,
<italic>X</italic>
, that are noisy or ambiguous, and the likelihood,
<italic>P</italic>
(
<italic>X</italic>
|
<italic>S</italic>
), is the probability of the observed data given each of the possible states of the world. The prior and the likelihood are combined via Bayes’ rule to determine the posterior probability of the states of the world having observed the data,
<disp-formula id="pcbi.1004859.e001">
<alternatives>
<graphic xlink:href="pcbi.1004859.e001.jpg" id="pcbi.1004859.e001g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M1">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>∝</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
<label>(1)</label>
</disp-formula>
where equality is achieved if the right-hand side is divided by
<italic>P</italic>
(
<italic>X</italic>
). In a sequential task, this posterior distribution is used as the prior distribution for the next trial, so the prior reflects a participant’s accumulated experience throughout the task.</p>
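As a concrete illustration of Eq 1 (a minimal sketch of our own, not taken from the article; the prior, likelihood, and observation below are made-up example values), the posterior over a discrete set of states can be computed by elementwise multiplication followed by normalization:

import numpy as np

# Hypothetical example: discrete states of the world (e.g., possible numbers of dots)
states = np.arange(1, 11)

# Made-up bimodal prior P(S)
prior = np.array([1, 4, 8, 4, 1, 1, 4, 8, 4, 1], dtype=float)
prior /= prior.sum()

# Assumed lognormal-style likelihood P(X|S) for one noisy observation X
observed = 6.0
sigma = 0.2
likelihood = np.exp(-(np.log(observed) - np.log(states)) ** 2 / (2 * sigma ** 2))

# Bayes' rule: posterior proportional to likelihood times prior;
# normalizing (dividing by P(X)) restores equality
posterior = likelihood * prior
posterior /= posterior.sum()

In a sequential task, this posterior would simply replace the prior before the next trial.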
<p>The best estimate depends not only on what is believed to be true about the world; because
<italic>S</italic>
given
<italic>X</italic>
is uncertain, it also depends on what happens if an incorrect response is made. The dependence of rewards on the response is given by the loss function
<italic>L</italic>
(
<italic>R</italic>
;
<italic>S</italic>
), which captures the loss (negative reward) for making response
<italic>R</italic>
if the state of the world is
<italic>S</italic>
. The decision function,
<italic>D</italic>
<sub>
<italic>L</italic>
</sub>
, then maps the posterior probabilities onto the response with the smallest expected loss
<disp-formula id="pcbi.1004859.e002">
<alternatives>
<graphic xlink:href="pcbi.1004859.e002.jpg" id="pcbi.1004859.e002g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo form="prefix">arg min</mml:mo>
<mml:mi>R</mml:mi>
</mml:munder>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mi>S</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
<label>(2)</label>
</disp-formula>
</p>
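To make Eq 2 concrete, here is a minimal sketch (our own illustration with an assumed example posterior, not the article's code) showing how different loss functions select different responses from the same discrete posterior: an all-or-none loss selects the posterior mode, a quadratic loss the value closest to the posterior mean, and a linear (absolute) loss the posterior median.

import numpy as np

states = np.arange(1, 11)
# Hypothetical discrete posterior P(S|X) over the states
posterior = np.array([0.02, 0.05, 0.20, 0.15, 0.08, 0.08, 0.15, 0.20, 0.05, 0.02])

def best_response(loss):
    """Response minimizing expected loss under the discrete posterior (Eq 2)."""
    expected_loss = [sum(loss(r, s) * p for s, p in zip(states, posterior))
                     for r in states]
    return states[int(np.argmin(expected_loss))]

# All-or-none loss -> posterior mode; quadratic loss -> value nearest the
# posterior mean; linear (absolute) loss -> posterior median
mode_response = best_response(lambda r, s: float(r != s))
mean_response = best_response(lambda r, s: (r - s) ** 2)
median_response = best_response(lambda r, s: abs(r - s))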
<p>Continuous estimates have been modeled using a particular set of priors, likelihoods, and decision functions.
<italic>Likelihoods</italic>
are often assumed to be Gaussian because this density is a good match to the perceptual noise in many tasks, and participants have been shown to correctly adapt to the amount of noise in their perception. For example, in multi-sensory integration and sensorimotor tasks, the normative weight applied to each sensory cue depends on the variance of Gaussian-distributed perceptual noise, and participants’ weights come close to matching these normative weights [
<xref rid="pcbi.1004859.ref008" ref-type="bibr">8</xref>
,
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1004859.ref027" ref-type="bibr">27</xref>
].</p>
<p>For the
<italic>prior</italic>
, a standard choice for the training distribution in continuous estimation tasks is a Gaussian density because it makes the analysis analytically tractable: combining a Gaussian prior with a Gaussian likelihood results in a Gaussian posterior. However more flexible schemes for learning priors exist, and a common way to introduce flexibility is to use a non-parametric prior which grows in complexity as more data are observed. Kernel density estimation is a standard choice for building a non-parametric prior for continuous training data [
<xref rid="pcbi.1004859.ref028" ref-type="bibr">28</xref>
] and this kind of representation has been used in many models of human categorization [
<xref rid="pcbi.1004859.ref029" ref-type="bibr">29</xref>
,
<xref rid="pcbi.1004859.ref030" ref-type="bibr">30</xref>
]. In kernel density estimation, the nonparametric prior is constructed from a weighted sum of component parametric densities, one for each previously observed data point. Mixture priors, which have also been used in models of categorization [
<xref rid="pcbi.1004859.ref007" ref-type="bibr">7</xref>
,
<xref rid="pcbi.1004859.ref031" ref-type="bibr">31</xref>
,
<xref rid="pcbi.1004859.ref032" ref-type="bibr">32</xref>
], provide another representation that allows more flexibility in the number of parametric components than kernel density estimation. This representation operates between the simple parametric and the kernel density cases, grouping similar data together into the same component, but allowing different components for data that are dissimilar.</p>
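As a rough sketch of this distinction (our own illustration; the observations, kernel width, and grouping are arbitrary assumptions), a kernel density prior places one component on every observed trial, whereas a mixture prior groups similar observations into a smaller number of components:

import numpy as np

observations = np.array([3, 3, 4, 4, 8, 8, 9, 9])  # hypothetical training numbers
grid = np.arange(1, 13)

def gaussian(x, mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2))

# Kernel density estimate: one narrow kernel per observed data point
kde_prior = sum(gaussian(grid, obs, 0.5) for obs in observations)
kde_prior /= kde_prior.sum()

# Mixture prior: similar observations grouped into two broader components
mixture_prior = 0.5 * gaussian(grid, observations[:4].mean(), 0.7) + \
                0.5 * gaussian(grid, observations[4:].mean(), 0.7)
mixture_prior /= mixture_prior.sum()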
<p>In continuous estimation tasks, participants learn Gaussian and other unimodal training distributions quickly: in various continuous estimation tasks, unimodal training distributions have been learned in hundreds of trials [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
,
<xref rid="pcbi.1004859.ref021" ref-type="bibr">21</xref>
,
<xref rid="pcbi.1004859.ref023" ref-type="bibr">23</xref>
]. Participants can also learn bimodal training distributions, demonstrating that they do not use just simple parametric priors, but they are slower to do so. Experimenters have had to train participants on bimodal distributions for thousands of trials [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1004859.ref022" ref-type="bibr">22</xref>
], because fewer training trials do not result in clear evidence of learning [
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
]. Participants were able to use bimodal distributions when presented with an explicit summary, and this work also showed that participants are better described as using a mixture prior rather than a kernel density estimate with a very narrow kernel [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
].</p>
<p>For the
<italic>decision function</italic>
in continuous estimation, a small number of simple functions have been considered, each of which can be motivated by one or more loss functions. An all-or-none loss function leads to choosing the response with the highest posterior probability, a quadratic loss function leads to taking the mean of the posterior, and a linear loss function yields the posterior median as the best response. A fourth decision function, drawing a sample from the posterior, requires a more complex motivation. One route is to speculate that participants assume that the computer is adaptively responding to their input in a competitive fashion, so that a stochastic decision function can increase their expected reward [
<xref rid="pcbi.1004859.ref033" ref-type="bibr">33</xref>
,
<xref rid="pcbi.1004859.ref034" ref-type="bibr">34</xref>
]. Another is to assume that participants are maximizing expected reward subject to particular computational costs: if participants draw samples from the posterior and these samples require time or effort to generate, it can be better to make a quick and less accurate decision rather than a slow and effortful accumulation of enough samples to calculate the maximum of the posterior [
<xref rid="pcbi.1004859.ref035" ref-type="bibr">35</xref>
,
<xref rid="pcbi.1004859.ref036" ref-type="bibr">36</xref>
].</p>
<p>Comparison of these decision functions in continuous estimation has yielded task-dependent conclusions. Some researchers have found evidence for the mean decision function, finding that participants used a loss function that was quadratic near the correct value but more linear far from the correct value, giving it robustness to outliers [
<xref rid="pcbi.1004859.ref024" ref-type="bibr">24</xref>
]. Other work has found evidence for a mean decision function despite feedback which did not encourage this decision function [
<xref rid="pcbi.1004859.ref023" ref-type="bibr">23</xref>
], but a later analysis showed that this particular task does not discriminate well between decision functions [
<xref rid="pcbi.1004859.ref014" ref-type="bibr">14</xref>
].</p>
<p>Recent research has incentivized the max decision function and then investigated the decision function actually used. Work using Gaussian priors found that instead of the max, participants were performing an interpolation between drawing a single sample and taking the max of the posterior [
<xref rid="pcbi.1004859.ref020" ref-type="bibr">20</xref>
]. There are various mechanisms that could produce this interpolation: participants could be drawing a number of samples from the posterior distribution and taking the mean of these samples as their estimate, they could be raising the posterior to a power greater than one and then sampling their estimate from this exponentiated posterior distribution, or perhaps they combine the two by taking the mean of a number of samples from an exponentiated posterior. While these explanations are indistinguishable for Gaussian posteriors [
<xref rid="pcbi.1004859.ref020" ref-type="bibr">20</xref>
], work using a bimodal training distribution has successfully tested two of these possibilities, the mean of a number of samples versus a sample from an exponentiated posterior, and found that participants were drawing a single sample from an exponentiated posterior distribution [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
].</p>
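The two candidate mechanisms can be sketched as follows (our own illustration with arbitrary parameter values, not the authors' implementation): taking the mean of k samples drawn from the posterior, versus drawing a single sample from the posterior raised to a power and renormalized. Both interpolate between a single posterior sample (k = 1 or power = 1) and the posterior maximum (large k or large power), but for a bimodal posterior the mean of several samples can fall between the modes, while an exponentiated-posterior sample stays near a mode.

import numpy as np

rng = np.random.default_rng(0)
states = np.arange(1, 11)
posterior = np.array([0.02, 0.10, 0.25, 0.10, 0.03, 0.03, 0.10, 0.25, 0.10, 0.02])

# Mechanism 1: mean of k samples from the posterior, rounded to a discrete response
k = 5
samples = rng.choice(states, size=k, p=posterior)
response_mean_of_samples = int(round(samples.mean()))

# Mechanism 2: single sample from an exponentiated (sharpened) posterior
power = 3.0
sharpened = posterior ** power
sharpened /= sharpened.sum()
response_exponentiated = int(rng.choice(states, p=sharpened))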
</sec>
<sec id="sec003">
<title>Characterizing discrete numerical estimates</title>
<p>Though researchers have occasionally investigated discrete numerical estimation, it has not received much attention, possibly because it has been viewed as no different from continuous estimation. However if we look at the work that has been done, there are suggestions that people make these two types of estimates differently. Bayesian decision theory is useful here for cataloguing the similarities and differences.</p>
<p>The
<italic>likelihood</italic>
in discrete numerical estimation is similar to that found in continuous estimation. As in continuous tasks, participants find it more difficult to discriminate stimuli that are closer physically even though they are naturally discrete. Indeed, in perceptual numerosity tasks, discrimination performance nearly follows Weber’s law, implying that the standard deviation of the perceptual noise distribution is proportional to its mean [
<xref rid="pcbi.1004859.ref037" ref-type="bibr">37</xref>
], which has been modeled as Gaussian noise on the log-transformed numbers [
<xref rid="pcbi.1004859.ref038" ref-type="bibr">38</xref>
<xref rid="pcbi.1004859.ref041" ref-type="bibr">41</xref>
]. Like in continuous tasks, participants appear to use knowledge of their own perceptual noise to set their likelihood in numerosity tasks [
<xref rid="pcbi.1004859.ref038" ref-type="bibr">38</xref>
], a point we also address in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
.</p>
<p>Investigations into the
<italic>priors</italic>
used in discrete numerical estimation have shown that participants can learn unimodal distributions of stimuli quickly, as in continuous estimation. Participants are able to reconstruct the frequency of events from unimodal distributions after just a few trials [
<xref rid="pcbi.1004859.ref042" ref-type="bibr">42</xref>
] and their estimations of new events quickly show an influence of the mean in a changing sequence of numbers [
<xref rid="pcbi.1004859.ref043" ref-type="bibr">43</xref>
]. In the similar task of absolute identification, in which participants are asked to identify a series of perceptual stimuli with numerical labels [
<xref rid="pcbi.1004859.ref044" ref-type="bibr">44</xref>
], participants are also influenced by unimodal distributions of stimuli [
<xref rid="pcbi.1004859.ref045" ref-type="bibr">45</xref>
].</p>
<p>However, tasks training bimodal prior distributions point to potential differences between continuous and discrete numerical estimation. The first potential difference is in the speed of learning bimodal priors. In one task, participants asked to reconstruct bimodal prior distributions were able to do so within a few hundred training trials [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
], and in another participants could do so for some bimodal distributions after only 12 trials [
<xref rid="pcbi.1004859.ref046" ref-type="bibr">46</xref>
]. Though this suggests that participants have a speed advantage in learning priors for discrete numerical estimates, these priors were assessed through reconstruction and it needs to be established whether the same priors are used in estimation.</p>
<p>A potential difference in the
<italic>decision function</italic>
was also found in [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
], in an experiment in which participants were asked to estimate the revenues associated with trained and novel company names. Participants’ estimates for novel companies were either at the lower edge of the range of trained revenues or were in the middle of the range, results which were modeled as drawing a set of samples from the prior combined with a mixture of two decision functions. One decision function was to use the lowest sample in the set as the estimate (because unknown companies are likely to have low revenue), and the other was to take the mean of the set of samples as the estimate [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
]. The use of the mean of a small number of samples as the decision function contrasts with the exponentiated posterior supported by work in continuous estimation, and this is another potential difference. However, revenue estimation is very different from the perceptual tasks used in continuous estimation, so it would help to investigate the decision function in a perceptual discrete numerical estimation task.</p>
<p>Here we investigate these potential differences in two different discrete numerical estimation tasks: estimating the number of dots on a screen and estimating the area of a rectangle. We are particularly interested in whether participants can quickly learn complex multimodal prior distributions and what decision function they use to make their estimates. In exploring this, we go further than previous studies by investigating whether participants use a kernel density estimate, a mixture model, or a categorical distribution as a prior. Through both combinatorial model comparison and fitting of nested models we examine the decision function, investigating whether the mean, max, the mean of a number of samples from the posterior, a sample from an exponentiated version of the posterior distribution, or perhaps a more complex decision function best explains participants’ estimates. We compare our findings to the results from continuous estimation in the discussion as well as explore the implications for what priors people can learn and what decision functions they use.</p>
</sec>
</sec>
<sec sec-type="results" id="sec004">
<title>Results</title>
<sec id="sec005">
<title>Experiment 1</title>
<p>To characterize how participants make discrete numerical responses, we ran a new experiment on numerosity estimation in which participants were trained on a bimodal distribution. Participants were asked to estimate the number of dots that briefly appeared on a screen in a series of trials, receiving feedback about whether they were correct and what the correct answer was after each trial as shown in
<xref ref-type="fig" rid="pcbi.1004859.g001">Fig 1A</xref>
. They were not told anything about which numbers to expect in addition to the feedback. On each trial, the number of dots on the screen was drawn from a sharp bimodal distribution with two peaks on either side of a region of lower probability values (e.g., the distribution shown in the top left corner of
<xref ref-type="fig" rid="pcbi.1004859.g002">Fig 2</xref>
). A sharp bimodal distribution allows us to better identify the prior used. If participants are not generalizing beyond the numbers that were given as feedback, then their prior should eventually match the training distribution and they will not respond outside the range of the stimuli. However, if participants are using a parametric or kernel density prior distribution, then the prior distribution will have some spillover outside the range of stimuli, and participants will respond outside this range even after hundreds of training trials. Using a mixture prior will also result in responses outside the range, but they will likely be fewer in number. Examples of these four possibilities are shown in the top row of
<xref ref-type="fig" rid="pcbi.1004859.g002">Fig 2</xref>
.</p>
<fig id="pcbi.1004859.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Illustrations of the tasks used in the experiments.</title>
<p>A) Estimation trials for dots. B) Estimation trials for rectangle area. C) Discrimination trials for dots. D) Discrimination trials for rectangle area.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g001"></graphic>
</fig>
<fig id="pcbi.1004859.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g002</object-id>
<label>Fig 2</label>
<caption>
<title>The predicted data for various combinations of prior distribution and decision function for Experiments 1 and 2.</title>
<p>The first row shows how a strongly bimodal distribution would be represented by each type of prior after training, with the categorical distribution reflecting the true prior distribution. The remaining rows of plots each show conditional response distributions (CRDs), in which the area of each square represents the expected response probability given a particular presented number of dots. The prior in the first row is combined with each row’s decision function to produce the CRDs.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g002"></graphic>
</fig>
<p>Before and after the main task, we included a separate discrimination task that allowed us to characterize the likelihood distribution for each participant. This removed a degree of freedom from the process of characterizing the prior and decision function. The noise in numerosity judgments is well-known to follow Weber’s law with a standard deviation proportional to the mean [
<xref rid="pcbi.1004859.ref037" ref-type="bibr">37</xref>
], and has been modeled in past research as a lognormal distribution [
<xref rid="pcbi.1004859.ref038" ref-type="bibr">38</xref>
<xref rid="pcbi.1004859.ref041" ref-type="bibr">41</xref>
]. We assumed that the likelihood distribution was accurately calibrated and thus equivalent to the noise distribution. The scale of the lognormal distribution
<italic>σ</italic>
(i.e., the standard deviation of the natural logarithm of a lognormally distributed variable), which can be determined from the Weber fraction
<italic>w</italic>
using the formula
<inline-formula id="pcbi.1004859.e003">
<alternatives>
<graphic xlink:href="pcbi.1004859.e003.jpg" id="pcbi.1004859.e003g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M3">
<mml:mrow>
<mml:mi>σ</mml:mi>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mo form="prefix">log</mml:mo>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, was estimated in a discrimination task in which participants were asked which of two screens contained more dots (shown in
<xref ref-type="fig" rid="pcbi.1004859.g001">Fig 1C</xref>
. Participants’ discrimination judgments were well fit with a standard deviation that ranged from 0.18 to 0.53, with a median of 0.22. These estimates are in reasonable agreement with previous research on numerosity estimation, which found discriminability equivalent to
<italic>σ</italic>
≈ 0.16 [
<xref rid="pcbi.1004859.ref037" ref-type="bibr">37</xref>
,
<xref rid="pcbi.1004859.ref047" ref-type="bibr">47</xref>
].</p>
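<p>To make this relationship concrete, the following sketch (an illustration in Python, not code from the original analysis) converts between a Weber fraction <italic>w</italic> and the lognormal scale <italic>σ</italic> using the formula above; the example values are chosen to match the range reported here.</p>
<preformat>
import math

def weber_to_sigma(w):
    """Lognormal scale from a Weber fraction: sigma = sqrt(log(w**2 + 1))."""
    return math.sqrt(math.log(w ** 2 + 1.0))

def sigma_to_weber(sigma):
    """Inverse conversion: Weber fraction implied by a lognormal scale sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)

# Lognormal scales in the reported range, the Weber fractions they imply,
# and a round-trip check (for small values the two are nearly identical).
for sigma in (0.18, 0.22, 0.53):
    w = sigma_to_weber(sigma)
    print(sigma, round(w, 3), round(weber_to_sigma(w), 3))
</preformat>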
<p>After fixing the lognormal standard deviation, we can predict, for each pairing of prior and decision function, the responses expected once the training distribution has been learned. Using the median estimate of
<italic>σ</italic>
= 0.22, predictions from pairings of possible priors and decision functions are shown in
<xref ref-type="fig" rid="pcbi.1004859.g002">Fig 2</xref>
in the form of conditional response distributions (CRDs): for trials on which a particular number of dots is presented, each panel shows the distribution over the responses expected from the combination of prior and decision function. Two qualitative features stand out in these plots. The first is the number of modes in each CRD. If the mean decision function is used, or if the prior is a Gaussian distribution, then the CRD will be unimodal; otherwise it will be bimodal. The second qualitative feature is whether responses occur outside the range of presented values. Responses will always be within the range of presented values for the categorical prior, while for other priors responses can occur outside of the range if participants sample from the posterior distribution.</p>
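<p>As a concrete illustration of how such predictions can be computed, the sketch below (a simplified Python illustration, not the original modeling code) combines an assumed bimodal categorical prior with a lognormal likelihood of scale <italic>σ</italic> = 0.22 and simulates the CRD for one presented number under the sample, max, or mean decision functions; the prior values are illustrative rather than the exact training probabilities, and the other prior types could be substituted in the same way.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.22                                   # lognormal scale from the discrimination task

# Illustrative bimodal categorical prior over candidate numbers (values assumed).
numbers = np.arange(23, 30)
prior = np.array([0.30, 0.08, 0.08, 0.08, 0.08, 0.08, 0.30])

def posterior(x, numbers, prior, sigma):
    """Posterior over candidate numbers n given a noisy internal estimate x,
    with a lognormal likelihood log(x) ~ N(log(n), sigma^2) (up to a constant in n)."""
    loglik = -0.5 * ((np.log(x) - np.log(numbers)) / sigma) ** 2
    post = prior * np.exp(loglik - loglik.max())
    return post / post.sum()

def simulate_crd(presented, rule="sample", n_trials=10000):
    """Conditional response distribution for one presented number under a decision rule."""
    counts = np.zeros(len(numbers))
    for _ in range(n_trials):
        x = np.exp(rng.normal(np.log(presented), sigma))    # noisy internal estimate
        post = posterior(x, numbers, prior, sigma)
        if rule == "mean":
            resp = int(np.round(np.sum(post * numbers)))
        elif rule == "max":
            resp = int(numbers[np.argmax(post)])
        else:                                               # single sample from the posterior
            resp = int(rng.choice(numbers, p=post))
        counts[resp - numbers[0]] += 1
    return counts / n_trials

print(simulate_crd(26, rule="sample").round(3))
</preformat>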
<p>In the main task, participants were assigned to one of three groups, with each group participating in a series of trials in which the sharp bimodal prior distribution covered a larger or smaller range. This was done to ensure that the results were not strongly dependent on the distance between peaks or on the particular numbers assigned to the modes of the distributions. The average and individual CRDs for each of the three groups are shown in
<xref ref-type="fig" rid="pcbi.1004859.g003">Fig 3</xref>
, along with the distribution of the training trials given to participants. In order to plot stable performance, the first 300 trials were not included in these plots. The empirical CRDs for each group show a strong bimodality, implying that neither the mean decision function nor the Gaussian prior characterizes the human data. Responses are not made only at the modes of the training distribution, however, as a large number of responses are found between the two peaks. These middle responses are evidence against the max decision rule (which would be very unlikely to produce an intermediate response), so from qualitative inspection posterior sampling is left as the best characterization of the decision function.</p>
<fig id="pcbi.1004859.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Training distributions and conditional response distributions for each group of participants after 300 trials in Experiment 1.</title>
<p>Each group of participants is in a separate row. The left column of plots shows the probability with which each number was presented to participants. The middle column shows aggregate conditional response distributions with the area of each square representing the relative frequency of making a particular response given a particular presented stimulus. The right column shows the conditional response distributions for each participant. The letters D and G mark participants who are using a categorical (Dirichlet prior) or Gaussian kernel prior respectively and the letters A, M, and S mark the participants who were characterized as using the mean (Average), Max, or Sampling decision functions respectively.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g003"></graphic>
</fig>
<p>Further inspection of the average data shows very few responses outside of the range of presented values: the narrow group shows no such responses, and the medium and wide groups show few responses of these types. For the medium and wide groups, the responses outside the range of presented values only appear for a small subset of the participants: the third and fifth participant in the medium group, and the third and fourth participant in the wide group. The lack of responses outside of the range of presented values, combined with the identification of participants’ decision function as consistent with posterior sampling, implicates the use of a categorical prior, though this is not easy to distinguish from a mixture prior, as shown in
<xref ref-type="fig" rid="pcbi.1004859.g002">Fig 2</xref>
.</p>
<p>We fit a set of computational models (see
<xref ref-type="sec" rid="sec014">Methods</xref>
; Model comparison) to provide quantitative evidence that individual participants were using categorical priors and sampling from the posterior. Each model was fit to all of the trials, and the prior was updated after each instance of feedback. We specifically tested the combinations of prior updating (categorical (Dirichlet) or Gaussian kernel) and decision functions (mean (Average), Max, or Sample). We performed a model comparison using the Bayesian Information Criterion, which adjusts the fit of the model with a penalty for complexity [
<xref rid="pcbi.1004859.ref048" ref-type="bibr">48</xref>
]. Eighteen of the twenty participants in this experiment were best described by the categorical prior and a decision function that drew a single sample from the posterior. For the remaining two participants, one was best described by a Gaussian kernel and a max decision function and the other by a categorical prior and a max decision function. The best models for each participant are indicated in
<xref ref-type="fig" rid="pcbi.1004859.g003">Fig 3</xref>
and the BIC values transformed into approximate posterior probabilities are shown in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
.</p>
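<p>The two prior-updating rules compared here can be sketched as follows (a simplified Python illustration under assumed hyperparameters, not the fitted models themselves): the categorical (Dirichlet) prior accumulates counts of the feedback values, while the Gaussian-kernel prior places a kernel of fixed width on each feedback value and evaluates it on the discrete support.</p>
<preformat>
import numpy as np

support = np.arange(15, 45)            # candidate numbers (assumed range)

def dirichlet_prior(feedback, alpha=1.0):
    """Categorical prior: normalized counts of past feedback plus a small
    Dirichlet pseudo-count alpha on every candidate number."""
    counts = np.full(len(support), alpha, dtype=float)
    for f in feedback:
        counts[f - support[0]] += 1
    return counts / counts.sum()

def kernel_prior(feedback, width=1.5):
    """Gaussian-kernel prior: a kernel of fixed width centred on each
    feedback value, evaluated on the discrete support and normalized."""
    density = np.zeros(len(support))
    for f in feedback:
        density += np.exp(-0.5 * ((support - f) / width) ** 2)
    return density / density.sum()

feedback = [23, 23, 29, 23, 29, 26, 29]   # illustrative feedback sequence
print(dirichlet_prior(feedback).round(3))
print(kernel_prior(feedback).round(3))
</preformat>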
<p>To allow for a wider range of possible behaviors, we also fit computational models that allowed for “trembling hand” noise and models that allowed the posterior distribution to be raised to a power before the decision function was applied (see
<xref ref-type="sec" rid="sec014">Methods</xref>
). Once we included this set of models, we found that nineteen of the twenty participants were best described by raising the posterior distribution to an exponent larger than one before the decision function was applied, while the remaining participant was best described with the original model (implying an exponent of one). Once the posterior distribution was raised to a power, behavior was best described as a single sample for ten participants and as the mean of the exponentiated posterior for nine participants. This generalization elaborates on what was found with the first set of computational models: exponentiating the posterior means that participants lie between sampling and the max decision function, and the individual differences in using a single sample or the mean reflect individual differences in the amount of stochasticity in the estimates and in the tendency to sometimes respond near the middle of the presented range of stimuli.</p>
<p>A final generalization was to fit a ‘super-model’ to each participant’s data (see
<xref ref-type="sec" rid="sec014">Methods</xref>
) that allows us to further investigate the individual differences in stochasticity that participants have in their estimates by quantifying the number of samples they use. The individual best fits and an exercise showing that these parameters are identifiable are given in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
. Reinforcing the model comparison above, nineteen of the twenty participants used an exponentiated posterior distribution to make their decisions: the exponent was well above 1.0 for all but one participant. This one participant was best fit by a single sample, so there was no evidence that any participants were taking the mean of a small number of samples from the untransformed posterior; instead, participants were sampling from an interpolation between the posterior and a distribution that was entirely on the max of the posterior. This analysis allows us to further investigate those participants found to be using the mean of an exponentiated posterior. Half of these participants were best fit by taking the mean of between 2 and 4 samples, while the remainder were taking the mean of a larger number (about 30) of samples.</p>
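<p>The decision stage of this ‘super-model’ can be sketched as below (an illustration in Python with assumed parameter values, omitting components such as the ‘trembling hand’ noise mentioned above): the posterior is raised to an exponent and renormalized, a number of samples are drawn from the result, and the response is the rounded mean of those samples. With an exponent of one and a single sample this reduces to posterior sampling, while large exponents approach the max decision function.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(1)

def supermodel_response(numbers, posterior, exponent=3.0, k=1):
    """Decision function: exponentiate the posterior, renormalize, draw k samples,
    and respond with the rounded mean of those samples."""
    p = posterior ** exponent
    p = p / p.sum()
    samples = rng.choice(numbers, size=k, p=p)
    return int(np.round(samples.mean()))

numbers = np.arange(23, 30)
posterior = np.array([0.35, 0.05, 0.05, 0.10, 0.05, 0.05, 0.35])   # illustrative posterior
print(supermodel_response(numbers, posterior, exponent=1.0, k=1))  # pure posterior sampling
print(supermodel_response(numbers, posterior, exponent=5.0, k=4))  # between sampling and max
</preformat>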
<p>In summary, this experiment established that the great majority of participants could learn essentially categorical priors when using discrete numerical responses and tended to respond using a decision function that was either a single sample or the average of multiple samples drawn from an exponentiated posterior distribution. A question then arises about the generality of the results. Are people using a categorical prior distribution because the number of dots is necessarily a discrete quantity? Or are they using a categorical prior distribution because of the discreteness of the responses?</p>
</sec>
<sec id="sec006">
<title>Experiment 2</title>
<p>To test whether the results of Experiment 1 were driven by the discreteness of dots or the discreteness of the response, we ran essentially the same experiment, but instead of asking participants to estimate the number of dots we asked them to estimate the area of rectangles. As in the example of buying paint to cover a wall, rectangle area is a continuous quantity, but we forced participants to make discrete responses: they were required to estimate the area of rectangles in whole square centimeters.</p>
<p>Two groups of participants were run in this experiment and the results are shown in
<xref ref-type="fig" rid="pcbi.1004859.g004">Fig 4</xref>
. When we fit their discrimination data, we found a median of
<italic>σ</italic>
= 0.41. As in Experiment 1, the average results in the estimation task show a bimodal distribution. Responses often fall in the middle but hardly ever fall outside the range of presented values. This pattern again is qualitatively most consistent with a categorical prior and a decision function that draws a single sample from the posterior.</p>
<fig id="pcbi.1004859.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Training distributions and conditional response distributions for each group of participants after 300 trials in Experiment 2 (estimates of the area of a rectangle).</title>
<p>Each group of participants is in a separate row. The left column of plots shows the probability with which each number was presented to participants. The middle column shows aggregate conditional response distributions with the area of each square representing the relative frequency of making a particular response given a particular presented stimulus. The right column shows the conditional response distributions for each participant. The letters D and G mark participants who are using a categorical (Dirichlet prior) or Gaussian kernel prior respectively and the letters A, M, and S mark the participants who were characterized as using the mean (Average), Max, or Sampling decision functions respectively.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g004"></graphic>
</fig>
<p>We used the same analysis approach of successive generalization with the same models as we used in Experiment 1, with all individual results given in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
. In the simplest model comparison, we found that every participant was best described by a categorical prior distribution and a decision function that was a single sample from the posterior. This result is given next to each individual in
<xref ref-type="fig" rid="pcbi.1004859.g004">Fig 4</xref>
. When we generalized the comparison to allow the posterior distribution to be raised to a power before the decision function was applied, we found that every participant was better described by exponentiating their posterior distribution, bringing it closer to a distribution that was entirely on the max. As in Experiment 1, half of the participants were best described by taking a sample from this exponentiated posterior, while the other half were best described by taking the mean of the exponentiated posterior, reflecting a tendency to sometimes respond near the middle of the presented range of stimuli. The more general ‘super-model’ analysis provided more detail on how many samples were being taken from the exponentiated posterior, and thus the amount of stochasticity in the estimates. All of the participants were best fit by taking the mean of between 3 and 100 samples from an exponentiated posterior distribution, with the participants best described as taking the mean of the exponentiated posterior tending to be on the higher end of this range.</p>
<p>In both Experiments 1 and 2, when asked to make discrete responses, participants used near-categorical prior distributions and took either a single sample or multiple samples from an exponentiated posterior, regardless of whether the underlying quantity was discrete or continuous. They were clearly able to learn a bimodal distribution surprisingly quickly, so these results lead to the question of how flexible this representation is. As shown in
<xref ref-type="fig" rid="pcbi.1004859.g002">Fig 2</xref>
a mixture of Gaussians prior can quite closely imitate the categorical prior that was best supported by the data. As the mixture model interpolates between a Gaussian prior and kernel density estimation, it is difficult to provide evidence against this model. More generally, allowing mixtures of other types of distributions, such as uniform distributions, makes the problem even more difficult. In order to test whether participants were using mixture models, Experiment 3 investigates whether participants can learn a more complex prior distribution.</p>
</sec>
<sec id="sec007">
<title>Experiment 3</title>
<p>In order to further investigate how complex a prior distribution participants can learn within a few hundred trials, participants in this experiment were trained on a quadrimodal prior distribution. As shown in
<xref ref-type="fig" rid="pcbi.1004859.g005">Fig 5</xref>
, this distribution was designed to test whether participants were using simple mixture models. If participants are assigning all of the trials with 23–25 dots to one mixture component and all of the trials with 29–31 dots to a separate mixture component, then the predictions of the categorical prior and the mixture prior are clearly distinguishable: the mixture model always predicts a peak in response frequency at numbers 24 and 30, while the categorical prior distribution predicts that these numbers will be selected less often than the peaks. Similar predictions arise if participants are using other distributions in the mixture model, such as uniform distributions: the same component assignments lead to the prediction that numbers 24 and 30 will be selected at least as often as the other presented numbers.</p>
<fig id="pcbi.1004859.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g005</object-id>
<label>Fig 5</label>
<caption>
<title>The predicted data for various combinations of prior distribution and decision function for Experiment 3.</title>
<p>The first row shows how the quadrimodal training distribution would be represented by each type of prior after training, with the categorical distribution reflecting the true prior distribution. The remaining rows of plots each show conditional response distributions (CRDs), in which the area of each square represents the expected response probability given a particular presented number of dots. The prior in the first row is combined with each row’s decision function to produce the CRDs.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g005"></graphic>
</fig>
<p>Three groups of participants were run in this experiment: one group completed a perceptually easier numerosity task, one group a perceptually more difficult numerosity task, and one group completed a rectangle task. When we fit the discrimination data, we found medians of
<italic>σ</italic>
= 0.19,
<italic>σ</italic>
= 0.14, and
<italic>σ</italic>
= 0.14 for the Difficult Numerosity, Easier Numerosity, and Rectangle groups respectively (first estimation task, see
<xref ref-type="sec" rid="sec014">Methods</xref>
) and used the latter value for generating
<xref ref-type="fig" rid="pcbi.1004859.g005">Fig 5</xref>
. The mean results for each of the groups are shown in
<xref ref-type="fig" rid="pcbi.1004859.g006">Fig 6</xref>
and the qualitative results in this experiment again look like a combination of a categorical prior with sampling from the posterior distribution.</p>
<fig id="pcbi.1004859.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004859.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Training distributions and conditional response distributions for each group of participants after 300 trials in Experiment 3.</title>
<p>Each group of participants is in a separate row. The left column of plots shows the probability with which each number was presented to participants. The middle column shows aggregate conditional response distributions with the area of each square representing the relative frequency of making a particular response given a particular presented stimulus. The right column shows the conditional response distributions for each participant. The stars mark participants that made significantly fewer responses to lower (but non-zero) probability numbers than higher probability numbers. The letters D and G mark participants who are using a categorical (Dirichlet prior) or Gaussian kernel prior respectively and the letters A, M, and S mark the participants who were characterized as using the mean (Average), Max, or Sampling decision functions respectively.</p>
</caption>
<graphic xlink:href="pcbi.1004859.g006"></graphic>
</fig>
<p>We ran the same analysis as was done for Experiments 1 and 2 (model comparison and fitting parameters of the ‘super-model’ with individual results given in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
) to determine which model explained participants’ responses best. Using the simplest model comparison, 15 of 21 participants were best described by a categorical prior and by sampling their estimates from the posterior. The remaining six participants were better described by a Gaussian kernel prior, with four of them taking the max of the posterior and two sampling. The correspondence of these results to the individual data is shown in
<xref ref-type="fig" rid="pcbi.1004859.g006">Fig 6</xref>
. The less restrictive model comparison again showed that a categorical prior and a single sample from an exponentiated posterior distribution was the best description of the largest number of participants (12 of 21). The other participants were best fit by a variety of models. For the ‘super-model’ analysis, which allows us to better investigate the stochasticity in the decision function, 11 out of the 21 participants drew a single sample from a posterior distribution that had an exponent clearly above 1.0, while the remainder either drew a single sample from a posterior distribution with an exponent not much different from 1.0 or took the mean of a larger number of samples from an exponentiated posterior distribution. No participants were best fit by averaging multiple samples from the unexponentiated posterior distribution.</p>
<p>To test whether participants were using a simple mixture model which assigned the trials with 23–25 dots to one mixture component and the trials with 29–31 dots to a separate mixture component, we looked at whether participants produced fewer responses of 24 and 30 compared to the peaks of the distribution. If participants were equally likely to respond with any of the presented numbers (after trial 300 and ignoring what the actual presented value was), then participants should have picked numbers 24 or 30 on at least 1/3 of trials. Using 1/3 of trials as the null hypothesis, we ran binomial tests to determine whether the actual number of such responses was significantly lower than this value for each participant. Overall, 14 of 21 participants produced significantly fewer responses than the null hypothesis predicted (
<italic>p</italic>
< 0.05). The participants showing significant differences are marked with stars in
<xref ref-type="fig" rid="pcbi.1004859.g006">Fig 6</xref>
. Clearly a number of participants were not using this simple mixture model as their prior distribution.</p>
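<p>The test described above amounts to a one-sided binomial test against a null rate of 1/3, sketched below with illustrative counts (not any individual participant’s data) and assuming SciPy’s binomtest is available.</p>
<preformat>
from scipy.stats import binomtest

n_trials = 200        # post-training estimation trials (illustrative)
n_24_or_30 = 40       # responses of 24 or 30 (illustrative count)

# One-sided test of whether responses of 24 or 30 are rarer than the 1/3
# predicted by a simple two-component mixture prior.
result = binomtest(n_24_or_30, n_trials, p=1/3, alternative='less')
print(result.pvalue)
</preformat>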
<p>Mixture models that are closer to the categorical prior are harder to rule out. For example, a mixture model might consist of separate components for every number, except for a single pair of numbers that are represented with the same component. For the prior distribution trained in this experiment, this would be a mixture model consisting of five component densities to represent the six presented numbers. We simulated how often responses just outside the presented range would appear if there were separate mixture components for every number except for a single pair of adjacent numbers, for a variety of choices of the adjacent pair and values of
<italic>σ</italic>
and found that participants would be expected to produce a response just outside the range on perhaps as few as 0.6% of trials. This low rate means that it is not possible to say that any individual participant produced significantly fewer such responses: the probability of producing zero of these responses across 200 trials, assuming a 0.6% probability per trial, is 0.3. However, we do note that 11 of 21 participants did not produce any of these just-outside-the-range responses (and four additional participants produced only one). If all participants were consistently grouping adjacent numbers together, the probability of observing this many participants with zero responses of this type is low (
<italic>p</italic>
= .027).</p>
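<p>Both probabilities quoted in this paragraph can be checked directly, as in the sketch below (illustrative Python, using the 0.6% rate and the trial and participant counts from the text).</p>
<preformat>
from scipy.stats import binom

p_outside = 0.006                    # expected rate of just-outside-the-range responses
p_zero = (1 - p_outside) ** 200      # chance of zero such responses across 200 trials
print(round(p_zero, 2))              # approximately 0.3

# Chance that 11 or more of 21 participants produce zero such responses,
# if every participant independently had probability p_zero of doing so.
print(round(binom.sf(10, 21, p_zero), 3))   # close to the reported p = .027
</preformat>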
<p>Overall, Experiment 3 demonstrates that participants can accurately learn a very complex quadrimodal prior distribution within a few hundred trials. The complexity of the learned prior even allowed us to rule out, for most participants, a simple mixture model that could have explained behavior in the first two experiments.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec008">
<title>Discussion</title>
<p>In the preceding pages we have characterized the components necessary for optimal Bayesian decision making with discrete numerical stimuli and explained how model comparison and model fitting allow us to tease apart these components given the right experimental setup.</p>
<p>The results from three experiments show that most participants were better described by a perfect match to the training distribution than by a parametric distribution, a kernel density estimate, or some forms of mixture model. Generally speaking, participants were best characterized as raising their posterior distribution to a power, which interpolates between the original posterior distribution and a distribution entirely on the max of the posterior. Using this transformed distribution, participants’ estimates were best described as averaging one or more samples from an exponentiated posterior distribution, with the number of samples reflecting individual differences in the stochasticity of their estimates and in the probability of responding near the middle of the presented range of stimuli. As participants tended to either draw a single sample or use a large exponent to transform their posterior distribution, this ruled out the mean or the mean of a number of samples as viable decision functions. Experiments 1 and 2 established this pattern across both the numerosity and area estimation tasks, pointing toward the discreteness of the response rather than the discreteness of the stimuli as the driver of this behavior. Experiment 3 expanded this to even more complicated stimulus distributions, providing evidence against other mechanisms for updating the prior. We now compare our results to continuous and discrete estimation in previous experiments, and discuss the conclusions we can draw about priors and decision functions.</p>
<sec id="sec009">
<title>Comparing continuous and discrete numerical estimation</title>
<p>Previous investigations had suggested that the decision function used in discrete numerical estimation might be different from that used in continuous estimation tasks. In tasks in which the max decision function was incentivized, work with continuous estimation tasks has shown that participants exponentiated the posterior distribution and drew a sample from this exponentiated distribution [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
,
<xref rid="pcbi.1004859.ref020" ref-type="bibr">20</xref>
]. This result contrasts with the findings from discrete numerical tasks showing that even when incentivized to use the max rule, participants still appeared to use the mean of a small number of samples [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
].</p>
<p>Our results for discrete numerical estimation were different. We encouraged the max decision rule (by giving participants feedback of ‘correct’ or ‘incorrect’), and we found that participants were using an exponentiated version of the posterior distribution. Our ‘super-model’ analysis allowed for both exponentiation of the posterior and for the mean to be taken of a number of samples, and we found strong evidence for exponentiation in a large majority of participants and individual differences in whether a single or multiple samples were used. Despite the individual differences, no participants were best fit by the mean of a small number of samples from the untransformed posterior.</p>
<p>The divergent results between our task and the revenue estimation task of [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
] need explanation. One potential key difference is that the likelihood was characterized in our task but not in the revenue estimation task. Pronounceable company names induce different expectations about company stock performance than non-pronounceable names [
<xref rid="pcbi.1004859.ref049" ref-type="bibr">49</xref>
], and these expectations could be reflected in a variety of likelihoods that cause the resulting estimates to resemble those coming from a mixture of decision functions. This is of course speculative, and the divergence may be due to other task differences, but it highlights the importance of characterizing all of the components of Bayesian decision theory.</p>
<p>Overall, our findings for the decision function roughly correspond with those found in continuous estimation, so there seems to be no strong dividing line between the decision functions used to make these two types of estimates. Of course, decision functions have usually not been characterized in much detail, nor across a range of tasks, so later investigations could reveal subtle differences in how the task shapes the decision function used.</p>
<p>In terms of the prior, across three experiments we found the novel result that participants were better characterized by a categorical prior than by a simple parametric distribution or by a kernel density estimate with any appreciable width. Mixture priors could possibly explain the results of Experiments 1 and 2, but Experiment 3 showed that a simple implementation of a mixture prior did not match the data as well for most participants.</p>
<p>The use of a categorical prior was supported by participants’ ability to learn complex multimodal distributions very quickly. The speed and flexibility of participants’ prior learning stands in contrast to work in continuous tasks, where it is difficult to find evidence for quick learning of bimodal priors. One task required 4,000 feedback training trials to teach participants a bimodal distribution [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
], and another required 1,700 trials [
<xref rid="pcbi.1004859.ref022" ref-type="bibr">22</xref>
]. However, when [
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
] used 1,500 training trials in an interval timing task, there was some suggestion of bimodality if the peaks were well-separated, but the data could also be explained by a uniform prior.</p>
<p>The only example of equally fast learning of a bimodal prior without giving participants hints comes from other experiments using discrete numerical responses. In the revenue estimation experiments of [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
], it was found that a bimodal prior distribution could be reconstructed within 400 training trials. More impressive was the demonstration that bimodal priors could be reconstructed by individual participants after as few as 12 trials [
<xref rid="pcbi.1004859.ref046" ref-type="bibr">46</xref>
]. However, these demonstrations come from tasks in which participants are asked to reconstruct the distribution rather than make an estimate, so this work is the first to show that participants do use bimodal priors when making estimates, and can learn to do so more quickly than in continuous tasks. In addition, the priors that participants learned were impressively accurate: we showed that quadrimodal prior distributions could be learned, and that participants’ priors were better described by a categorical distribution than by a kernel density estimate or some forms of mixture models.</p>
<p>Given these differences in speed of learning, it is interesting to speculate whether there are particular properties of tasks that require discrete numerical responses that make it easier to learn a complex prior. One difference between discrete numerical and continuous estimates is that it is easier to provide clear feedback for discrete numerical estimates. Both [
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
] and [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
] used visual position as feedback in their sensorimotor and interval timing tasks, and noise in vision and memory makes this feedback less certain. In the orientation estimation task of [
<xref rid="pcbi.1004859.ref022" ref-type="bibr">22</xref>
] participants were told their average deviation every 20 trials; while this feedback is numerical, it does not provide as much information as the true orientation used on each trial. It is difficult to see how the feedback could be improved for tasks that require continuous responses: feedback either needs to be susceptible to noise (perhaps both sensory noise and noise in encoding and remembering the feedback) or it is not directly mapped to the responses. Participants cannot be shown exactly what response they should have made.</p>
<p>In contrast, our experiments and the experiments of [
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
] showed participants the correct response after every trial in essentially a noise-free fashion. This is a real advantage of using discrete numerical responses and feedback: feedback can be given uncorrupted by sensory noise after every trial, and it is easily mapped onto the responses that participants make. In fact, [
<xref rid="pcbi.1004859.ref046" ref-type="bibr">46</xref>
] explicitly showed this difference when participants were asked to reproduce a distribution: for experiments in which numbered stimuli were replaced by circles of various sizes, participants required more trials and greater separation between the modes to learn the bimodal distributions. It is possible that the differences in clarity of feedback explain the rates at which participants learn bimodal priors in different tasks. If participants are using a form of Occam’s razor when constructing their prior distribution, then the more informative trials would more quickly convince them to abandon a simpler prior in favor of a more complex representation.</p>
</sec>
<sec id="sec010">
<title>Implications for learning of priors</title>
<p>The priors learned in our experiments, especially Experiment 3, were much more complex than those taught to participants in other estimation tasks [
<xref rid="pcbi.1004859.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
,
<xref rid="pcbi.1004859.ref018" ref-type="bibr">18</xref>
,
<xref rid="pcbi.1004859.ref019" ref-type="bibr">19</xref>
,
<xref rid="pcbi.1004859.ref022" ref-type="bibr">22</xref>
,
<xref rid="pcbi.1004859.ref046" ref-type="bibr">46</xref>
]. In addition to having four modes, our prior had a pattern of low-probability and no-probability responses that participants’ responses matched. Participants were not just representing the prior as a mixture of two parametric components, but were learning the prior probabilities associated with individual responses.</p>
<p>Work using other tasks has demonstrated fairly complex prior learning, but in those tasks it is generally not clear whether participants are learning a prior or a mapping. For example, in a categorization task, a subset of participants learned to discriminate a multidimensional quadrimodal distribution from a multidimensional mixture of two Gaussian distributions [
<xref rid="pcbi.1004859.ref050" ref-type="bibr">50</xref>
]. While participants were able to learn these complex discriminations and their behavior could be described by a model that approximates Bayesian inference, this work did not rule out a complex decision bound model (i.e., a mapping) as an alternative [
<xref rid="pcbi.1004859.ref050" ref-type="bibr">50</xref>
,
<xref rid="pcbi.1004859.ref051" ref-type="bibr">51</xref>
].</p>
<p>In Experiment 3 we ran additional trials to test whether participants were actually learning and using a prior or if instead they were learning a mapping from the stimuli to the responses. As discussed in the
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
, we found that the responses of more than half of participants were best explained by a prior rather than a mapping. Use of a prior is also supported by recent work that demonstrated that participants take into account the reliability of various senses in a multisensory numerosity task [
<xref rid="pcbi.1004859.ref038" ref-type="bibr">38</xref>
].</p>
<p>Our results contrast with other work showing that participants do not learn a categorical prior. In a continuous estimation task with a wide range of possible responses, a categorical prior did not explain the data as well as a mixture [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
]. Likewise in a numerosity task that showed participants a much wider range of numbers than our experiments, a mixture model provided a better fit to their participants’ data than just using the trained examples as a prior [
<xref rid="pcbi.1004859.ref052" ref-type="bibr">52</xref>
].</p>
<p>The key difference is likely the variety of correct responses in each experiment. As the number of potential responses increases it is hard to imagine that participants would precisely track the frequency with which every single number appeared. For example, if every number from 100 to 200 appeared in a random order with the exception of number 134, it is implausible that participants would notice.</p>
<p>This contrast raises questions about where the transition between a categorical prior and a mixture model occurs, and even if there is a distinction between the two. It is possible that participants represent a small set of numbers symbolically and use a categorical prior, but represent a large set of numbers as a mixture prior over a continuous variable. Alternatively, it could be that our categorical prior is simply a mixture with a separate component for each response. In this case, there would likely be a smoother transition between representations of the prior for small and large sets of responses.</p>
</sec>
<sec id="sec011">
<title>Implications for how people perform inference</title>
<p>Very few of our participants were best fit by a simple decision function: the max or mean of the untransformed posterior distribution. Instead it appeared that the large majority of participants were performing some kind of approximate inference by drawing one or more samples. Previous work has put forward mechanisms with which this could be done. For example, [
<xref rid="pcbi.1004859.ref020" ref-type="bibr">20</xref>
] showed that participants’ responses were consistent with either taking the mean of a number of samples in a continuous estimation task or drawing a single sample from an exponentiated posterior distribution. Later, [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
] disambiguated these two operations with a bimodal prior, showing that raising the posterior distribution to an exponent was the better description.</p>
<p>Both of these mechanisms have been touted as tradeoffs between effort and accuracy, and possibly a rational use of cognitive resources [
<xref rid="pcbi.1004859.ref035" ref-type="bibr">35</xref>
], though there is always the possibility that participants have particularly complex hypotheses about the computer’s behavior instead. Drawing a sample may take time or effort, and a small number of samples may provide the best tradeoff between effort and accuracy to yield the highest overall reward [
<xref rid="pcbi.1004859.ref036" ref-type="bibr">36</xref>
]. Similarly, raising a posterior distribution to a power has also been cast as a tradeoff between effort and accuracy, but one that assumes effort is required to perform the exponentiation that transforms the belief distribution into a response distribution [
<xref rid="pcbi.1004859.ref053" ref-type="bibr">53</xref>
].</p>
<p>While this makes for a nice contrast, the picture is complicated by two additional mechanisms that are essentially indistinguishable from exponentiating the posterior distribution, even for bimodal priors: taking the maximum of a number of samples drawn from the posterior distribution, and taking the maximum of a posterior distribution that has been corrupted with noise [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
]. This last mechanism may well differ from the others if it is assumed that the amount of noise in the posterior is not under the control of the participant; in this case sampling-like behavior would not be a tradeoff between effort and accuracy.</p>
<p>We add to this literature by showing that while an exponentiated posterior distribution is necessary to explain the data as in [
<xref rid="pcbi.1004859.ref015" ref-type="bibr">15</xref>
], a large number of participants additionally appear to be taking the mean of a number of samples drawn from this exponentiated posterior distribution. Despite the fact that the task asked for the maximum of the posterior (only exactly correct responses were marked as ‘correct’), participants still showed some tendency to produce responses near the mean of the posterior.</p>
<p>It is interesting to speculate what sort of mechanism could support both a tendency to respond with the max and a tendency to respond with the mean. Our best-fitting combination, the mean of samples from an exponentiated posterior distribution, is one possibility. It could be that exponentiating the posterior helps to emphasize the mode that is most likely under the posterior distribution, while the later sampling operation helps to select the best response within that mode, trading off the need to pick the highest-posterior response against the uncertainty introduced by having several highly likely responses in close proximity. It may even be that responses near the mean of the posterior are an accidental byproduct of this two-stage process.</p>
<p>However, the difficult-to-distinguish alternatives to an exponentiated posterior point toward alternative combinations. One of these is a pure sampling approach: participants draw samples from the posterior distribution and sometimes take the maximum of the samples and sometimes take the mean. Another alternative combination is to ascribe all the variability to noise in the posterior distribution: using a noisy posterior distribution, sometimes participants take the maximum and sometimes they take the mean.</p>
<p>To gain additional purchase on this question, we correlated the average response times of participants with the model parameters. If participants were using any of these tradeoffs between effort and accuracy, we might expect correlations between each participant’s average response time and the number of samples or the exponent that the super-model recovered. This kind of correlation has been found in previous work on two-alternative responses [
<xref rid="pcbi.1004859.ref036" ref-type="bibr">36</xref>
]. However, both within each experiment and across experiments, we found no reliable relationship between either of these model parameters and the response times of participants (see
<xref ref-type="supplementary-material" rid="pcbi.1004859.s001">S1 Methods</xref>
for details).</p>
<p>On the surface, this null result could be considered evidence that participants use the maximum or mean of a noisy posterior distribution to produce their estimates and that the amount of noise in the posterior does not depend on participant effort. However, it could also be that participants have such differing goals for effort/accuracy tradeoffs that this washes out whatever correlations exist between response time and model parameters. Future work could provide stronger tests of these mechanisms using within-participant designs that manipulate rewards and time pressure, along with emphasizing that the computer is not responding to participant behavior.</p>
</sec>
<sec id="sec012">
<title>Connections to everyday tasks</title>
<p>There are many examples of discrete numerical tasks in everyday life, such as a painter quickly assessing the size of a wall in order to buy the right number of paint cans, or party hosts assessing the hunger of their guests when buying a discrete number of pizzas. In our experiments, we used a numerosity task and an area estimation task because both of these tasks are well studied and their likelihood distributions have been well characterized. This allowed us to quickly measure the standard deviation of the likelihood for each participant. If we had used less controlled stimuli, then we might have had to measure the full distribution of responses that each individual stimulus evoked in order to characterize the likelihood.</p>
<p>Our laboratory tasks are similar to some everyday tasks. The numerosity task we used is similar to estimating the number of visible stars in the sky (which does vary depending on the time of day and light pollution), and estimating the size of a rectangle shares some similarities with the example of the painter who needs to assess the area of a wall. However, there are differences as well: the stars in the sky do not differ in size from night to night, and painters can view a wall from many distances and angles before producing an estimate. With the right stimuli, it would be interesting to investigate real-life performance in discrete numerical estimation tasks.</p>
</sec>
<sec id="sec013">
<title>Conclusions</title>
<p>Our results demonstrate that people represent a surprising amount of complexity in their prior distribution with relatively few training trials and use this complex prior when making new estimates. Training complex priors has multiple benefits: we can more easily observe how people represent priors and we can investigate some of the more complex schemes describing how people convert the posterior distribution into a single estimate.</p>
<p>This work raises many questions about how prior and posterior distributions are represented and how estimates are made. Discrete numerical estimation tasks, which are simple to implement and quick to train, are well suited for future work in this area.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="sec014">
<title>Methods</title>
<sec id="sec015">
<title>Experiment 1</title>
<p>Twenty-one University of Warwick students participated in this experiment for course credit. Participants gave written informed consent and the experiment was approved by the University of Warwick Humanities and Social Sciences Research Ethics Committee. Participants were divided into three groups and each participated in one version of the experiment, as outlined below. One participant was excluded because of computer error, and a second was only given one block of discrimination trials but was included in the analysis.</p>
<p>The stimuli consisted of displays of a number of identical dots. Each display of dots consisted of white dots on a black background, visible for 500 ms. Dot radius and dot density were randomized for each display to encourage participants to make numerosity judgments instead of judging the amount of light produced by the display, the density of the display, or the area occupied by the dots. In a single display, all dots had a common radius of between 3 and 9 pixels that was chosen randomly with equal probability on each trial. Dots were positioned randomly within a circular available region which was centered on the display, subject to the constraint that no dot could lie within one dot diameter of another dot. The available region randomly varied between 150 and 450 pixels in radius. A uniform draw was made over the possible values of dot density (where density equaled the maximum number of dots that could appear in that block divided by the area of the available region), which determined the radius of the available region on a trial.</p>
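<p>A sketch of this display-generation procedure is given below (a Python illustration; the reading of the one-dot-diameter spacing constraint and the rejection-sampling placement are assumptions, since those details are not fully specified here).</p>
<preformat>
import math
import random

def generate_dot_display(n_dots, max_dots, rng=random.Random(0)):
    """Generate dot positions for one display following the randomization above."""
    dot_radius = rng.choice(range(3, 10))          # common dot radius, 3-9 pixels

    # Uniform draw over dot density (max_dots divided by the region area),
    # with the implied region radius constrained to lie between 150 and 450 pixels.
    d_min = max_dots / (math.pi * 450 ** 2)
    d_max = max_dots / (math.pi * 150 ** 2)
    density = rng.uniform(d_min, d_max)
    region_radius = math.sqrt(max_dots / (math.pi * density))

    positions = []
    min_sep = 4 * dot_radius       # assumed reading: a one-dot-diameter gap between dots
    for _ in range(100000):        # cap on rejection-sampling attempts
        if len(positions) == n_dots:
            break
        r = region_radius * math.sqrt(rng.random())        # uniform over the disc
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x, y = r * math.cos(theta), r * math.sin(theta)
        if all(math.hypot(x - px, y - py) >= min_sep for px, py in positions):
            positions.append((x, y))
    return dot_radius, region_radius, positions

radius, region_radius, dots = generate_dot_display(26, max_dots=29)
print(radius, round(region_radius), len(dots))
</preformat>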
<p>The experiment consisted of a single session with three blocks. The estimation trials, in which participants saw a single display of dots and responded with their estimate of the number of dots in each display, were presented in the second block. The estimation trials differed for the three groups of participants: the narrow group, the medium group, and the wide group. The narrow group consisted of eight participants who saw 800 estimation trials in which the number of dots varied between 23 and 29. For this group, displays with 23 and 29 dots each appeared with probability 0.3 and the displays with the remaining numbers appeared with probability 0.08. The medium group consisted of six participants who saw 700 estimation trials in which the number of dots varied between 23 and 32. For this group, displays with 23 and 32 dots each appeared with probability 0.3 and the displays with the remaining numbers appeared with probability 0.05. The wide group consisted of six participants who saw 700 estimation trials in which the number of dots varied between 23 and 35. For this group, displays with 23 and 35 dots each appeared with probability 0.28 and the displays with the remaining numbers appeared with probability 0.04. Every participant saw 10 practice estimation trials displaying between one and four dots before beginning the main phase of the experiment.</p>
<p>The first and third blocks consisted of 128 discrimination trials each (always preceded by 4 practice trials), in which participants saw two sequential displays of dots and picked the display that contained the larger number of dots. On every discrimination trial, one of the displays had a specific high or low number of dots. These anchor numbers were set to be either 11 dots below the lowest number or 11 dots above the highest number seen in the estimation trials. The other display consisted of a number of dots that was equal to the anchor plus an offset. The offset was randomly chosen with equal probability from the set {-8, -4, -2, -1, 1, 2, 4, 8}. Because of computer error, one participant in the group that had a range of 23 to 29 in the estimation trials was given anchors of 18 and 54 in his or her first block. This participant received the correct anchors (12 and 40) in the third block.</p>
<p>On estimation trials, after the dot display disappeared, participants were asked to enter the number of dots that they saw. After entering their response, participants received feedback about whether they were correct and the actual number of dots that were shown. On discrimination trials, participants only received feedback about whether they were correct or not.</p>
</sec>
<sec id="sec016">
<title>Experiment 2</title>
<p>Twelve University of Warwick students participated in this experiment for £6 apiece. Participants gave written informed consent and the experiment was approved by the University of Warwick Humanities and Social Sciences Research Ethics Committee. Participants were divided into two groups and each participated in one version of the experiment, as outlined below.</p>
<p>The stimuli consisted of displays of rectangles of particular areas. For each display, the width of the rectangle and its position were randomized to encourage participants to judge the area of the rectangles without exclusively relying on its length, width, or position on the screen. Rectangle width was chosen from a continuous uniform distribution between 2cm and 10cm, with length chosen to achieve the desired area given the width. A fixed positional jitter was chosen for each trial uniformly from a 6cm square. On estimation trials the rectangle was on average in the center of the screen and appeared for 500ms. On discrimination trials, the first rectangle appeared 10cm left of center plus positional jitter for 500ms, and after a 500ms delay the second rectangle appeared 10cm right of center plus positional jitter for 500ms.</p>
<p>The procedure was identical to Experiment 1 with the following exceptions. Participants responded in the estimation trials with the area of the rectangle in square cm. A narrow group and a medium group, consisting of six participants each, were run in this experiment, with numbers and probabilities equivalent to the groups of the same names in Experiment 1. Every participant saw 700 estimation trials in this experiment.</p>
</sec>
<sec id="sec017">
<title>Experiment 3</title>
<p>Twenty-four University of Warwick students participated in this experiment for £6 apiece. Participants gave written informed consent and the experiment was approved by the University of Warwick Humanities and Social Sciences Research Ethics Committee. Participants were divided into three groups and each participated in one version of the experiment, as outlined below. Two participants were excluded from the rectangle group and one from the easier numerosity group because of computer error. Two additional participants from the easier numerosity group saw between ten and fifteen additional non-feedback trials at the beginning of the second estimation block and their data (excluding these trials) were included in the analyses.</p>
<p>For all groups the experiment consisted of four blocks: the first discrimination block of 128 trials, the first estimation block of 500 trials, the second estimation block of 200 trials, and finally the second discrimination block of 128 trials. Feedback was given for all blocks except for the second estimation block, which served as a test of whether a prior had been learned. The first discrimination and estimation blocks used easier-to-see displays than the second discrimination and estimation blocks. The three groups of participants differed in the details of the displays they were shown. The difficult numerosity group was run with the same display parameters as in Experiment 1 during the first discrimination and estimation blocks. During the second estimation and discrimination blocks, the difficult numerosity group was shown each dot display for 50ms and at a much reduced luminance. The easier numerosity group was given different display parameters during the first discrimination and estimation blocks: the range of the common radius of dots was from 6 to 9 pixels, while the available region randomly varied between 225 and 375 pixels. The easier numerosity group had a shorter display (50ms) as well as more variability on the second discrimination and estimation blocks: the range of the common radius of dots was from 1 to 11 pixels, while the available region randomly varied between 150 and 450 pixels. The third group, the rectangle group, was given rectangles that randomly varied in width between 4.8 and 6.5cm during the first discrimination and estimation blocks. In the second discrimination and estimation blocks, this group was given a shorter (50ms) and slightly dimmer (gray instead of white) display with rectangles that randomly varied in width between 2 and 10cm.</p>
<p>All participants in this experiment were given the same trial structure during the estimation trials. The distribution that generated the trials was quadrimodal with a 20% chance of drawing each of the numbers 23, 25, 29, or 31. In addition, there was a 10% chance of drawing each of the numbers 24 and 30.</p>
</sec>
<sec id="sec018">
<title>Analysis—Discrimination data</title>
<p>To estimate the variability in participants’ internal estimates,
<italic>X</italic>
, we analyzed the discrimination data so that the fitted parameters could be used in the estimation task. Specifically, we assumed that the internal estimate was distributed according to a log-normal distribution (in accordance with Weber’s law): log(
<italic>X</italic>
)∼
<italic>N</italic>
(log(
<italic>X</italic>
); log(
<italic>S</italic>
),
<italic>σ</italic>
<sup>2</sup>
) where
<italic>X</italic>
and
<italic>S</italic>
are positive integers. Participants were presented with two stimuli,
<italic>S</italic>
<sub>1</sub>
and
<italic>S</italic>
<sub>2</sub>
(as in the 2AFC discrimination trials) and had to estimate which one was larger.</p>
<p>In order to fit the variable
<italic>σ</italic>
we maximized the log-likelihood across trials (or rather minimized the negative log-likelihood) using Matlab’s fminbnd function
<inline-formula id="pcbi.1004859.e004">
<alternatives>
<graphic xlink:href="pcbi.1004859.e004.jpg" id="pcbi.1004859.e004g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>σ</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mo form="prefix">arg max</mml:mo>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo form="prefix">log</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. The model likelihood
<italic>P</italic>
(
<italic>R</italic>
<sub>
<italic>i</italic>
</sub>
|
<italic>S</italic>
<sub>
<italic>i</italic>
,1: 2</sub>
,
<italic>σ</italic>
) was estimated numerically for each trial and condition by sampling
<italic>X</italic>
<sub>1</sub>
and
<italic>X</italic>
<sub>2</sub>
10,000 times and for each set generating a fictitious response.
<italic>P</italic>
(
<italic>R</italic>
<sub>
<italic>i</italic>
</sub>
|
<italic>S</italic>
<sub>
<italic>i</italic>
,1: 2</sub>
,
<italic>σ</italic>
) = (1/10000)Σ
<sub>
<italic>l</italic>
</sub>
<italic>H</italic>
(
<italic>X</italic>
<sub>
<italic>l</italic>
,1</sub>
<italic>X</italic>
<sub>
<italic>l</italic>
,2</sub>
) where H is the Heaviside function and log(
<italic>X</italic>
<sub>
<italic>l</italic>
,1</sub>
)∼
<italic>N</italic>
(log(
<italic>X</italic>
<sub>
<italic>l</italic>
,1</sub>
); log(
<italic>S</italic>
<sub>1</sub>
),
<italic>σ</italic>
<sup>2</sup>
) and log(
<italic>X</italic>
<sub>
<italic>l</italic>
,2</sub>
)∼
<italic>N</italic>
(log(
<italic>X</italic>
<sub>
<italic>l</italic>
,2</sub>
); log(
<italic>S</italic>
<sub>2</sub>
),
<italic>σ</italic>
<sup>2</sup>
) are samples from the generative model above. This analysis assumed that participants chose the most likely response on each trial, which is what was found in a recent analysis of 2AFC choice tasks [
<xref rid="pcbi.1004859.ref054" ref-type="bibr">54</xref>
].</p>
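<p>The following is a minimal sketch of this fitting procedure, not the original Matlab code: it assumes Python with NumPy and SciPy, hypothetical stimulus and response arrays, and scipy.optimize.minimize_scalar in place of Matlab’s fminbnd.</p>
<preformat>
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def neg_log_likelihood(sigma, s1, s2, responses, n_sim=10000):
    # Negative log-likelihood of 2AFC responses under log-normal internal noise.
    # s1, s2: stimulus magnitudes on each trial; responses: 1 if stimulus 1 was judged larger.
    nll = 0.0
    for a, b, r in zip(s1, s2, responses):
        # Sample internal estimates X1, X2 and generate fictitious responses.
        x1 = np.exp(np.log(a) + sigma * rng.standard_normal(n_sim))
        x2 = np.exp(np.log(b) + sigma * rng.standard_normal(n_sim))
        p_choose_1 = np.mean(x1 > x2)
        p = p_choose_1 if r == 1 else 1.0 - p_choose_1
        nll -= np.log(max(p, 1e-4))  # guard against log(0)
    return nll

# Hypothetical example data (three discrimination trials).
s1 = np.array([23, 25, 30]); s2 = np.array([25, 23, 29]); resp = np.array([0, 1, 1])
fit = minimize_scalar(neg_log_likelihood, bounds=(0.01, 1.0), method="bounded",
                      args=(s1, s2, resp))
print("fitted sigma:", fit.x)
</preformat>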
</sec>
<sec id="sec019">
<title>Analysis—Estimation data</title>
<p>The purpose of our analysis for the Estimation data is to compare and rule out different models of human decision making (see Experiments 1–3 above). One common way of comparing perceptual models with different numbers of parameters (see e.g. [
<xref rid="pcbi.1004859.ref055" ref-type="bibr">55</xref>
]) is to fit the parameters of each model through maximum likelihood and compensate for differences in model complexity by calculating the Bayesian Information Criterion (which penalizes models with a large number of parameters).</p>
<p>Our secondary analysis instead created a single model that encompasses all of the candidate models as special cases; that is, for particular parameter settings the larger model is equivalent to each of the nested candidate models. The best-fitting parameters therefore indicate which of the models best describes the data.</p>
<p>We first describe the generative model, then how to perform inference over it and how to perform model comparison, and lastly we describe the ‘super-model’ and explain which specific models it encompasses.</p>
<sec id="sec020">
<title>Generative model</title>
<p>Participants on trial
<italic>t</italic>
are presented with
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
number of dots drawn from a fixed categorical distribution
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
) (see e.g.
<xref ref-type="fig" rid="pcbi.1004859.g001">Fig 1</xref>
). In accordance with our description above participants have an internal estimate
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
given by log(
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
)∼
<italic>N</italic>
(log(
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
); log(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
),
<italic>σ</italic>
<sup>2</sup>
). This naturally leads to larger absolute variability in
<italic>X</italic>
for larger
<italic>S</italic>
.</p>
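<p>As a brief illustrative sketch (Python with NumPy, not taken from the original materials), the generative process for a single trial can be written as follows; the quadrimodal prior shown is the Experiment 3 trial distribution, and σ = 0.22 is only an assumed noise value for illustration.</p>
<preformat>
import numpy as np

rng = np.random.default_rng(1)

# Quadrimodal trial distribution from Experiment 3 over counts 23..31.
support = np.arange(23, 32)
probs   = np.array([.2, .1, .2, 0, 0, 0, .2, .1, .2])
sigma   = 0.22  # assumed internal noise (Weber fraction)

def generate_trial():
    # Draw a true count S_t and a noisy internal estimate X_t (log-normal).
    s_t = rng.choice(support, p=probs)
    x_t = np.exp(rng.normal(np.log(s_t), sigma))
    return s_t, x_t

print(generate_trial())
</preformat>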
<p>We assume that our participants have an understanding (at an intuitive level) of the process and are using Bayes’ rule to infer the best estimate of the number of dots
<inline-formula id="pcbi.1004859.e005">
<alternatives>
<graphic xlink:href="pcbi.1004859.e005.jpg" id="pcbi.1004859.e005g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M5">
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</alternatives>
</inline-formula>
given their chosen decision function.</p>
</sec>
<sec id="sec021">
<title>Inference</title>
<p>Optimal inference requires combining the likelihood and prior expectations about
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
using Bayes’ rule:
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
) ∝
<italic>P</italic>
(
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
)
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
).</p>
<p>Based on the generative model the likelihood of
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
is given by
<disp-formula id="pcbi.1004859.e006">
<alternatives>
<graphic xlink:href="pcbi.1004859.e006.jpg" id="pcbi.1004859.e006g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M6">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>∝</mml:mo>
<mml:mo form="prefix">exp</mml:mo>
<mml:mfenced close=")" open="(" separators="">
<mml:mfrac>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
</mml:mrow>
</mml:math>
</alternatives>
<label>(3)</label>
</disp-formula>
where
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
is restricted to be integer values. This same representation was used in both the numerosity and rectangle area tasks.</p>
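<p>A minimal sketch of this inference step, assuming Python/NumPy/SciPy and a prior given as a vector of probabilities over integer counts (the function name posterior_over_counts is illustrative, not from the original materials):</p>
<preformat>
import numpy as np
from scipy.stats import norm

def posterior_over_counts(x_t, prior, support, sigma):
    # Discrete posterior P(S_t | X_t) over integer counts:
    # likelihood from Eq 3 (symmetric in log X_t and log S_t) times the prior.
    log_lik = norm.logpdf(np.log(support), loc=np.log(x_t), scale=sigma)
    unnorm = np.exp(log_lik) * prior
    return unnorm / unnorm.sum()

# Example: a uniform prior over counts 23..31 and an internal estimate of 26.4.
support = np.arange(23, 32)
prior = np.ones(len(support)) / len(support)
print(posterior_over_counts(26.4, prior, support, sigma=0.22))
</preformat>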
<p>The prior is
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
) = ∫
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>ϕ</italic>
)
<italic>P</italic>
(
<italic>ϕ</italic>
)
<italic>dϕ</italic>
where
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>ϕ</italic>
) is a discrete categorical distribution and
<italic>P</italic>
(
<italic>ϕ</italic>
) is a Dirichlet distribution:
<disp-formula id="pcbi.1004859.e007">
<alternatives>
<graphic xlink:href="pcbi.1004859.e007.jpg" id="pcbi.1004859.e007g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M7">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>α</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:msub>
<mml:mi>ϕ</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>ϕ</mml:mi>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>K</mml:mi>
</mml:msub>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi mathvariant="normal">B</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>α</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:munderover>
<mml:msubsup>
<mml:mi>ϕ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</alternatives>
<label>(4)</label>
</disp-formula>
where
<italic>α</italic>
<sub>
<italic>i</italic>
</sub>
are the parameters of the distribution and
<italic>B</italic>
(
<italic>α</italic>
) is the multinomial Beta function. The prior is then
<inline-formula id="pcbi.1004859.e008">
<alternatives>
<graphic xlink:href="pcbi.1004859.e008.jpg" id="pcbi.1004859.e008g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M8">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
which is a discrete categorical distribution (sometimes referred to as a multinomial with just one draw).</p>
<p>A property of
<inline-formula id="pcbi.1004859.e009">
<alternatives>
<graphic xlink:href="pcbi.1004859.e009.jpg" id="pcbi.1004859.e009g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M9">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
is that it is conjugate with the categorical distribution. This means that if categorical data
<italic>S</italic>
<sub>
<italic>true</italic>
</sub>
=
<bold>K</bold>
is observed (in terms of feedback provided to participants) the prior over
<italic>α</italic>
is updated as:
<disp-formula id="pcbi.1004859.e010">
<alternatives>
<graphic xlink:href="pcbi.1004859.e010.jpg" id="pcbi.1004859.e010g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M10">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:msup>
<mml:mi>α</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>∝</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>α</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>K</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
<label>(5)</label>
</disp-formula>
where
<italic>P</italic>
<sub>
<italic>K</italic>
</sub>
(
<italic>S</italic>
<sub>
<italic>true</italic>
</sub>
) is a categorical distribution with 1 at the observed value and 0 elsewhere. The new set of parameters are
<bold>α</bold>
<sup>
<bold>′</bold>
</sup>
=
<bold>α</bold>
+
<bold>K</bold>
.</p>
<p>Hence updating the prior after trial
<italic>t</italic>
merely requires increasing the variable
<italic>α</italic>
<sub>
<italic>t</italic>
,
<italic>i</italic>
</sub>
=
<italic>α</italic>
<sub>
<italic>t</italic>
− 1,
<italic>i</italic>
</sub>
+ 1 corresponding to the feedback value
<italic>i</italic>
. We will refer to this as the
<italic>Dirichlet updating prior</italic>
.</p>
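<p>A minimal sketch of this Dirichlet updating step (Python/NumPy; the helper names are illustrative, not from the original materials):</p>
<preformat>
import numpy as np

def dirichlet_update(alpha, feedback, support):
    # Eq 5: add one count to the alpha entry matching the feedback value.
    alpha = alpha.copy()
    alpha[support == feedback] += 1.0
    return alpha

def predictive_prior(alpha):
    # P(S_t = i) = alpha_i / sum_j alpha_j, a discrete categorical distribution.
    return alpha / alpha.sum()

# Example: start from a flat alpha over counts 23..31 and observe feedback of 29.
support = np.arange(23, 32)
alpha = dirichlet_update(np.ones(len(support)), 29, support)
print(predictive_prior(alpha))
</preformat>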
<p>However, it is possible that this updating is not done efficiently, leading to a broader update that encompasses neighboring values. To allow for this we update according to
<inline-formula id="pcbi.1004859.e011">
<alternatives>
<graphic xlink:href="pcbi.1004859.e011.jpg" id="pcbi.1004859.e011g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M11">
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ψ</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>Σ</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ψ</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
(where
<italic>i</italic>
and
<italic>j</italic>
are integers). We will refer to this as a
<italic>kernel density update prior</italic>
. For
<italic>ψ</italic>
less than 0.0115, about 95 percent of the mass of the kernel would be placed at the location of feedback
<italic>i</italic>
,
<inline-formula id="pcbi.1004859.e012">
<alternatives>
<graphic xlink:href="pcbi.1004859.e012.jpg" id="pcbi.1004859.e012g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M12">
<mml:mrow>
<mml:msubsup>
<mml:mo>∫</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0115</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>j</mml:mi>
<mml:mo>≈</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>95</mml:mn>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
(for the largest value
<italic>j</italic>
= 32 presented in the experiment), and thus we recover the previously described Dirichlet prior update mechanism as a special case. In contrast, a very large
<italic>ψ</italic>
would indiscriminately update all values.</p>
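<p>A corresponding sketch of the kernel density update (again Python/NumPy/SciPy, with a normal kernel over the integer count values; the helper name is illustrative):</p>
<preformat>
import numpy as np
from scipy.stats import norm

def kernel_update(alpha, feedback, support, psi):
    # Spread the unit count over neighbouring integer values with a Gaussian
    # kernel of width psi, normalised over the support (the delta alpha_j above).
    weights = norm.pdf(support, loc=feedback, scale=psi)
    weights /= weights.sum()
    return alpha + weights

# For very small psi this reduces to the Dirichlet update; for very large psi the
# update is spread almost uniformly over all count values.
support = np.arange(23, 32)
print(kernel_update(np.ones(len(support)), 29, support, psi=0.01))
print(kernel_update(np.ones(len(support)), 29, support, psi=50.0))
</preformat>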
<p>Finally, to transform the posterior into a response, three typical decision functions were compared for our Model Comparison (see below):
<italic>Max</italic>
(
<inline-formula id="pcbi.1004859.e013">
<alternatives>
<graphic xlink:href="pcbi.1004859.e013.jpg" id="pcbi.1004859.e013g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M13">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo form="prefix">arg max</mml:mo>
<mml:mi>S</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
),
<italic>Mean</italic>
(
<inline-formula id="pcbi.1004859.e014">
<alternatives>
<graphic xlink:href="pcbi.1004859.e014.jpg" id="pcbi.1004859.e014g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M14">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mi>m</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
) or
<italic>Sampling</italic>
(
<inline-formula id="pcbi.1004859.e015">
<alternatives>
<graphic xlink:href="pcbi.1004859.e015.jpg" id="pcbi.1004859.e015g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M15">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
where
<italic>s</italic>
′ is a sample from the transformed posterior:
<italic>s</italic>
′ ∼
<italic>P</italic>
<sub>
<italic>n</italic>
</sub>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
)).</p>
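<p>The three decision functions can be sketched as follows (Python/NumPy; the function name decide is illustrative, and the posterior is assumed to be a probability vector over the integer support):</p>
<preformat>
import numpy as np

rng = np.random.default_rng(2)

def decide(posterior, support, rule):
    # Convert the discrete posterior into a single numerical estimate.
    if rule == "max":                       # mode of the posterior
        return int(support[np.argmax(posterior)])
    if rule == "mean":                      # posterior mean
        return float(np.dot(support, posterior))
    if rule == "sample":                    # a single draw from the posterior
        return int(rng.choice(support, p=posterior))
    raise ValueError("unknown decision rule: " + rule)

# Example with a bimodal posterior over counts 23..31.
support = np.arange(23, 32)
post = np.array([.05, .05, .35, .05, .0, .05, .35, .05, .05])
print(decide(post, support, "max"), decide(post, support, "mean"),
      decide(post, support, "sample"))
</preformat>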
</sec>
</sec>
<sec id="sec022">
<title>Model comparison</title>
<p>A traditional way of comparing models is by maximizing the likelihood of each model and correcting for the number of parameters using the Bayesian Information Criterion [
<xref rid="pcbi.1004859.ref048" ref-type="bibr">48</xref>
]. We factorially combined two different priors (Dirichlet or Gaussian kernel), three types of decision function (mean, max or sampling) and three types of noise model (none, trembling hand or softmax) to produce 16 different models for each participant (note that softmax is not defined for a max decision function, removing two of the 2×3×3 = 18 combinations).</p>
<p>The priors were updated using either a variable-width Gaussian kernel or “zero-width” Dirichlet updating. Combined with the likelihood, this creates the posterior distribution
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
) upon which the subject bases their decision.</p>
<p>A softmax noise model performs a transformation of the posterior
<disp-formula id="pcbi.1004859.e016">
<alternatives>
<graphic xlink:href="pcbi.1004859.e016.jpg" id="pcbi.1004859.e016g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M16">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>β</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mi>P</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>β</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</alternatives>
<label>(6)</label>
</disp-formula>
where
<italic>β</italic>
< 1 leads to a widening (or flattening) of the posterior, while
<italic>β</italic>
> 1 leads to a sharpening of the posterior. For the none and trembling hand noise models we set
<italic>β</italic>
= 1.</p>
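<p>A one-line sketch of this transformation (Python/NumPy; the function name is illustrative):</p>
<preformat>
import numpy as np

def softmax_posterior(posterior, beta):
    # Eq 6: exponentiate the posterior by beta and renormalise.
    # beta < 1 flattens the distribution, beta > 1 sharpens it, beta = 1 leaves it unchanged.
    p = np.asarray(posterior, dtype=float) ** beta
    return p / p.sum()
</preformat>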
<p>Choices (as discussed above) are then made based on either max, mean (average), or sampling of the exponentiated posterior. Finally, the trembling hand noise model, included as an alternative to the softmax model, states that participants have a small probability
<italic>ϵ</italic>
of performing a random choice; that is,
<inline-formula id="pcbi.1004859.e017">
<alternatives>
<graphic xlink:href="pcbi.1004859.e017.jpg" id="pcbi.1004859.e017g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M17">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>∼</mml:mo>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
if
<italic>e</italic>
<
<italic>ϵ</italic>
, where
<italic>e</italic>
is randomly sampled from
<italic>U</italic>
[0: 1]. While the trembling hand noise model is structurally implemented after the decision is made, it has a similar effect to the softmax noise model in increasing the variability of the responses.</p>
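<p>A corresponding sketch of the trembling hand noise model (Python/NumPy; the function name and the response-range arguments are illustrative):</p>
<preformat>
import numpy as np

rng = np.random.default_rng(3)

def trembling_hand(estimate, epsilon, low=1, high=100):
    # With probability epsilon, replace the estimate with a uniform random
    # response from [low, high]; otherwise return the estimate unchanged.
    if rng.random() < epsilon:
        return int(rng.integers(low, high + 1))
    return estimate
</preformat>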
<p>As the internal variable
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
was unknown to the experimenters, we used ancestral sampling [
<xref rid="pcbi.1004859.ref028" ref-type="bibr">28</xref>
], drawing 10,000 samples from the generative process for
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
∼ <italic>P</italic>
(
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
) followed by the inference by the subject as described above (based on any model parameters). This generates 10,000 independent estimates of
<inline-formula id="pcbi.1004859.e018">
<alternatives>
<graphic xlink:href="pcbi.1004859.e018.jpg" id="pcbi.1004859.e018g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M18">
<mml:mover accent="true">
<mml:mi>S</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</alternatives>
</inline-formula>
which we can use to numerically approximate the probability of response
<inline-formula id="pcbi.1004859.e019">
<alternatives>
<graphic xlink:href="pcbi.1004859.e019.jpg" id="pcbi.1004859.e019g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M19">
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</alternatives>
</inline-formula>
. As these estimates are essentially discrete counts, we describe them with a discrete categorical distribution, which provides us with the model likelihood
<inline-formula id="pcbi.1004859.e020">
<alternatives>
<graphic xlink:href="pcbi.1004859.e020.jpg" id="pcbi.1004859.e020g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M20">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
for trial
<italic>t</italic>
and any model
<italic>m</italic>
and its associated parameters
<italic>par</italic>
.</p>
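<p>A minimal sketch of this ancestral-sampling step (Python/NumPy), reusing the hypothetical posterior_over_counts and decide helpers sketched above; the function name response_distribution is illustrative:</p>
<preformat>
import numpy as np

def response_distribution(s_t, prior, support, sigma, rule, n_sim=10000, seed=0):
    # Approximate P(S_hat | S_t, par): repeatedly draw X_t from the generative
    # model, run the observer's inference and decision rule, and tabulate the
    # resulting estimates as a discrete categorical distribution.
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(support))
    for _ in range(n_sim):
        x_t = np.exp(rng.normal(np.log(s_t), sigma))
        post = posterior_over_counts(x_t, prior, support, sigma)
        s_hat = int(round(decide(post, support, rule)))  # round the mean rule to an integer count
        counts[support == s_hat] += 1
    return counts / n_sim
</preformat>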
<p>For each model and parameter set for each subject the log-likelihood was thus calculated as
<inline-formula id="pcbi.1004859.e021">
<alternatives>
<graphic xlink:href="pcbi.1004859.e021.jpg" id="pcbi.1004859.e021g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M21">
<mml:mrow>
<mml:mo form="prefix">log</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo form="prefix">log</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. Note that the parameter
<italic>σ</italic>
had been fit independently for each subject through the discrimination experiment above and was thus not a free parameter. Any trials with subject responses of less than 5 dots (
<italic>R</italic>
<sub>
<italic>t</italic>
</sub>
< 5) were ignored as erroneous key presses, given that the true number of dots presented was at least 23. Furthermore, to avoid any singularities in calculating the likelihood, we allowed for a small probability (0.001) that participants would make a random response in the range [1:100] (similar to a slight trembling hand).</p>
<p>Any parameters
<italic>par</italic>
= (
<italic>β</italic>
,
<italic>ϵ</italic>
, or
<italic>ψ</italic>
) were fit through maximum likelihood (Matlab’s fminsearch). For model comparison we calculated BIC for each subject and each model:
<disp-formula id="pcbi.1004859.e022">
<alternatives>
<graphic xlink:href="pcbi.1004859.e022.jpg" id="pcbi.1004859.e022g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M22">
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mi>I</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>-</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>*</mml:mo>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo form="prefix">log</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>*</mml:mo>
<mml:mo form="prefix">log</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
<label>(7)</label>
</disp-formula>
where N is the number of model parameters in
<italic>par</italic>
(0, 1 or 2) and
<italic>M</italic>
is the total number of trials.</p>
</sec>
<sec id="sec023">
<title>Parameter fitting (‘Super-model’)</title>
<p>As an alternative to the model comparison we can compare models factorially based on how they update the prior (given by parameter
<italic>ψ</italic>
), how many samples are drawn (parameter
<italic>n</italic>
), and how exponentiated the posterior is (parameter
<italic>β</italic>
). The fitted parameter set for each subject encapsulates which model aspects best explain the subject’s behavior. Thus the specific models compared above become special cases within this larger parameter space, allowing us to interpolate between the models.</p>
<p>We put all of these into a unified framework, which we refer to as the ‘Super-model’ (as the other models are subsets of it). Given the posterior
<italic>P</italic>
(
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
|
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
) we assume that participants choose their response by averaging over samples:
<disp-formula id="pcbi.1004859.e023">
<alternatives>
<graphic xlink:href="pcbi.1004859.e023.jpg" id="pcbi.1004859.e023g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M23">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>n</mml:mi>
</mml:mfrac>
<mml:msubsup>
<mml:mo>Σ</mml:mo>
<mml:mi>k</mml:mi>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
<label>(8)</label>
</disp-formula>
where the samples
<italic>s</italic>
<sub>
<italic>k</italic>
</sub>
are given by
<disp-formula id="pcbi.1004859.e024">
<alternatives>
<graphic xlink:href="pcbi.1004859.e024.jpg" id="pcbi.1004859.e024g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M24">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>∼</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>β</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo>Σ</mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mi>P</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>β</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</alternatives>
<label>(9)</label>
</disp-formula>
where the parameters
<italic>n</italic>
and
<italic>β</italic>
are fitted for each subject (see below).</p>
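<p>A minimal sketch of this decision rule (Python/NumPy; the function name super_model_estimate is illustrative, and the posterior is assumed to be a probability vector over the integer support):</p>
<preformat>
import numpy as np

def super_model_estimate(posterior, support, n, beta, rng=None):
    # Eqs 8-9: draw n samples from the exponentiated (softmax) posterior
    # and report their average as the estimate.
    rng = rng or np.random.default_rng()
    p = np.asarray(posterior, dtype=float) ** beta
    p /= p.sum()
    samples = rng.choice(support, size=n, p=p)
    return samples.mean()

# n = 1 reproduces the sampling rule, very large n approximates the mean rule,
# and very large beta approximates the max rule regardless of n.
</preformat>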
<p>The summation over samples allows us to approximate properties of specific decision functions. For
<italic>n</italic>
= 1 a single sample is drawn, equivalent to the
<italic>sampling decision function</italic>
. The averaging approximates the mean of the distribution for very large
<italic>n</italic>
(thus approximating the
<italic>mean decision function</italic>
).</p>
<p>The sampling of
<italic>s</italic>
<sub>
<italic>k</italic>
</sub>
is from a softmax function (also known as the exponentiated Luce choice rule [
<xref rid="pcbi.1004859.ref056" ref-type="bibr">56</xref>
,
<xref rid="pcbi.1004859.ref057" ref-type="bibr">57</xref>
]), which concentrates the probability mass at the peaks of the posterior for larger values of
<italic>β</italic>
. For large values of
<italic>β</italic>
the number of samples (
<italic>n</italic>
) becomes of little consequence (for example for
<italic>β</italic>
> 2.7, with
<italic>X</italic>
<sub>
<italic>t</italic>
</sub>
= 23 and
<italic>σ</italic>
= 0.22 after learning for 300 trials, more than 95 percent of the probability is at the maximum a posteriori).</p>
<p>In this way specific parameters emulate the mean (average,
<italic>n</italic>
= 10000), max (
<italic>β</italic>
= 1000) and sampling (
<italic>n</italic>
= 1) decision functions.</p>
<p>In order to fit the variables (
<italic>ψ</italic>
,
<italic>n</italic>
,
<italic>β</italic>
) we performed log-likelihood maximization on
<italic>ψ</italic>
,
<italic>β</italic>
using Matlab’s fminsearch function (on −log
<italic>L</italic>
with 5 random initializations), for each of
<italic>n</italic>
= [1, 2, 3, 4, 5, 10, 30, 100, 1000]. For each subject this allowed us to find the parameter set with the maximum likelihood and, given that the models of interest are nested models of this model-parameter set, indirectly find the model that best describes the data.</p>
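<p>A sketch of this fitting loop under the same assumptions (Python/SciPy, with scipy.optimize.minimize in place of Matlab’s fminsearch; neg_log_lik stands for a subject-specific negative log-likelihood function and is an assumed input, not provided here):</p>
<preformat>
import numpy as np
from scipy.optimize import minimize

def fit_super_model(neg_log_lik, n_grid=(1, 2, 3, 4, 5, 10, 30, 100, 1000),
                    n_starts=5, seed=0):
    # For each candidate number of samples n, minimise the negative
    # log-likelihood over (psi, beta) from several random starting points,
    # keeping the best (psi, beta, n) combination overall.
    rng = np.random.default_rng(seed)
    best = None
    for n in n_grid:
        for _ in range(n_starts):
            x0 = rng.uniform([0.01, 0.1], [2.0, 5.0])  # random start for (psi, beta)
            res = minimize(lambda x: neg_log_lik(x[0], x[1], n), x0,
                           method="Nelder-Mead")
            if best is None or res.fun < best[0]:
                best = (res.fun, tuple(res.x), n)
    return best  # (negative log-likelihood, (psi, beta), n)
</preformat>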
</sec>
</sec>
<sec sec-type="supplementary-material" id="sec024">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pcbi.1004859.s001">
<label>S1 Methods</label>
<caption>
<title>Includes an investigation of whether a prior or mapping is used, the individual results of model fits, a model verification exercise, and correlations between response time and model parameters.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1004859.s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1004859.s002">
<label>S1 Dataset</label>
<caption>
<title>The discrimination and estimation data from every reported participant in all experiments in Matlab format.</title>
<p>(ZIP)</p>
</caption>
<media xlink:href="pcbi.1004859.s002.zip">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>The authors thank Luigi Acerbi and Peter Dayan for commenting on an earlier draft of the manuscript.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1004859.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bernoulli</surname>
<given-names>D</given-names>
</name>
(
<year>1954</year>
)
<article-title>Exposition of a new theory on the measurement of risk</article-title>
.
<source>Econometrica</source>
:
<fpage>23</fpage>
<lpage>36</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.2307/1909829">10.2307/1909829</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref002">
<label>2</label>
<mixed-citation publication-type="book">
<name>
<surname>Savage</surname>
<given-names>LJ</given-names>
</name>
(
<year>1954</year>
)
<source>Foundations of statistics</source>
.
<publisher-loc>New York</publisher-loc>
:
<publisher-name>John Wiley & Sons</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref003">
<label>3</label>
<mixed-citation publication-type="book">
<name>
<surname>von Neumann</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Morgenstern</surname>
<given-names>O</given-names>
</name>
(
<year>1947</year>
)
<source>Theory of games and economic behavior</source>
,
<volume>volume 60</volume>
<publisher-loc>Princeton, NJ</publisher-loc>
:
<publisher-name>Princeton University Press</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref004">
<label>4</label>
<mixed-citation publication-type="book">
<name>
<surname>Emlen</surname>
<given-names>JM</given-names>
</name>
(
<year>1966</year>
)
<chapter-title>The role of time and energy in food preference</chapter-title>
<source>American Naturalist</source>
:
<fpage>611</fpage>
<lpage>617</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref005">
<label>5</label>
<mixed-citation publication-type="book">
<name>
<surname>MacArthur</surname>
<given-names>RH</given-names>
</name>
,
<name>
<surname>Pianka</surname>
<given-names>ER</given-names>
</name>
(
<year>1966</year>
)
<chapter-title>On optimal use of a patchy environment</chapter-title>
<source>American Naturalist</source>
:
<fpage>603</fpage>
<lpage>609</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref006">
<label>6</label>
<mixed-citation publication-type="book">
<name>
<surname>Anderson</surname>
<given-names>JR</given-names>
</name>
(
<year>1990</year>
)
<source>The adaptive character of thought</source>
.
<publisher-loc>Hillsdale, NJ</publisher-loc>
:
<publisher-name>Erlbaum</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Anderson</surname>
<given-names>JR</given-names>
</name>
(
<year>1991</year>
)
<article-title>The adaptive nature of human categorization</article-title>
.
<source>Psychological Review</source>
<volume>98</volume>
:
<fpage>409</fpage>
<lpage>429</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/0033-295X.98.3.409">10.1037/0033-295X.98.3.409</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ernst</surname>
<given-names>MO</given-names>
</name>
,
<name>
<surname>Banks</surname>
<given-names>MS</given-names>
</name>
(
<year>2002</year>
)
<article-title>Humans integrate visual and haptic information in a statistically optimal fashion</article-title>
.
<source>Nature</source>
<volume>415</volume>
:
<fpage>429</fpage>
<lpage>433</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/415429a">10.1038/415429a</ext-link>
</comment>
<pub-id pub-id-type="pmid">11807554</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Geisler</surname>
<given-names>WS</given-names>
</name>
(
<year>1989</year>
)
<article-title>Sequential ideal-observer analysis of visual discriminations</article-title>
.
<source>Psychological Review</source>
<volume>96</volume>
:
<fpage>267</fpage>
<lpage>314</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/0033-295X.96.2.267">10.1037/0033-295X.96.2.267</ext-link>
</comment>
<pub-id pub-id-type="pmid">2652171</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Körding</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Wolpert</surname>
<given-names>DM</given-names>
</name>
(
<year>2004</year>
)
<article-title>Bayesian integration in sensorimotor learning</article-title>
.
<source>Nature</source>
<volume>427</volume>
:
<fpage>244</fpage>
<lpage>247</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature02169">10.1038/nature02169</ext-link>
</comment>
<pub-id pub-id-type="pmid">14724638</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref011">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Oaksford</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Chater</surname>
<given-names>N</given-names>
</name>
(
<year>1994</year>
)
<article-title>A rational analysis of the selection task as optimal data selection</article-title>
.
<source>Psychological Review</source>
<volume>101</volume>
:
<fpage>608</fpage>
<lpage>631</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/0033-295X.101.4.608">10.1037/0033-295X.101.4.608</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jones</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Love</surname>
<given-names>BC</given-names>
</name>
(
<year>2011</year>
)
<article-title>Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition</article-title>
.
<source>Behavioral and Brain Sciences</source>
<volume>34</volume>
:
<fpage>169</fpage>
<lpage>188</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1017/S0140525X10003134">10.1017/S0140525X10003134</ext-link>
</comment>
<pub-id pub-id-type="pmid">21864419</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bowers</surname>
<given-names>JS</given-names>
</name>
,
<name>
<surname>Davis</surname>
<given-names>CJ</given-names>
</name>
(
<year>2012</year>
)
<article-title>Bayesian just-so stories in psychology and neuroscience</article-title>
.
<source>Psychological Bulletin</source>
<volume>138</volume>
:
<fpage>389</fpage>
<lpage>414</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/a0026450">10.1037/a0026450</ext-link>
</comment>
<pub-id pub-id-type="pmid">22545686</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref014">
<label>14</label>
<mixed-citation publication-type="book">
<name>
<surname>Acerbi</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Ma</surname>
<given-names>WJ</given-names>
</name>
,
<name>
<surname>Vijayakumar</surname>
<given-names>S</given-names>
</name>
(
<year>2014</year>
)
<chapter-title>A framework for testing identifiability of bayesian models of perception</chapter-title>
In:
<source>Advances in Neural Information Processing Systems</source>
. pp.
<fpage>1026</fpage>
<lpage>1034</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref015">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Acerbi</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Vijayakumar</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Wolpert</surname>
<given-names>DM</given-names>
</name>
(
<year>2014</year>
)
<article-title>On the origins of suboptimality in human probabilistic inference</article-title>
.
<source>PLoS Computational Biology</source>
<volume>10</volume>
:
<fpage>e1003661</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1003661">10.1371/journal.pcbi.1003661</ext-link>
</comment>
<pub-id pub-id-type="pmid">24945142</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref016">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Griffiths</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Chater</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Norris</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Pouget</surname>
<given-names>A</given-names>
</name>
(
<year>2012</year>
)
<article-title>How the Bayesians got their beliefs (and what those beliefs actually are): comment on Bowers and Davis (2012)</article-title>
.
<source>Psychological Bulletin</source>
<volume>138</volume>
:
<fpage>415</fpage>
<lpage>422</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/a0026884">10.1037/a0026884</ext-link>
</comment>
<pub-id pub-id-type="pmid">22545687</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref017">
<label>17</label>
<mixed-citation publication-type="other">Tauber S, Steyvers M (2013) Inferring subjective prior knowledge: An integrative Bayesian approach. In: Proceedings of the 35th Annual Conference of the Cognitive Science Society.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lindskog</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Winman</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Juslin</surname>
<given-names>P</given-names>
</name>
(
<year>2013</year>
)
<article-title>Naïve point estimation</article-title>
.
<source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>
<volume>39</volume>
:
<fpage>782</fpage>
<pub-id pub-id-type="pmid">22905935</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref019">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Acerbi</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Wolpert</surname>
<given-names>DM</given-names>
</name>
,
<name>
<surname>Vijayakumar</surname>
<given-names>S</given-names>
</name>
(
<year>2012</year>
)
<article-title>Internal representations of temporal statistics and feedback calibrate motor-sensory interval timing</article-title>
.
<source>PLoS Computational Biology</source>
<volume>8</volume>
:
<fpage>e1002771</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1002771">10.1371/journal.pcbi.1002771</ext-link>
</comment>
<pub-id pub-id-type="pmid">23209386</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref020">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Battaglia</surname>
<given-names>PW</given-names>
</name>
,
<name>
<surname>Kersten</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Schrater</surname>
<given-names>PR</given-names>
</name>
(
<year>2011</year>
)
<article-title>How haptic size sensations improve distance perception</article-title>
.
<source>PLoS Computational Biology</source>
<volume>7</volume>
:
<fpage>e1002080</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1002080">10.1371/journal.pcbi.1002080</ext-link>
</comment>
<pub-id pub-id-type="pmid">21738457</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref021">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Berniker</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Voss</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Kording</surname>
<given-names>K</given-names>
</name>
(
<year>2010</year>
)
<article-title>Learning priors for Bayesian computations in the nervous system</article-title>
.
<source>PLoS ONE</source>
<volume>5</volume>
:
<fpage>e12686</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0012686">10.1371/journal.pone.0012686</ext-link>
</comment>
<pub-id pub-id-type="pmid">20844766</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref022">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Chalk</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Seitz</surname>
<given-names>AR</given-names>
</name>
,
<name>
<surname>Seriès</surname>
<given-names>P</given-names>
</name>
(
<year>2010</year>
)
<article-title>Rapidly learned stimulus expectations alter perception of motion</article-title>
.
<source>Journal of Vision</source>
<volume>10</volume>
:
<fpage>2</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1167/10.8.2">10.1167/10.8.2</ext-link>
</comment>
<pub-id pub-id-type="pmid">20884577</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref023">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jazayeri</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Shadlen</surname>
<given-names>MN</given-names>
</name>
(
<year>2010</year>
)
<article-title>Temporal context calibrates interval timing</article-title>
.
<source>Nature Neuroscience</source>
<volume>13</volume>
:
<fpage>1020</fpage>
<lpage>1026</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nn.2590">10.1038/nn.2590</ext-link>
</comment>
<pub-id pub-id-type="pmid">20581842</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref024">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Körding</surname>
<given-names>KP</given-names>
</name>
,
<name>
<surname>Wolpert</surname>
<given-names>DM</given-names>
</name>
(
<year>2004</year>
)
<article-title>The loss function of sensorimotor learning</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>101</volume>
:
<fpage>9839</fpage>
<lpage>9842</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0308394101">10.1073/pnas.0308394101</ext-link>
</comment>
<pub-id pub-id-type="pmid">15210973</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref025">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wozny</surname>
<given-names>DR</given-names>
</name>
,
<name>
<surname>Beierholm</surname>
<given-names>UR</given-names>
</name>
,
<name>
<surname>Shams</surname>
<given-names>L</given-names>
</name>
(
<year>2010</year>
)
<article-title>Probability matching as a computational strategy used in perception</article-title>
.
<source>PLoS Computational Biology</source>
<volume>6</volume>
:
<fpage>e1000871</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1000871">10.1371/journal.pcbi.1000871</ext-link>
</comment>
<pub-id pub-id-type="pmid">20700493</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref026">
<label>26</label>
<mixed-citation publication-type="book">
<name>
<surname>Lehmann</surname>
<given-names>EL</given-names>
</name>
,
<name>
<surname>Casella</surname>
<given-names>G</given-names>
</name>
(
<year>1998</year>
)
<source>Theory of point estimation</source>
,
<volume>volume 31</volume>
<publisher-name>Springer</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tassinari</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Hudson</surname>
<given-names>TE</given-names>
</name>
,
<name>
<surname>Landy</surname>
<given-names>MS</given-names>
</name>
(
<year>2006</year>
)
<article-title>Combining priors and noisy visual cues in a rapid pointing task</article-title>
.
<source>The Journal of Neuroscience</source>
<volume>26</volume>
:
<fpage>10154</fpage>
<lpage>10163</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1523/JNEUROSCI.2779-06.2006">10.1523/JNEUROSCI.2779-06.2006</ext-link>
</comment>
<pub-id pub-id-type="pmid">17021171</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref028">
<label>28</label>
<mixed-citation publication-type="book">
<name>
<surname>Bishop</surname>
<given-names>CM</given-names>
</name>
(
<year>2006</year>
)
<source>Pattern recognition and machine learning</source>
.
<publisher-loc>New York, NY</publisher-loc>
:
<publisher-name>Springer</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref029">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ashby</surname>
<given-names>FG</given-names>
</name>
,
<name>
<surname>Alfonso-Reese</surname>
<given-names>LA</given-names>
</name>
(
<year>1995</year>
)
<article-title>Categorization as probability density estimation</article-title>
.
<source>Journal of Mathematical Psychology</source>
<volume>39</volume>
:
<fpage>216</fpage>
<lpage>233</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1006/jmps.1995.1021">10.1006/jmps.1995.1021</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jäkel</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Schölkopf</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Wichmann</surname>
<given-names>Fa</given-names>
</name>
(
<year>2008</year>
)
<article-title>Generalization and similarity in exemplar models of categorization: Insights from machine learning</article-title>
.
<source>Psychonomic Bulletin & Review</source>
<volume>15</volume>
:
<fpage>256</fpage>
<lpage>271</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3758/PBR.15.2.256">10.3758/PBR.15.2.256</ext-link>
</comment>
<pub-id pub-id-type="pmid">18488638</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rosseel</surname>
<given-names>Y</given-names>
</name>
(
<year>2002</year>
)
<article-title>Mixture models of categorization</article-title>
.
<source>Journal of Mathematical Psychology</source>
<volume>46</volume>
:
<fpage>178</fpage>
<lpage>210</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1006/jmps.2001.1379">10.1006/jmps.2001.1379</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref032">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Vanpaemel</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Storms</surname>
<given-names>G</given-names>
</name>
(
<year>2008</year>
)
<article-title>In search of abstraction: the varying abstraction model of categorization</article-title>
.
<source>Psychonomic Bulletin & Review</source>
<volume>15</volume>
:
<fpage>732</fpage>
<lpage>749</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3758/PBR.15.4.732">10.3758/PBR.15.4.732</ext-link>
</comment>
<pub-id pub-id-type="pmid">18792499</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref033">
<label>33</label>
<mixed-citation publication-type="book">
<name>
<surname>Fretwell</surname>
<given-names>SD</given-names>
</name>
(
<year>1972</year>
)
<chapter-title>Populations in a seasonal environment</chapter-title>
<source>Number 5 in Monographs in Population Biology</source>
.
<publisher-name>Princeton University Press</publisher-name>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Schulze</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>van Ravenzwaaij</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Newell</surname>
<given-names>BR</given-names>
</name>
(
<year>2015</year>
)
<article-title>Of matchers and maximizers: How competition shapes choice under risk and uncertainty</article-title>
.
<source>Cognitive Psychology</source>
<volume>78</volume>
:
<fpage>78</fpage>
<lpage>98</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.cogpsych.2015.03.002">10.1016/j.cogpsych.2015.03.002</ext-link>
</comment>
<pub-id pub-id-type="pmid">25868112</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref035">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Griffiths</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Lieder</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Goodman</surname>
<given-names>ND</given-names>
</name>
(
<year>2015</year>
)
<article-title>Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic</article-title>
.
<source>Topics in Cognitive Science</source>
<volume>7</volume>
:
<fpage>217</fpage>
<lpage>229</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/tops.12142">10.1111/tops.12142</ext-link>
</comment>
<pub-id pub-id-type="pmid">25898807</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref036">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Vul</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Goodman</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Griffiths</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Tenenbaum</surname>
<given-names>JB</given-names>
</name>
(
<year>2014</year>
)
<article-title>One and done? Optimal decisions from very few samples</article-title>
.
<source>Cognitive Science</source>
<volume>38</volume>
:
<fpage>599</fpage>
<lpage>637</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/cogs.12101">10.1111/cogs.12101</ext-link>
</comment>
<pub-id pub-id-type="pmid">24467492</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Burgess</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Barlow</surname>
<given-names>H</given-names>
</name>
(
<year>1983</year>
)
<article-title>The precision of numerosity discrimination in arrays of random dots</article-title>
.
<source>Vision Research</source>
<volume>23</volume>
:
<fpage>811</fpage>
<lpage>820</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/0042-6989(83)90204-3">10.1016/0042-6989(83)90204-3</ext-link>
</comment>
<pub-id pub-id-type="pmid">6623941</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kanitscheider</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Brown</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Pouget</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Churchland</surname>
<given-names>AK</given-names>
</name>
(
<year>2015</year>
)
<article-title>Multisensory decisions provide support for probabilistic number representations</article-title>
.
<source>Journal of Neurophysiology</source>
<volume>113</volume>
:
<fpage>3490</fpage>
<lpage>3498</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1152/jn.00787.2014">10.1152/jn.00787.2014</ext-link>
</comment>
<pub-id pub-id-type="pmid">25744886</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Krueger</surname>
<given-names>LE</given-names>
</name>
(
<year>1982</year>
)
<article-title>Single judgments of numerosity</article-title>
.
<source>Perception &amp; Psychophysics</source>
<volume>31</volume>
:
<fpage>175</fpage>
<lpage>182</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3758/BF03206218">10.3758/BF03206218</ext-link>
</comment>
<pub-id pub-id-type="pmid">7079098</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref040">
<label>40</label>
<mixed-citation publication-type="journal">
<name>
<surname>Izard</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Dehaene</surname>
<given-names>S</given-names>
</name>
(
<year>2008</year>
)
<article-title>Calibrating the mental number line</article-title>
.
<source>Cognition</source>
<volume>106</volume>
:
<fpage>1221</fpage>
<lpage>1247</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.cognition.2007.06.004">10.1016/j.cognition.2007.06.004</ext-link>
</comment>
<pub-id pub-id-type="pmid">17678639</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref041">
<label>41</label>
<mixed-citation publication-type="journal">
<name>
<surname>Piazza</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Izard</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Pinel</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Le Bihan</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Dehaene</surname>
<given-names>S</given-names>
</name>
(
<year>2004</year>
)
<article-title>Tuning curves for approximate numerosity in the human intraparietal sulcus</article-title>
.
<source>Neuron</source>
<volume>44</volume>
:
<fpage>547</fpage>
<lpage>555</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.neuron.2004.10.014">10.1016/j.neuron.2004.10.014</ext-link>
</comment>
<pub-id pub-id-type="pmid">15504333</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref042">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>Goldstein</surname>
<given-names>DG</given-names>
</name>
,
<name>
<surname>Rothschild</surname>
<given-names>D</given-names>
</name>
(
<year>2014</year>
)
<article-title>Lay understanding of probability distributions</article-title>
.
<source>Judgment and Decision Making</source>
<volume>9</volume>
:
<fpage>1</fpage>
<lpage>14</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref043">
<label>43</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nassar</surname>
<given-names>MR</given-names>
</name>
,
<name>
<surname>Wilson</surname>
<given-names>RC</given-names>
</name>
,
<name>
<surname>Heasly</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Gold</surname>
<given-names>JI</given-names>
</name>
(
<year>2010</year>
)
<article-title>An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment</article-title>
.
<source>The Journal of Neuroscience</source>
<volume>30</volume>
:
<fpage>12366</fpage>
<lpage>12378</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1523/JNEUROSCI.0822-10.2010">10.1523/JNEUROSCI.0822-10.2010</ext-link>
</comment>
<pub-id pub-id-type="pmid">20844132</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref044">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pollack</surname>
<given-names>I</given-names>
</name>
(
<year>1952</year>
)
<article-title>The information of elementary auditory displays</article-title>
.
<source>The Journal of the Acoustical Society of America</source>
<volume>24</volume>
:
<fpage>745</fpage>
<lpage>749</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1121/1.1906969">10.1121/1.1906969</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref045">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Petrov</surname>
<given-names>AA</given-names>
</name>
,
<name>
<surname>Anderson</surname>
<given-names>JR</given-names>
</name>
(
<year>2005</year>
)
<article-title>The dynamics of scaling: a memory-based anchor model of category rating and absolute identification</article-title>
.
<source>Psychological Review</source>
<volume>112</volume>
:
<fpage>383</fpage>
<lpage>416</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1037/0033-295X.112.2.383">10.1037/0033-295X.112.2.383</ext-link>
</comment>
<pub-id pub-id-type="pmid">15783291</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref046">
<label>46</label>
<mixed-citation publication-type="other">Yeung S, Whalen A (2015) Learning of bimodally distributed quantities. In: Proceedings of the 37th Annual Conference of the Cognitive Science Society. pp. 2745–2750.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref047">
<label>47</label>
<mixed-citation publication-type="journal">
<name>
<surname>Van Oeffelen</surname>
<given-names>MP</given-names>
</name>
,
<name>
<surname>Vos</surname>
<given-names>PG</given-names>
</name>
(
<year>1982</year>
)
<article-title>A probabilistic model for the discrimination of visual number</article-title>
.
<source>Perception &amp; Psychophysics</source>
<volume>32</volume>
:
<fpage>163</fpage>
<lpage>170</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3758/BF03204275">10.3758/BF03204275</ext-link>
</comment>
<pub-id pub-id-type="pmid">7145586</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kass</surname>
<given-names>RE</given-names>
</name>
,
<name>
<surname>Raftery</surname>
<given-names>AE</given-names>
</name>
(
<year>1995</year>
)
<article-title>Bayes factors</article-title>
.
<source>Journal of the American Statistical Association</source>
<volume>90</volume>
:
<fpage>773</fpage>
<lpage>795</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1080/01621459.1995.10476572">10.1080/01621459.1995.10476572</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref049">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Alter</surname>
<given-names>AL</given-names>
</name>
,
<name>
<surname>Oppenheimer</surname>
<given-names>DM</given-names>
</name>
(
<year>2006</year>
)
<article-title>Predicting short-term stock fluctuations by using processing fluency</article-title>
.
<source>Proceedings of the National Academy of Sciences</source>
<volume>103</volume>
:
<fpage>9369</fpage>
<lpage>9372</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0601071103">10.1073/pnas.0601071103</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref050">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>McKinley</surname>
<given-names>SC</given-names>
</name>
,
<name>
<surname>Nosofsky</surname>
<given-names>RM</given-names>
</name>
(
<year>1995</year>
)
<article-title>Investigations of exemplar and decision bound models in large, ill-defined category structures</article-title>
.
<source>Journal of Experimental Psychology: Human Perception and Performance</source>
<volume>21</volume>
:
<fpage>128</fpage>
<pub-id pub-id-type="pmid">7707028</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref051">
<label>51</label>
<mixed-citation publication-type="journal">
<name>
<surname>Shi</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Griffiths</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Feldman</surname>
<given-names>NH</given-names>
</name>
,
<name>
<surname>Sanborn</surname>
<given-names>AN</given-names>
</name>
(
<year>2010</year>
)
<article-title>Exemplar models as a mechanism for performing Bayesian inference</article-title>
.
<source>Psychonomic Bulletin &amp; Review</source>
<volume>17</volume>
:
<fpage>443</fpage>
<lpage>464</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3758/PBR.17.4.443">10.3758/PBR.17.4.443</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref052">
<label>52</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gershman</surname>
<given-names>SJ</given-names>
</name>
,
<name>
<surname>Niv</surname>
<given-names>Y</given-names>
</name>
(
<year>2013</year>
)
<article-title>Perceptual estimation obeys Occam’s razor</article-title>
.
<source>Frontiers in Psychology</source>
<volume>4</volume>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3389/fpsyg.2013.00623">10.3389/fpsyg.2013.00623</ext-link>
</comment>
<pub-id pub-id-type="pmid">24137136</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref053">
<label>53</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ortega</surname>
<given-names>PA</given-names>
</name>
,
<name>
<surname>Braun</surname>
<given-names>DA</given-names>
</name>
(
<year>2013</year>
)
<article-title>Thermodynamics as a theory of decision-making with information-processing costs</article-title>
.
<source>Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
<volume>469</volume>
:
<fpage>20120683</fpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref054">
<label>54</label>
<mixed-citation publication-type="journal">
<name>
<surname>Acuna</surname>
<given-names>DE</given-names>
</name>
,
<name>
<surname>Berniker</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Fernandes</surname>
<given-names>HL</given-names>
</name>
,
<name>
<surname>Kording</surname>
<given-names>KP</given-names>
</name>
(
<year>2015</year>
)
<article-title>Using psychophysics to ask if the brain samples or maximizes</article-title>
.
<source>Journal of Vision</source>
<volume>15</volume>
:
<fpage>7</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1167/15.3.7">10.1167/15.3.7</ext-link>
</comment>
<pub-id pub-id-type="pmid">25767093</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref055">
<label>55</label>
<mixed-citation publication-type="journal">
<name>
<surname>Körding</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Beierholm</surname>
<given-names>U</given-names>
</name>
,
<name>
<surname>Ma</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Quartz</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Tenenbaum</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Causal inference in multisensory perception</article-title>
.
<source>PLoS ONE</source>
<volume>2</volume>
:
<fpage>e943</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0000943">10.1371/journal.pone.0000943</ext-link>
</comment>
<pub-id pub-id-type="pmid">17895984</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref056">
<label>56</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ashby</surname>
<given-names>FG</given-names>
</name>
,
<name>
<surname>Maddox</surname>
<given-names>WT</given-names>
</name>
(
<year>1993</year>
)
<article-title>Relations between prototype, exemplar, and decision bound models of categorization</article-title>
.
<source>Journal of Mathematical Psychology</source>
<volume>37</volume>
:
<fpage>372</fpage>
<lpage>400</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1006/jmps.1993.1023">10.1006/jmps.1993.1023</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1004859.ref057">
<label>57</label>
<mixed-citation publication-type="book">
<name>
<surname>Luce</surname>
<given-names>RD</given-names>
</name>
(
<year>1963</year>
)
<chapter-title>Detection and recognition</chapter-title>
In:
<name>
<surname>Luce</surname>
<given-names>RD</given-names>
</name>
,
<name>
<surname>Bush</surname>
<given-names>RR</given-names>
</name>
,
<name>
<surname>Galanter</surname>
<given-names>E</given-names>
</name>
, editors,
<source>Handbook of Mathematical Psychology</source>
,
<volume>Volume 1</volume>
,
<publisher-loc>New York and London</publisher-loc>
:
<publisher-name>John Wiley and Sons, Inc.</publisher-name>
pp.
<fpage>103</fpage>
<lpage>190</lpage>
.</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/HapticV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000538 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000538 | SxmlIndent | more
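
The JATS reference list embedded in the record can also be inspected with standard XML tools outside of Dilib. Below is a minimal Python sketch; it assumes the <record> element above has been saved on its own as well-formed XML in a local file named record.xml (that filename is only an illustration, not part of the Dilib workflow).

import xml.etree.ElementTree as ET

# Parse the locally saved copy of the record (hypothetical file name).
tree = ET.parse("record.xml")
root = tree.getroot()

# Walk every <ref> entry in the embedded JATS reference list and print a
# compact one-line summary: label, year, title and source when present.
for ref in root.iter("ref"):
    label = ref.findtext("label", default="?")
    citation = ref.find("mixed-citation")
    if citation is None:
        continue
    year = citation.findtext(".//year", default="")
    title = (citation.findtext(".//article-title", default="")
             or citation.findtext(".//chapter-title", default=""))
    source = citation.findtext(".//source", default="")
    print(f"[{label}] ({year}) {title} - {source}")

Each output line gives the reference label, year, title and source, which is enough to spot entries that still need cleaning.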

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    HapticV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4829178
   |texte=   Fast and Accurate Learning When Making Discrete Numerical Estimates
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:27070155" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 

Wicri

This area was generated with Dilib version V0.6.23.
Data generation: Mon Jun 13 01:09:46 2016. Site generation: Wed Mar 6 09:54:07 2024