H2N2 Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000A28 (Pmc/Corpus); previous: 000A279; next: 000A290



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An Experimentally Determined Evolutionary Model Dramatically Improves Phylogenetic Fit</title>
<author>
<name sortKey="Bloom, Jesse D" sort="Bloom, Jesse D" uniqKey="Bloom J" first="Jesse D." last="Bloom">Jesse D. Bloom</name>
<affiliation>
<nlm:aff id="msu173-AFF1">Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24859245</idno>
<idno type="pmc">4104320</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4104320</idno>
<idno type="RBID">PMC:4104320</idno>
<idno type="doi">10.1093/molbev/msu173</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000A28</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000A28</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">An Experimentally Determined Evolutionary Model Dramatically Improves Phylogenetic Fit</title>
<author>
<name sortKey="Bloom, Jesse D" sort="Bloom, Jesse D" uniqKey="Bloom J" first="Jesse D." last="Bloom">Jesse D. Bloom</name>
<affiliation>
<nlm:aff id="msu173-AFF1">Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Molecular Biology and Evolution</title>
<idno type="ISSN">0737-4038</idno>
<idno type="eISSN">1537-1719</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here, I demonstrate an alternative: Experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. Emerging high-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic and genetic analyses.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Araya, Cl" uniqKey="Araya C">CL Araya</name>
</author>
<author>
<name sortKey="Fowler, Dm" uniqKey="Fowler D">DM Fowler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashenberg, O" uniqKey="Ashenberg O">O Ashenberg</name>
</author>
<author>
<name sortKey="Gong, Li" uniqKey="Gong L">LI Gong</name>
</author>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bao, Y" uniqKey="Bao Y">Y Bao</name>
</author>
<author>
<name sortKey="Bolotov, P" uniqKey="Bolotov P">P Bolotov</name>
</author>
<author>
<name sortKey="Dernovoy, D" uniqKey="Dernovoy D">D Dernovoy</name>
</author>
<author>
<name sortKey="Kiryutin, B" uniqKey="Kiryutin B">B Kiryutin</name>
</author>
<author>
<name sortKey="Zaslavsky, L" uniqKey="Zaslavsky L">L Zaslavsky</name>
</author>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
<author>
<name sortKey="Lipman, D" uniqKey="Lipman D">D Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bershtein, S" uniqKey="Bershtein S">S Bershtein</name>
</author>
<author>
<name sortKey="Segal, M" uniqKey="Segal M">M Segal</name>
</author>
<author>
<name sortKey="Bekerman, R" uniqKey="Bekerman R">R Bekerman</name>
</author>
<author>
<name sortKey="Tokuriki, N" uniqKey="Tokuriki N">N Tokuriki</name>
</author>
<author>
<name sortKey="Tawfik, Ds" uniqKey="Tawfik D">DS Tawfik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
<author>
<name sortKey="Gong, Li" uniqKey="Gong L">LI Gong</name>
</author>
<author>
<name sortKey="Baltimore, D" uniqKey="Baltimore D">D Baltimore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
<author>
<name sortKey="Labthavikul, St" uniqKey="Labthavikul S">ST Labthavikul</name>
</author>
<author>
<name sortKey="Otey, Cr" uniqKey="Otey C">CR Otey</name>
</author>
<author>
<name sortKey="Arnold, Fh" uniqKey="Arnold F">FH Arnold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
<author>
<name sortKey="Raval, A" uniqKey="Raval A">A Raval</name>
</author>
<author>
<name sortKey="Wilke, Co" uniqKey="Wilke C">CO Wilke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
<author>
<name sortKey="Silberg, Jj" uniqKey="Silberg J">JJ Silberg</name>
</author>
<author>
<name sortKey="Wilke, Co" uniqKey="Wilke C">CO Wilke</name>
</author>
<author>
<name sortKey="Drummond, Da" uniqKey="Drummond D">DA Drummond</name>
</author>
<author>
<name sortKey="Adami, C" uniqKey="Adami C">C Adami</name>
</author>
<author>
<name sortKey="Arnold, Fh" uniqKey="Arnold F">FH Arnold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cirino, Pc" uniqKey="Cirino P">PC Cirino</name>
</author>
<author>
<name sortKey="Mayer, Km" uniqKey="Mayer K">KM Mayer</name>
</author>
<author>
<name sortKey="Umeno, D" uniqKey="Umeno D">D Umeno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crooks, Ge" uniqKey="Crooks G">GE Crooks</name>
</author>
<author>
<name sortKey="Hon, G" uniqKey="Hon G">G Hon</name>
</author>
<author>
<name sortKey="Chandonia, Jm" uniqKey="Chandonia J">JM Chandonia</name>
</author>
<author>
<name sortKey="Brenner, Se" uniqKey="Brenner S">SE Brenner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dang, Cc" uniqKey="Dang C">CC Dang</name>
</author>
<author>
<name sortKey="Le, Qs" uniqKey="Le Q">QS Le</name>
</author>
<author>
<name sortKey="Gascuel, O" uniqKey="Gascuel O">O Gascuel</name>
</author>
<author>
<name sortKey="Le, Vs" uniqKey="Le V">VS Le</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Maio, N" uniqKey="De Maio N">N De Maio</name>
</author>
<author>
<name sortKey="Holmes, I" uniqKey="Holmes I">I Holmes</name>
</author>
<author>
<name sortKey="Schlotterer, C" uniqKey="Schlotterer C">C Schlötterer</name>
</author>
<author>
<name sortKey="Kosiol, C" uniqKey="Kosiol C">C Kosiol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desai, Mm" uniqKey="Desai M">MM Desai</name>
</author>
<author>
<name sortKey="Fisher, Ds" uniqKey="Fisher D">DS Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Firnberg, E" uniqKey="Firnberg E">E Firnberg</name>
</author>
<author>
<name sortKey="Ostermeier, M" uniqKey="Ostermeier M">M Ostermeier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fowler, Dm" uniqKey="Fowler D">DM Fowler</name>
</author>
<author>
<name sortKey="Araya, Cl" uniqKey="Araya C">CL Araya</name>
</author>
<author>
<name sortKey="Fleishman, Sj" uniqKey="Fleishman S">SJ Fleishman</name>
</author>
<author>
<name sortKey="Kellogg, Eh" uniqKey="Kellogg E">EH Kellogg</name>
</author>
<author>
<name sortKey="Stephany, Jj" uniqKey="Stephany J">JJ Stephany</name>
</author>
<author>
<name sortKey="Baker, D" uniqKey="Baker D">D Baker</name>
</author>
<author>
<name sortKey="Fields, S" uniqKey="Fields S">S Fields</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gil, M" uniqKey="Gil M">M Gil</name>
</author>
<author>
<name sortKey="Zanetti, Ms" uniqKey="Zanetti M">MS Zanetti</name>
</author>
<author>
<name sortKey="Zoller, S" uniqKey="Zoller S">S Zoller</name>
</author>
<author>
<name sortKey="Anisimova, M" uniqKey="Anisimova M">M Anisimova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Thorne, Jl" uniqKey="Thorne J">JL Thorne</name>
</author>
<author>
<name sortKey="Jones, Dt" uniqKey="Jones D">DT Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gong, Li" uniqKey="Gong L">LI Gong</name>
</author>
<author>
<name sortKey="Suchard, Ma" uniqKey="Suchard M">MA Suchard</name>
</author>
<author>
<name sortKey="Bloom, Jd" uniqKey="Bloom J">JD Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goto, H" uniqKey="Goto H">H Goto</name>
</author>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Halligan, Dl" uniqKey="Halligan D">DL Halligan</name>
</author>
<author>
<name sortKey="Keightley, Pd" uniqKey="Keightley P">PD Keightley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Halpern, Al" uniqKey="Halpern A">AL Halpern</name>
</author>
<author>
<name sortKey="Bruno, Wj" uniqKey="Bruno W">WJ Bruno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hiatt, Jb" uniqKey="Hiatt J">JB Hiatt</name>
</author>
<author>
<name sortKey="Patwardhan, Rp" uniqKey="Patwardhan R">RP Patwardhan</name>
</author>
<author>
<name sortKey="Turner, Eh" uniqKey="Turner E">EH Turner</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, E" uniqKey="Hoffmann E">E Hoffmann</name>
</author>
<author>
<name sortKey="Neumann, G" uniqKey="Neumann G">G Neumann</name>
</author>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
<author>
<name sortKey="Hobom, G" uniqKey="Hobom G">G Hobom</name>
</author>
<author>
<name sortKey="Webster, Rg" uniqKey="Webster R">RG Webster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hou, Y" uniqKey="Hou Y">Y Hou</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Miranda, L" uniqKey="Miranda L">L Miranda</name>
</author>
<author>
<name sortKey="Lin, S" uniqKey="Lin S">S Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F Ronquist</name>
</author>
<author>
<name sortKey="Nielsen, R" uniqKey="Nielsen R">R Nielsen</name>
</author>
<author>
<name sortKey="Bollback, Jp" uniqKey="Bollback J">JP Bollback</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jain, Pc" uniqKey="Jain P">PC Jain</name>
</author>
<author>
<name sortKey="Varadarajan, R" uniqKey="Varadarajan R">R Varadarajan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Joosten, Rp" uniqKey="Joosten R">RP Joosten</name>
</author>
<author>
<name sortKey="Te Beek, Ta" uniqKey="Te Beek T">TA te Beek</name>
</author>
<author>
<name sortKey="Krieger, E" uniqKey="Krieger E">E Krieger</name>
</author>
<author>
<name sortKey="Hekkelman, Ml" uniqKey="Hekkelman M">ML Hekkelman</name>
</author>
<author>
<name sortKey="Hooft, Rw" uniqKey="Hooft R">RW Hooft</name>
</author>
<author>
<name sortKey="Schneider, R" uniqKey="Schneider R">R Schneider</name>
</author>
<author>
<name sortKey="Sander, C" uniqKey="Sander C">C Sander</name>
</author>
<author>
<name sortKey="Vriend, G" uniqKey="Vriend G">G Vriend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kabsch, W" uniqKey="Kabsch W">W Kabsch</name>
</author>
<author>
<name sortKey="Sander, C" uniqKey="Sander C">C Sander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kleinman, Cl" uniqKey="Kleinman C">CL Kleinman</name>
</author>
<author>
<name sortKey="Rodrigue, N" uniqKey="Rodrigue N">N Rodrigue</name>
</author>
<author>
<name sortKey="Lartillot, N" uniqKey="Lartillot N">N Lartillot</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kosiol, C" uniqKey="Kosiol C">C Kosiol</name>
</author>
<author>
<name sortKey="Holmes, I" uniqKey="Holmes I">I Holmes</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krasnitz, M" uniqKey="Krasnitz M">M Krasnitz</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lartillot, N" uniqKey="Lartillot N">N Lartillot</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le, Sq" uniqKey="Le S">SQ Le</name>
</author>
<author>
<name sortKey="Lartillot, N" uniqKey="Lartillot N">N Lartillot</name>
</author>
<author>
<name sortKey="Gascuel, O" uniqKey="Gascuel O">O Gascuel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lou, Di" uniqKey="Lou D">DI Lou</name>
</author>
<author>
<name sortKey="Hussmann, Ja" uniqKey="Hussmann J">JA Hussmann</name>
</author>
<author>
<name sortKey="Mcbee, Rm" uniqKey="Mcbee R">RM McBee</name>
</author>
<author>
<name sortKey="Acevedo, A" uniqKey="Acevedo A">A Acevedo</name>
</author>
<author>
<name sortKey="Andino, R" uniqKey="Andino R">R Andino</name>
</author>
<author>
<name sortKey="Press, Wh" uniqKey="Press W">WH Press</name>
</author>
<author>
<name sortKey="Sawyer, Sl" uniqKey="Sawyer S">SL Sawyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lunzer, M" uniqKey="Lunzer M">M Lunzer</name>
</author>
<author>
<name sortKey="Golding, Gb" uniqKey="Golding G">GB Golding</name>
</author>
<author>
<name sortKey="Dean, Am" uniqKey="Dean A">AM Dean</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marsh, Ga" uniqKey="Marsh G">GA Marsh</name>
</author>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadán</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Palese, P" uniqKey="Palese P">P Palese</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Melamed, D" uniqKey="Melamed D">D Melamed</name>
</author>
<author>
<name sortKey="Young, Dl" uniqKey="Young D">DL Young</name>
</author>
<author>
<name sortKey="Gamble, Ce" uniqKey="Gamble C">CE Gamble</name>
</author>
<author>
<name sortKey="Miller, Cr" uniqKey="Miller C">CR Miller</name>
</author>
<author>
<name sortKey="Fields, S" uniqKey="Fields S">S Fields</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Metropolis, N" uniqKey="Metropolis N">N Metropolis</name>
</author>
<author>
<name sortKey="Rosenbluth, Aw" uniqKey="Rosenbluth A">AW Rosenbluth</name>
</author>
<author>
<name sortKey="Rosenbluth, Mn" uniqKey="Rosenbluth M">MN Rosenbluth</name>
</author>
<author>
<name sortKey="Teller, Ah" uniqKey="Teller A">AH Teller</name>
</author>
<author>
<name sortKey="Teller, E" uniqKey="Teller E">E Teller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Neumann, G" uniqKey="Neumann G">G Neumann</name>
</author>
<author>
<name sortKey="Watanabe, T" uniqKey="Watanabe T">T Watanabe</name>
</author>
<author>
<name sortKey="Ito, H" uniqKey="Ito H">H Ito</name>
</author>
<author>
<name sortKey="Watanabe, S" uniqKey="Watanabe S">S Watanabe</name>
</author>
<author>
<name sortKey="Goto, H" uniqKey="Goto H">H Goto</name>
</author>
<author>
<name sortKey="Gao, P" uniqKey="Gao P">P Gao</name>
</author>
<author>
<name sortKey="Hughes, M" uniqKey="Hughes M">M Hughes</name>
</author>
<author>
<name sortKey="Perez, Dr" uniqKey="Perez D">DR Perez</name>
</author>
<author>
<name sortKey="Donis, R" uniqKey="Donis R">R Donis</name>
</author>
<author>
<name sortKey="Hoffmann, E" uniqKey="Hoffmann E">E Hoffmann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Neylon, C" uniqKey="Neylon C">C Neylon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ogliore, R" uniqKey="Ogliore R">R Ogliore</name>
</author>
<author>
<name sortKey="Huss, G" uniqKey="Huss G">G Huss</name>
</author>
<author>
<name sortKey="Nagashima, K" uniqKey="Nagashima K">K Nagashima</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parvin, J" uniqKey="Parvin J">J Parvin</name>
</author>
<author>
<name sortKey="Moscona, A" uniqKey="Moscona A">A Moscona</name>
</author>
<author>
<name sortKey="Pan, W" uniqKey="Pan W">W Pan</name>
</author>
<author>
<name sortKey="Leider, J" uniqKey="Leider J">J Leider</name>
</author>
<author>
<name sortKey="Palese, P" uniqKey="Palese P">P Palese</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pearson, K" uniqKey="Pearson K">K Pearson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pond, Sk" uniqKey="Pond S">SK Pond</name>
</author>
<author>
<name sortKey="Delport, W" uniqKey="Delport W">W Delport</name>
</author>
<author>
<name sortKey="Muse, Sv" uniqKey="Muse S">SV Muse</name>
</author>
<author>
<name sortKey="Scheffler, K" uniqKey="Scheffler K">K Scheffler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pond, Sl" uniqKey="Pond S">SL Pond</name>
</author>
<author>
<name sortKey="Frost, Sd" uniqKey="Frost S">SD Frost</name>
</author>
<author>
<name sortKey="Muse, Sv" uniqKey="Muse S">SV Muse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Portela, A" uniqKey="Portela A">A Portela</name>
</author>
<author>
<name sortKey="Digard, P" uniqKey="Digard P">P Digard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Posada, D" uniqKey="Posada D">D Posada</name>
</author>
<author>
<name sortKey="Buckley, Tr" uniqKey="Buckley T">TR Buckley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Potapov, V" uniqKey="Potapov V">V Potapov</name>
</author>
<author>
<name sortKey="Cohen, M" uniqKey="Cohen M">M Cohen</name>
</author>
<author>
<name sortKey="Schreiber, G" uniqKey="Schreiber G">G Schreiber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rice, P" uniqKey="Rice P">P Rice</name>
</author>
<author>
<name sortKey="Longden, I" uniqKey="Longden I">I Longden</name>
</author>
<author>
<name sortKey="Bleasby, A" uniqKey="Bleasby A">A Bleasby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodrigue, N" uniqKey="Rodrigue N">N Rodrigue</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodrigue, N" uniqKey="Rodrigue N">N Rodrigue</name>
</author>
<author>
<name sortKey="Kleinman, Cl" uniqKey="Kleinman C">CL Kleinman</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
<author>
<name sortKey="Lartillot, N" uniqKey="Lartillot N">N Lartillot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodrigue, N" uniqKey="Rodrigue N">N Rodrigue</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
<author>
<name sortKey="Lartillot, N" uniqKey="Lartillot N">N Lartillot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roscoe, Bp" uniqKey="Roscoe B">BP Roscoe</name>
</author>
<author>
<name sortKey="Thayer, Km" uniqKey="Thayer K">KM Thayer</name>
</author>
<author>
<name sortKey="Zeldovich, Kb" uniqKey="Zeldovich K">KB Zeldovich</name>
</author>
<author>
<name sortKey="Fushman, D" uniqKey="Fushman D">D Fushman</name>
</author>
<author>
<name sortKey="Bolon, Dn" uniqKey="Bolon D">DN Bolon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmitt, Mw" uniqKey="Schmitt M">MW Schmitt</name>
</author>
<author>
<name sortKey="Kennedy, Sr" uniqKey="Kennedy S">SR Kennedy</name>
</author>
<author>
<name sortKey="Salk, Jj" uniqKey="Salk J">JJ Salk</name>
</author>
<author>
<name sortKey="Fox, Ej" uniqKey="Fox E">EJ Fox</name>
</author>
<author>
<name sortKey="Hiatt, Jb" uniqKey="Hiatt J">JB Hiatt</name>
</author>
<author>
<name sortKey="Loeb, La" uniqKey="Loeb L">LA Loeb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schneider, Td" uniqKey="Schneider T">TD Schneider</name>
</author>
<author>
<name sortKey="Stephens, Rm" uniqKey="Stephens R">RM Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Serrano, L" uniqKey="Serrano L">L Serrano</name>
</author>
<author>
<name sortKey="Day, Ag" uniqKey="Day A">AG Day</name>
</author>
<author>
<name sortKey="Fersht, Ar" uniqKey="Fersht A">AR Fersht</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Starita, Lm" uniqKey="Starita L">LM Starita</name>
</author>
<author>
<name sortKey="Pruneda, Jn" uniqKey="Pruneda J">JN Pruneda</name>
</author>
<author>
<name sortKey="Lo, Rs" uniqKey="Lo R">RS Lo</name>
</author>
<author>
<name sortKey="Fowler, Dm" uniqKey="Fowler D">DM Fowler</name>
</author>
<author>
<name sortKey="Kim, Hj" uniqKey="Kim H">HJ Kim</name>
</author>
<author>
<name sortKey="Hiatt, Jb" uniqKey="Hiatt J">JB Hiatt</name>
</author>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
<author>
<name sortKey="Brzovic, Ps" uniqKey="Brzovic P">PS Brzovic</name>
</author>
<author>
<name sortKey="Fields, S" uniqKey="Fields S">S Fields</name>
</author>
<author>
<name sortKey="Klevit, Re" uniqKey="Klevit R">RE Klevit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tamuri, Au" uniqKey="Tamuri A">AU Tamuri</name>
</author>
<author>
<name sortKey="Dos Reis, M" uniqKey="Dos Reis M">M dos Reis</name>
</author>
<author>
<name sortKey="Goldstein, Ra" uniqKey="Goldstein R">RA Goldstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tamuri, Au" uniqKey="Tamuri A">AU Tamuri</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Dos Reis, M" uniqKey="Dos Reis M">M Dos Reis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thorne, Jl" uniqKey="Thorne J">JL Thorne</name>
</author>
<author>
<name sortKey="Choi, Sc" uniqKey="Choi S">SC Choi</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Higgs, Pg" uniqKey="Higgs P">PG Higgs</name>
</author>
<author>
<name sortKey="Kishino, H" uniqKey="Kishino H">H Kishino</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thorne, Jl" uniqKey="Thorne J">JL Thorne</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Jones, Dt" uniqKey="Jones D">DT Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tien, M" uniqKey="Tien M">M Tien</name>
</author>
<author>
<name sortKey="Meyer, Ag" uniqKey="Meyer A">AG Meyer</name>
</author>
<author>
<name sortKey="Spielman, Sj" uniqKey="Spielman S">SJ Spielman</name>
</author>
<author>
<name sortKey="Wilke, Co" uniqKey="Wilke C">CO Wilke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Traxlmayr, Mw" uniqKey="Traxlmayr M">MW Traxlmayr</name>
</author>
<author>
<name sortKey="Hasenhindl, C" uniqKey="Hasenhindl C">C Hasenhindl</name>
</author>
<author>
<name sortKey="Hackl, M" uniqKey="Hackl M">M Hackl</name>
</author>
<author>
<name sortKey="Stadlmayr, G" uniqKey="Stadlmayr G">G Stadlmayr</name>
</author>
<author>
<name sortKey="Rybka, Jd" uniqKey="Rybka J">JD Rybka</name>
</author>
<author>
<name sortKey="Borth, N" uniqKey="Borth N">N Borth</name>
</author>
<author>
<name sortKey="Grillari, J" uniqKey="Grillari J">J Grillari</name>
</author>
<author>
<name sortKey="Ruker, F" uniqKey="Ruker F">F Rüker</name>
</author>
<author>
<name sortKey="Obinger, C" uniqKey="Obinger C">C Obinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Hc" uniqKey="Wang H">HC Wang</name>
</author>
<author>
<name sortKey="Li, K" uniqKey="Li K">K Li</name>
</author>
<author>
<name sortKey="Susko, E" uniqKey="Susko E">E Susko</name>
</author>
<author>
<name sortKey="Roger, Aj" uniqKey="Roger A">AJ Roger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Ch" uniqKey="Wu C">CH Wu</name>
</author>
<author>
<name sortKey="Suchard, Ma" uniqKey="Suchard M">MA Suchard</name>
</author>
<author>
<name sortKey="Drummond, Aj" uniqKey="Drummond A">AJ Drummond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
<author>
<name sortKey="Nielsen, R" uniqKey="Nielsen R">R Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
<author>
<name sortKey="Nielsen, R" uniqKey="Nielsen R">R Nielsen</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Pedersen, Amk" uniqKey="Pedersen A">AMK Pedersen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
<author>
<name sortKey="Wong, Ws" uniqKey="Wong W">WS Wong</name>
</author>
<author>
<name sortKey="Nielsen, R" uniqKey="Nielsen R">R Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ye, Q" uniqKey="Ye Q">Q Ye</name>
</author>
<author>
<name sortKey="Krug, Rm" uniqKey="Krug R">RM Krug</name>
</author>
<author>
<name sortKey="Tao, Yj" uniqKey="Tao Y">YJ Tao</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Mol Biol Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Mol. Biol. Evol</journal-id>
<journal-id journal-id-type="publisher-id">molbev</journal-id>
<journal-id journal-id-type="hwp">molbiolevol</journal-id>
<journal-title-group>
<journal-title>Molecular Biology and Evolution</journal-title>
</journal-title-group>
<issn pub-type="ppub">0737-4038</issn>
<issn pub-type="epub">1537-1719</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24859245</article-id>
<article-id pub-id-type="pmc">4104320</article-id>
<article-id pub-id-type="doi">10.1093/molbev/msu173</article-id>
<article-id pub-id-type="publisher-id">msu173</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Fast Tracks</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An Experimentally Determined Evolutionary Model Dramatically Improves Phylogenetic Fit</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Bloom</surname>
<given-names>Jesse D.</given-names>
</name>
<xref ref-type="corresp" rid="msu173-COR1">*</xref>
<xref ref-type="aff" rid="msu173-AFF1">
<sup>1</sup>
</xref>
</contrib>
<aff id="msu173-AFF1">
<sup>1</sup>
Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA</aff>
</contrib-group>
<author-notes>
<corresp id="msu173-COR1">
<bold>*Corresponding author:</bold>
E-mail:
<email>jbloom@fhcrc.org</email>
.</corresp>
<fn>
<p>
<bold>Associate editor:</bold>
Jeffrey Thorne</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>8</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>24</day>
<month>5</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>24</day>
<month>5</month>
<year>2014</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>31</volume>
<issue>8</issue>
<fpage>1956</fpage>
<lpage>1978</lpage>
<permissions>
<copyright-statement>© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.</copyright-statement>
<copyright-year>2014</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/3.0/" license-type="creative-commons">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here, I demonstrate an alternative: Experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. Emerging high-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic and genetic analyses.</p>
</abstract>
<kwd-group>
<kwd>phylogenetics</kwd>
<kwd>codon model</kwd>
<kwd>substitution model</kwd>
<kwd>influenza</kwd>
<kwd>nucleoprotein</kwd>
<kwd>deep mutational scanning</kwd>
</kwd-group>
<counts>
<page-count count="23"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>The phylogenetic analysis of gene sequences is one of the most important and widely used computational techniques in all of biology. All modern phylogenetic algorithms require a quantitative evolutionary model that specifies the rate at which each site substitutes from one identity to another. These evolutionary models can be used to calculate the statistical likelihood of the sequences given a particular phylogenetic tree (
<xref rid="msu173-B14" ref-type="bibr">Felsenstein 1973</xref>
). Phylogenetic relationships are typically inferred by finding the tree that maximizes this likelihood (
<xref rid="msu173-B15" ref-type="bibr">Felsenstein 1981</xref>
) or by combining the likelihood with a prior to compute posterior probabilities of possible trees (
<xref rid="msu173-B28" ref-type="bibr">Huelsenbeck et al. 2001</xref>
).</p>
<p>Actual sequence evolution is governed by the rates at which mutations arise and the selection that subsequently acts upon them (
<xref rid="msu173-B24" ref-type="bibr">Halpern and Bruno 1998</xref>
;
<xref rid="msu173-B64" ref-type="bibr">Thorne et al. 2007</xref>
). Unfortunately, neither of these aspects of the evolutionary process is traditionally known a priori. The standard approach in molecular phylogenetics is therefore to assume that sites evolve independently and identically, and then construct an evolutionary model that contains free parameters designed to represent features of mutation and selection (
<xref rid="msu173-B20" ref-type="bibr">Goldman and Yang 1994</xref>
;
<xref rid="msu173-B70" ref-type="bibr">Yang 1994</xref>
;
<xref rid="msu173-B72" ref-type="bibr">Yang et al. 2000</xref>
;
<xref rid="msu173-B33" ref-type="bibr">Kosiol et al. 2007</xref>
). This approach suffers from two major problems. First, although adding parameters enhances a model’s fit to data, the parameter values must be estimated from the same sequences that are being analyzed phylogenetically—and so complex models can overfit the data (
<xref rid="msu173-B50" ref-type="bibr">Posada and Buckley 2004</xref>
). Second, even complex models do not contain enough parameters to realistically represent selection, which is highly idiosyncratic to specific sites within a protein. Attempts to predict site-specific selection from protein structure have had limited success (
<xref rid="msu173-B54" ref-type="bibr">Rodrigue et al. 2009</xref>
;
<xref rid="msu173-B32" ref-type="bibr">Kleinman et al. 2010</xref>
), probably because even sophisticated computer programs cannot reliably predict the impact of mutations (
<xref rid="msu173-B51" ref-type="bibr">Potapov et al. 2009</xref>
).</p>
<p>Methods have been developed to infer site-specific selection from naturally occurring sequences (
<xref rid="msu173-B55" ref-type="bibr">Rodrigue et al. 2010</xref>
;
<xref rid="msu173-B62" ref-type="bibr">Tamuri et al. 2012</xref>
,
<xref rid="msu173-B63" ref-type="bibr">2014</xref>
). Because the number of possible mutations is large, steps must be taken to ensure that these methods do not overfit the data (
<xref rid="msu173-B53" ref-type="bibr">Rodrigue 2013</xref>
). However, even when such steps are taken, the inferred site-specific selection parameters cannot easily be applied to phylogenetic analyses. The reason is that the selection parameters are generally inferred from the same naturally occurring sequences that are of phylogenetic interest—and parameters inferred from a data set cannot be used to analyze that same data set without additional procedures to avoid overfitting. The procedures that have been devised to restrain this problem of proliferating free parameters are complex and generally require assuming that sites fall into only a limited number of different classes (
<xref rid="msu173-B35" ref-type="bibr">Lartillot and Philippe 2004</xref>
;
<xref rid="msu173-B36" ref-type="bibr">Le et al. 2008</xref>
;
<xref rid="msu173-B68" ref-type="bibr">Wang et al. 2008</xref>
;
<xref rid="msu173-B69" ref-type="bibr">Wu et al. 2013</xref>
). Therefore, estimating site-specific selection from natural sequences is an imperfect method for inferring realistic evolutionary models for phylogenetic analyses.</p>
<p>Here, I demonstrate a radically different approach for constructing quantitative evolutionary models: Direct experimental measurement. This approach bypasses the aforementioned problem of proliferating free parameters because site-specific selection is measured experimentally without consideration of naturally occurring sequences. The evolutionary models constructed from these experiments therefore do not contain any parameters that must be estimated from the natural sequences that are being analyzed phylogenetically.</p>
<p>Specifically, using influenza nucleoprotein (NP) as an example, I experimentally estimate mutation rates via limiting-dilution passage and site-specific selection via deep mutational scanning (
<xref rid="msu173-B17" ref-type="bibr">Fowler et al. 2010</xref>
;
<xref rid="msu173-B1" ref-type="bibr">Araya and Fowler 2011</xref>
), a combination of high-throughput mutagenesis, functional selection, and deep sequencing. I then show that these experimental measurements can be used to create a parameter-free evolutionary model that describes the NP gene phylogeny far better than existing models with numerous free parameters. Finally, I discuss how the increasing availability of data from high-throughput experimental strategies such as the one employed here has the potential to transform analyses of genetic data by augmenting generic statistical models of evolution with detailed molecular-level information.</p>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Components of an Experimentally Determined Evolutionary Model</title>
<p>A phylogenetic evolutionary model specifies the rate at which one genotype is replaced by another. These rates of genotype substitution are determined by the underlying rates at which new mutations arise and the subsequent selection that acts upon them (
<xref rid="msu173-B24" ref-type="bibr">Halpern and Bruno 1998</xref>
;
<xref rid="msu173-B64" ref-type="bibr">Thorne et al. 2007</xref>
). A standard assumption in molecular phylogenetics is that the rate of genotype substitution can be decomposed into independent substitution rates at individual sites. Here, I make this assumption at the level of codon sites, and use
<italic>P
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
to denote the rate that site
<italic>r</italic>
substitutes from codon
<italic>x</italic>
to
<italic>y</italic>
given that the identity is already
<italic>x</italic>
. I further assume that it is possible to decompose
<italic>P
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
as
<disp-formula id="msu173-M1">
<label>(1)</label>
<graphic xlink:href="msu173m1.jpg" position="float"></graphic>
</disp-formula>
where
<italic>Q
<sub>xy</sub>
</italic>
is the rate of mutation from
<italic>x</italic>
to
<italic>y</italic>
(assumed to be constant across sites) and
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
is the site-specific probability that a mutation from
<italic>x</italic>
to
<italic>y</italic>
will fix at site
<italic>r</italic>
if it arises. Both are assumed to be constant over time.</p>
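<p>The decomposition can be illustrated with a minimal Python sketch, assuming that equation (1) takes the product form P = Q × F implied by the definitions above. The codons, the single site, and all numerical values are hypothetical placeholders chosen for illustration, not measurements.</p>
<preformatted>
```python
# Hypothetical codon mutation rates Q_xy (shared by all sites); placeholder values.
Q = {("GGC", "AGC"): 2.0e-5, ("GGC", "GGT"): 2.0e-5}

# Hypothetical fixation probabilities F_r,xy for one site r; placeholder values.
F_SITE = {("GGC", "AGC"): 0.1, ("GGC", "GGT"): 1.0}


def substitution_rate(q_xy: float, f_r_xy: float) -> float:
    """Assumed form of equation (1): P_r,xy = Q_xy * F_r,xy."""
    return q_xy * f_r_xy


# Per-site substitution rates for every codon change with a known Q and F.
P_SITE = {xy: substitution_rate(Q[xy], F_SITE[xy]) for xy in Q}
```
</preformatted>
<p>In this toy example, the synonymous change GGC → GGT fixes whenever it arises, so its substitution rate equals its mutation rate, whereas the nonsynonymous change GGC → AGC is slowed tenfold by selection.</p>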
<p>Given the evolutionary model described by
<xref ref-type="disp-formula" rid="msu173-M1">equation (1)</xref>
, the challenge is to experimentally estimate the mutation rates
<italic>Q
<sub>xy</sub>
</italic>
and the fixation probabilities
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
. In the following sections, I describe these experiments.</p>
</sec>
<sec>
<title>Measurement of Mutation Rates</title>
<p>A general challenge in quantifying mutation rates is the difficulty of separating mutations from the subsequent selection that acts upon them. To decouple mutation from selection, I utilized a previously described method for growing influenza viruses that package green fluorescent protein (GFP) in the PB1 segment (
<xref rid="msu173-B5" ref-type="bibr">Bloom et al. 2010</xref>
). The GFP does not contribute to viral growth and so is not under functional selection—therefore, substitutions in this gene accumulate at the mutation rate.</p>
<p>To drive the rapid accumulation of substitutions in the GFP gene, I performed limiting-dilution mutation-accumulation experiments (
<xref rid="msu173-B23" ref-type="bibr">Halligan and Keightley 2009</xref>
). Specifically, I passaged 24 replicate populations of GFP-carrying influenza viruses by limiting dilution in tissue culture, at each passage serially diluting the virus to the lowest concentration capable of infecting target cells. Because each limiting dilution bottlenecks the population to one or a few infectious virions, mutations fix rapidly. After 25 rounds of passage, the GFP gene was Sanger sequenced for each replicate to identify 24 substitutions (
<xref ref-type="table" rid="msu173-T1">tables 1</xref>
and
<xref ref-type="table" rid="msu173-T2">2</xref>
), for an overall rate of
<inline-formula>
<inline-graphic xlink:href="msu173i2.jpg"></inline-graphic>
</inline-formula>
mutations per nucleotide per tissue-culture generation—a value similar to that estimated previously by others using a somewhat different experimental approach (
<xref rid="msu173-B45" ref-type="bibr">Parvin et al. 1986</xref>
). The rates of different types of mutations are in
<xref ref-type="table" rid="msu173-T3">table 3</xref>
and possess expected features such as an elevation of transitions over transversions.
<table-wrap id="msu173-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Mutations Identified by Sequencing the 720-Nucleotide GFP Gene Packaged in the PB1 Segment After 25 Limiting-Dilution Passages for 24 Independent Replicates.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Clone</th>
<th rowspan="1" colspan="1">Mutations</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Clone 1</td>
<td align="left" rowspan="1" colspan="1">G62T (G21V), T693C (synonymous), del153-522 (indel)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 2</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 3</td>
<td align="left" rowspan="1" colspan="1">C29T (T10I)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 4</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 5</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 6</td>
<td align="left" rowspan="1" colspan="1">G429A (synonymous), C447T (synonymous)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 7</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 8</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 9</td>
<td align="left" rowspan="1" colspan="1">C646A (R216S)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 10</td>
<td align="left" rowspan="1" colspan="1">G471T (K157N), G703A (D235N)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 11</td>
<td align="left" rowspan="1" colspan="1">T111C (synonymous), T718G (*240E)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 12</td>
<td align="left" rowspan="1" colspan="1">T25C (F9L), T26C (F9S)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 13</td>
<td align="left" rowspan="1" colspan="1">C45T (synonymous), C549T (synonymous)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 14</td>
<td align="left" rowspan="1" colspan="1">T319C (Y107H), C372T (synonymous), C539T (A180V)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 15</td>
<td align="left" rowspan="1" colspan="1">A488C (K163T)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 16</td>
<td align="left" rowspan="1" colspan="1">G274T (G92C)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 17</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 18</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 19</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 20</td>
<td align="left" rowspan="1" colspan="1">G527A (S176N), A676G (T226A)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 21</td>
<td align="left" rowspan="1" colspan="1">G4A (V2I)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 22</td>
<td align="left" rowspan="1" colspan="1">T266C (M89T)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 23</td>
<td align="left" rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Clone 24</td>
<td align="left" rowspan="1" colspan="1">C30T (synonymous), del45-590 (indel)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF1">
<p>N
<sc>ote</sc>
.—The numbering is sequential beginning with the first nucleotide of the GFP start codon. For nonsynonymous mutations, the induced amino acid change is indicated in parentheses.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="msu173-T2" position="float">
<label>Table 2.</label>
<caption>
<p>Counts for Different Types of Mutations After the 25 Limiting-Dilution Passages.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Mutation Type</th>
<th rowspan="1" colspan="1">Number of Occurrences</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Total substitutions</td>
<td rowspan="1" colspan="1">24</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Transversions</td>
<td rowspan="1" colspan="1">6</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Transitions</td>
<td rowspan="1" colspan="1">18</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Nonsynonymous</td>
<td rowspan="1" colspan="1">15</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Synonymous</td>
<td rowspan="1" colspan="1">8</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Stop codons</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Indels</td>
<td rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td rowspan="1" colspan="1">T → G</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">T → C</td>
<td rowspan="1" colspan="1">6</td>
</tr>
<tr>
<td rowspan="1" colspan="1">T → A</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">G → T</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">G → C</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">G → A</td>
<td rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">C → T</td>
<td rowspan="1" colspan="1">7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">C → G</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">C → A</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">A → T</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">A → G</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">A → C</td>
<td rowspan="1" colspan="1">1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF2">
<p>N
<sc>ote</sc>
.—The numbers are calculated from
<xref ref-type="table" rid="msu173-T1">table 1</xref>
. Given that GFP is 720 nucleotides long, the data suggest a viral mutation rate of
<inline-formula>
<inline-graphic xlink:href="msu173i1.jpg"></inline-graphic>
</inline-formula>
mutations per nucleotide per tissue-culture generation.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="msu173-T3" position="float">
<label>Table 3.</label>
<caption>
<p>Influenza Mutation Rates.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Mutation Type</th>
<th rowspan="1" colspan="1">Rate</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">A → G, T → C (transition)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i3.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">G → A, C → T (transition)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i4.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">A → C, T → G (transversion)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i5.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">C → A, G → T (transversion)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i6.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">A → T, T → A (transversion)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i7.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">G → C, C → G (transversion)</td>
<td rowspan="1" colspan="1">
<inline-formula>
<inline-graphic xlink:href="msu173i8.jpg"></inline-graphic>
</inline-formula>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF3">
<p>N
<sc>ote</sc>
.—Numbers represent the probability a site that has the parent identity will mutate to the specified nucleotide in a single tissue-culture generation and are calculated from
<xref ref-type="table" rid="msu173-T1">tables 1</xref>
and
<xref ref-type="table" rid="msu173-T2">2</xref>
after adding one pseudocount to each mutation type. Mutations are in pairs because an observed change of A → G can derive either from this mutation on the sequenced strand or a T → C on the complementary strand, and so the paired mutations are indistinguishable assuming that the same mutational process applies to both strands of the replicated nucleic acid molecule. The numbers are the estimated rates of each individual mutation, so, for example, the observed rate of change from A → G is
<inline-formula>
<inline-graphic xlink:href="msu173i9.jpg"></inline-graphic>
</inline-formula>
because this change can arise from either of the two mutations A → G and T → C.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
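<p>The counts in table 2 and the overall rate can be rechecked from table 1 with a short Python sketch. It assumes that each limiting-dilution passage is counted as one tissue-culture generation and that indels are excluded from the substitution tally; the variable names are illustrative.</p>
<preformatted>
```python
# Nucleotide substitutions transcribed from table 1 (indels excluded).
SUBSTITUTIONS = [
    "G62T", "T693C", "C29T", "G429A", "C447T", "C646A", "G471T", "G703A",
    "T111C", "T718G", "T25C", "T26C", "C45T", "C549T", "T319C", "C372T",
    "C539T", "A488C", "G274T", "G527A", "A676G", "G4A", "T266C", "C30T",
]
PURINES = {"A", "G"}


def is_transition(mutation: str) -> bool:
    """A transition keeps the base class (purine or pyrimidine) unchanged."""
    parent, mutant = mutation[0], mutation[-1]
    return (parent in PURINES) == (mutant in PURINES)


n_transitions = sum(is_transition(m) for m in SUBSTITUTIONS)
n_transversions = len(SUBSTITUTIONS) - n_transitions

GENE_LENGTH = 720   # nucleotides in GFP (table 1 caption)
N_REPLICATES = 24   # independent limiting-dilution lineages
N_PASSAGES = 25     # assumption: one passage counted as one generation

# Substitutions per nucleotide per generation, pooled over all replicates.
overall_rate = len(SUBSTITUTIONS) / (GENE_LENGTH * N_REPLICATES * N_PASSAGES)
print(n_transitions, n_transversions, f"{overall_rate:.2e}")
```
</preformatted>
<p>Under these assumptions, the tally gives 18 transitions and 6 transversions, and 24 / (720 × 24 × 25) is roughly 5.6 × 10<sup>-5</sup> substitutions per nucleotide per generation.</p>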
<p>An observed mutation of A → G can arise from either this change on the sequenced strand or a change of T → C on the complementary strand. Assuming that the same molecular mutation process affects both strands, there are therefore only the six different mutation rates shown in
<xref ref-type="table" rid="msu173-T3">table 3</xref>
. Specifically, let
<italic>R
<sub>m</sub>
</italic>
<sub>→</sub>
<italic>
<sub>n</sub>
</italic>
represent the rate at which nucleotide
<italic>m</italic>
mutates to
<italic>n</italic>
given that the identity is already
<italic>m</italic>
, and let
<italic>m
<sub>c</sub>
</italic>
denote the complement of
<italic>m</italic>
(e.g., A
<italic>
<sub>c</sub>
</italic>
is
<italic>T</italic>
). The assumption that the same molecular mutation process affects both strands means that
<inline-formula>
<inline-graphic xlink:href="msu173i10.jpg"></inline-graphic>
</inline-formula>
. An additional empirical observation from
<xref ref-type="table" rid="msu173-T3">table 3</xref>
is that the mutation rates for influenza are approximately symmetric, with the rate of each mutation approximately equal to its reversal (
<inline-formula>
<inline-graphic xlink:href="msu173i11.jpg"></inline-graphic>
</inline-formula>
). Because it somewhat simplifies computational aspects of the subsequent phylogenetic analyses, I enforce this empirical observation of approximately symmetric mutation rates to be exactly true by taking the rates of mutations and their reversals to be the average of the two. With the further assumption that codon mutations occur a single nucleotide at a time, the mutation rates
<italic>Q
<sub>xy</sub>
</italic>
from codon
<italic>x</italic>
to
<italic>y</italic>
are estimated from the experimental data in
<xref ref-type="table" rid="msu173-T3">table 3</xref>
as
<disp-formula id="msu173-M2">
<label>(2)</label>
<graphic xlink:href="msu173m2.jpg" position="float"></graphic>
</disp-formula>
These mutation rates define the first term in the evolutionary model specified by
<xref ref-type="disp-formula" rid="msu173-M1">equation (1)</xref>
.</p>
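<p>The construction of the codon mutation rates can be sketched in Python. The sketch assumes that equation (2) assigns the nucleotide rate to codon pairs that differ at exactly one position and zero otherwise, as the single-nucleotide assumption above implies. The six class rates are illustrative placeholders, not the measured values in table 3, and the function names are hypothetical.</p>
<preformatted>
```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Placeholder per-generation rates for the six strand-symmetric classes;
# these are illustrative values, NOT the measured rates in table 3.
CLASS_RATES = {
    ("A", "G"): 2.0e-5, ("G", "A"): 2.2e-5,
    ("A", "C"): 0.9e-5, ("C", "A"): 1.0e-5,
    ("A", "T"): 0.3e-5, ("G", "C"): 0.2e-5,
}


def nucleotide_rates(class_rates):
    """Expand six class rates to all 12 nucleotide changes, then symmetrize."""
    rates = {}
    for (m, n), rate in class_rates.items():
        rates[(m, n)] = rate
        # Strand symmetry: the complementary change has the same rate.
        rates[(COMPLEMENT[m], COMPLEMENT[n])] = rate
    # Enforce R[m, n] == R[n, m] by averaging each mutation with its reversal.
    return {(m, n): 0.5 * (rates[(m, n)] + rates[(n, m)]) for (m, n) in rates}


def codon_mutation_rate(x, y, rates):
    """Q_xy: nucleotide rate if x and y differ at exactly one position, else 0.

    Identical codons (x == y) also return 0 in this sketch.
    """
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    return rates[diffs[0]] if len(diffs) == 1 else 0.0


RATES = nucleotide_rates(CLASS_RATES)
q_single = codon_mutation_rate("GGC", "AGC", RATES)  # one nucleotide differs
q_multi = codon_mutation_rate("GGC", "ACT", RATES)   # three nucleotides differ
```
</preformatted>
<p>After averaging, the twelve nucleotide changes share four distinct rates, because the A → T and G → C classes are already their own reversals under strand symmetry.</p>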
</sec>
<sec>
<title>Deep Mutational Scanning to Assess Effects of Mutations on NP</title>
<p>Estimation of the fixation probabilities
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
in
<xref ref-type="disp-formula" rid="msu173-M1">equation (1)</xref>
requires quantifying the effects of all
<inline-formula>
<inline-graphic xlink:href="msu173i12.jpg"></inline-graphic>
</inline-formula>
possible amino acid mutations to NP. Such large-scale assessments of mutational effects are feasible with the advent of deep mutational scanning, a recently developed experimental strategy of high-throughput mutagenesis, selection, and deep sequencing (
<xref rid="msu173-B17" ref-type="bibr">Fowler et al. 2010</xref>
;
<xref rid="msu173-B1" ref-type="bibr">Araya and Fowler 2011</xref>
) that has now been applied to several genes (
<xref rid="msu173-B17" ref-type="bibr">Fowler et al. 2010</xref>
;
<xref rid="msu173-B67" ref-type="bibr">Traxlmayr et al. 2012</xref>
;
<xref rid="msu173-B40" ref-type="bibr">Melamed et al. 2013</xref>
;
<xref rid="msu173-B56" ref-type="bibr">Roscoe et al. 2013</xref>
;
<xref rid="msu173-B61" ref-type="bibr">Starita et al. 2013</xref>
). Applying this experimental strategy to NP requires creating large libraries of random gene mutants, using these genes to generate pools of mutant influenza viruses which are then passaged at low multiplicity of infection (MOI) to select for functional variants, and finally using Illumina sequencing to assess the frequency of each mutation in the input mutant genes and the resulting viruses. Because NP plays an essential role in influenza genome packaging, replication, and transcription (
<xref rid="msu173-B49" ref-type="bibr">Portela and Digard 2002</xref>
;
<xref rid="msu173-B74" ref-type="bibr">Ye et al. 2006</xref>
), mutations that interfere with NP function or stability will impair or ablate viral growth. Such mutations will therefore be depleted in the mutant viruses relative to the input mutant genes.</p>
<p>Most previous applications of deep mutational scanning have examined single-nucleotide mutations to genes, because such mutations can easily be generated by error-prone polymerase chain reaction (PCR) or other nucleotide-level mutagenesis techniques. However, many amino acid mutations are not accessible by single-nucleotide changes. I therefore used a PCR-based strategy to construct codon-mutant libraries that contained multinucleotide (e.g.,
<monospace>GGC</monospace> →
<monospace>
<underline>ACT</underline>
</monospace>
) and single-nucleotide (e.g.,
<monospace>GGC</monospace> →
<monospace>
<underline>A</underline>
</monospace>
<monospace>GC</monospace>
) mutations. Codon-mutant libraries have an added benefit when separating true mutations from errors during the subsequent analysis of the deep sequencing data: the majority (54 of 63) of possible codon mutations involve multinucleotide changes, whereas sequencing and PCR errors generate almost exclusively single-nucleotide changes. I used identical experimental procedures to construct two codon-mutant libraries of NP from the wild-type (WT) human H3N2 strain A/Aichi/2/1968 and two from a variant of this NP with a single amino acid substitution (N334H) that enhances protein stability (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
;
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
). These codon-mutant libraries are termed WT-1, WT-2, N334H-1, and N334H-2. Each of these four mutant libraries contained more than 10
<sup>6</sup>
unique plasmid clones. Sanger sequencing of 30 clones drawn roughly equally from the four libraries revealed that the number of codon mutations per clone followed a Poisson distribution with a mean of 2.7 (
<xref ref-type="fig" rid="msu173-F1">fig. 1</xref>
). These codon mutations were distributed roughly uniformly along the gene sequence and showed no obvious biases toward specific mutations (
<xref ref-type="fig" rid="msu173-F1">fig. 1</xref>
). Most of the ≈10
<sup>4</sup>
unique amino acid mutations to NP therefore occur in numerous different clones in the four libraries, both individually and in combination with other mutations.
<fig id="msu173-F1" position="float">
<label>F
<sc>ig</sc>
. 1.</label>
<caption>
<p>The codon-mutant libraries as assessed by Sanger sequencing 30 individual clones. (
<italic>A</italic>
) The clones have an average of 2.7 codon mutations and 0.1 indels per full-length NP coding sequence, with the number of mutated codons per gene following approximately a Poisson distribution. (
<italic>B</italic>
) The number of nucleotide changes per codon mutation is roughly as expected if each codon is randomly mutated to any of the other 63 codons, with a slight elevation in single-nucleotide mutations. (
<italic>C</italic>
) The mutant codons have a uniform base composition. (
<italic>D</italic>
) Mutations occur uniformly along the primary sequence. (
<italic>E</italic>
) In clones with multiple mutations, there is no tendency for mutations to cluster. Shown is the actual distribution of pairwise distances between mutations in all multiply mutated clones compared with the distribution generated by 1,000 simulations where mutations are placed randomly along the primary sequence of each multiple-mutant clone. The data and code for this figure are available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.21">https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.21</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f1p"></graphic>
</fig>
</p>
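<p>Two numbers in the preceding paragraph can be checked with a short Python sketch: the split of the 63 possible codon mutations into single- and multinucleotide changes, and the Poisson expectation for the number of codon mutations per clone. The function names are illustrative, and the Poisson calculation simply takes the reported mean of 2.7 at face value.</p>
<preformatted>
```python
import math
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def n_differences(x: str, y: str) -> int:
    """Number of nucleotide positions at which two codons differ."""
    return sum(a != b for a, b in zip(x, y))


def mutation_classes(parent: str) -> dict:
    """Count the 63 other codons by how many nucleotides they change."""
    counts = {1: 0, 2: 0, 3: 0}
    for codon in CODONS:
        if codon != parent:
            counts[n_differences(parent, codon)] += 1
    return counts


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


counts = mutation_classes("GGC")
n_multinucleotide = counts[2] + counts[3]

# Expected number of clones without any codon mutation among 30 sequenced.
expected_unmutated = 30 * poisson_pmf(0, 2.7)
```
</preformatted>
<p>Every codon has 9 single-nucleotide and 54 multinucleotide neighbors, and with a mean of 2.7 mutations per clone about two of the 30 sequenced clones would be expected to carry no codon mutation.</p>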
<p>To assess effects of the mutations on viral replication, the plasmid mutant libraries were used to create pools of mutant influenza viruses by reverse genetics (
<xref rid="msu173-B26" ref-type="bibr">Hoffmann et al. 2000</xref>
). The viruses were passaged twice in tissue culture at low MOI to enforce a linkage between genotype and phenotype. The NP gene was reverse transcribed and PCR amplified from viral RNA after each passage, and similar PCR amplicons were generated from the plasmid mutant libraries and a variety of controls designed to quantify errors associated with sequencing, reverse transcription, and viral passage (
<xref ref-type="fig" rid="msu173-F2">fig. 2</xref>
). The entire process outlined in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
was performed in parallel but separately for each of the four mutant libraries (WT-1, WT-2, N334H-1, and N334H-2) in what will be termed one experimental replicate. This entire process of viral creation, passaging, and sequencing was then repeated independently for all four libraries in a second experimental replicate. The two independent replicates will be termed replicate A and replicate B.
<fig id="msu173-F2" position="float">
<label>F
<sc>ig</sc>
. 2.</label>
<caption>
<p>Design of the deep mutational scanning experiment. The sequenced samples are in yellow. Blue text indicates sources of mutation and selection; red text indicates sources of errors. The comparison of interest is between the mutation frequencies in the mutDNA and mutvirus samples, because changes in frequencies between these samples represent the action of selection. However, because some of the experimental techniques have the potential to introduce errors, the other samples are also sequenced to quantify these unintended sources of error. Each of the two experimental replicates (replicates A and B) involved independently repeating the entire viral rescue, viral passaging, and sequencing process for each of the four plasmid mutant libraries (WT-1, WT-2, N334H-1, and N334H-2).</p>
</caption>
<graphic xlink:href="msu173f2p"></graphic>
</fig>
</p>
<p>The mutation frequencies in all samples were quantified by Illumina sequencing, using overlapping paired-end reads to reduce errors (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">supplementary fig. S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary Material</ext-link>
online). Each sample produced ≈10
<sup>7</sup>
paired reads that could be aligned to NP, providing an average of ≈5 × 10
<sup>5</sup>
calls per codon (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">supplementary fig. S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary Material</ext-link>
online). Sequencing of unmutated NP plasmid revealed a low rate of errors, which were almost exclusively single-nucleotide changes (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
). As expected, the plasmid mutant libraries contained a high frequency of single and multinucleotide codon mutations (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
). Mutation frequencies for unmutated RNA or viruses created from unmutated NP plasmid were only slightly above the sequencing error rate (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
), indicating that reverse transcription and viral replication introduced few mutations relative to the targeted mutagenesis in the plasmid libraries. Mutation frequencies were reduced in the mutant viruses relative to the mutant plasmids used to create these viruses, particularly for nonsynonymous and stop-codon mutations (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
)—consistent with selection purging deleterious mutations. These results indicate that the deep mutational scanning experiment successfully introduced many of the NP variants in the plasmid mutant libraries into mutant viruses, which were then subjected to purifying selection against mutations that interfered with viral replication.
<fig id="msu173-F3" position="float">
<label>F
<sc>ig</sc>
. 3.</label>
<caption>
<p>Per-codon mutation frequencies for each library (WT-1, WT-2, N334H-1, and N334H-2) in (
<italic>A</italic>
) replicate A or (
<italic>B</italic>
) replicate B. The samples are named as in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
. Errors due to Illumina sequencing (DNA sample), reverse transcription (RNA sample), and viral replication (virus-p1 and virus-p2 samples) are rare and are mostly single-nucleotide changes. The codon-mutant libraries (mutDNA) contain a high frequency of single- and multinucleotide changes as expected from Sanger sequencing (rightmost bars of this plot and
<xref ref-type="fig" rid="msu173-F1">fig. 1</xref>
; note that Sanger sequencing is not subject to Illumina sequencing errors that affect all other samples). Mutations are reduced in mutvirus samples relative to mutDNA plasmids used to create these mutant viruses, with most of the reduction in stop-codon and nonsynonymous mutations—as expected if deleterious mutations are purged by purifying selection. Details of the analysis used to generate these figures are at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html">http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f3p"></graphic>
</fig>
</p>
<p>A key question is the extent to which the possible mutations were sampled in both the plasmid mutant libraries and the mutant viruses created from these plasmids. The deep mutational scanning would not achieve its goal if only a small fraction of possible mutations were sampled by the mutant plasmids or by the mutant viruses created from them (the latter could occur if a bottleneck during virus creation caused all viruses to be generated from only a few plasmids). Fortunately,
<xref ref-type="fig" rid="msu173-F4">figure 4</xref>
shows that the sampling of mutations was quite extensive in both the mutant plasmids and the mutant viruses. Specifically,
<xref ref-type="fig" rid="msu173-F4">figure 4</xref>
suggests that for each replicate, nearly all codon mutations were sampled numerous times in the plasmid mutant libraries and that over 75% of codon mutations were sampled by the mutant viruses.
<xref ref-type="fig" rid="msu173-F4">Figure 4</xref>
also suggests that replicate A was technically superior to replicate B in the thoroughness with which mutations were sampled by the mutant viruses. Because most amino acids are encoded by multiple codons, the fraction of amino acid mutations sampled in each replicate is even higher than the >75% of sampled codon mutations. So although the experiments may not have exhaustively examined every possible codon mutation, the thoroughness of sampling is certainly sufficient to make the sort of statistical inferences about mutational effects that are necessary to construct a quantitative evolutionary model.
<fig id="msu173-F4" position="float">
<label>F
<sc>ig</sc>
. 4.</label>
<caption>
<p>The completeness with which mutations were sampled in the mutant plasmids and viruses, as assessed by the counts for each multinucleotide codon mutation in the combined libraries of (
<italic>A</italic>
) replicate A or (
<italic>B</italic>
) replicate B. Restricting these plots to multinucleotide codon mutations avoids confounding effects from sequencing errors, which typically generate single-nucleotide codon mutations. Very few multinucleotide codon mutations are observed more than once in the unmutagenized controls (DNA, RNA, virus-p1, and virus-p2). Nearly all multinucleotide codon mutations are observed many times in the mutant plasmid libraries (mutDNA). About half the multinucleotide codon mutations are found at least five times in the mutant viruses (mutvirus-p1 and mutvirus-p2), indicating that at least half the possible mutations were incorporated into a virus. However, this is only a lower bound, because deleterious mutations will be absent from the mutant viruses due to purifying selection. If the analysis is restricted to synonymous multinucleotide codon mutations (which are less likely to be deleterious), then over 75% of the possible mutations were incorporated into a virus. This is still only a lower bound, because even synonymous mutations are sometimes strongly deleterious to influenza (
<xref rid="msu173-B39" ref-type="bibr">Marsh et al. 2008</xref>
). The completeness with which amino acid mutations are sampled is higher due to the redundancy of the genetic code. Note that replicate A is superior to replicate B in terms of the completeness with which the mutations are sampled by the mutant viruses. Details of the analysis used to generate these figures are at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html">http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f4p"></graphic>
</fig>
</p>
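<p>The completeness measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts dictionary, the wild-type codon, and the function names are hypothetical, and the example covers a single site rather than the whole gene. The threshold of five observations follows the figure 4 caption. It is not the mapmuts analysis code.</p>

```python
from itertools import product

BASES = "ACGT"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def n_nucleotide_changes(wildtype, mutant):
    """Number of positions at which two codons differ."""
    return sum(1 for w, m in zip(wildtype, mutant) if w != m)


def fraction_sampled(wildtype, counts, min_count=5):
    """Fraction of multinucleotide codon mutations seen at least min_count times.

    Only codons that differ from the wild-type codon at two or three
    positions are considered, because sequencing errors mostly produce
    single-nucleotide changes.
    """
    multi = [c for c in CODONS if n_nucleotide_changes(wildtype, c) >= 2]
    sampled = [c for c in multi if counts.get(c, 0) >= min_count]
    return len(sampled) / len(multi)


# Hypothetical counts at one site whose wild-type codon is AAA.
example_counts = {"CCC": 12, "GGT": 7, "ACG": 4, "AAC": 30}
print(fraction_sampled("AAA", example_counts))
```

<p>In this toy case, AAC is ignored because it is a single-nucleotide change, and ACG falls below the threshold, so only two of the 54 possible multinucleotide codon mutations count as sampled.</p>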
</sec>
<sec>
<title>Inference of Site-Specific Amino Acid Preferences</title>
<p>Qualitatively, it is obvious that changes in mutation frequencies between the plasmid mutant libraries and the resulting mutant viruses reflect selection. However, it is less obvious how to quantitatively analyze this information. Selection acts on the full genomes of all viruses in the population. In contrast, the experiments only measure site-independent mutation frequencies averaged over the population. Here, I have analyzed these data by assuming that each site has an inherent preference for each possible amino acid. The motivation for envisioning site-heterogeneous but site-independent amino acid preferences comes from experiments suggesting that the dominant constraint on mutations that fix during NP evolution relates to protein stability (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
) and that mutational effects on stability tend to be conserved in a site-independent manner (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
). Because the experiments generally examine each mutation in combination with several other mutations (the average clone has between two and three codon mutations;
<xref ref-type="fig" rid="msu173-F1">fig. 1</xref>
), the site-specific amino acid preferences are not simply selection coefficients for specific mutations. Instead, they reflect the effect of each mutation averaged over a set of genetic backgrounds.</p>
<p>Specifically, let
<inline-formula>
<inline-graphic xlink:href="msu173i13.jpg"></inline-graphic>
</inline-formula>
denote the preference of site
<italic>r</italic>
for amino acid
<italic>a</italic>
, with
<inline-formula>
<inline-graphic xlink:href="msu173i14.jpg"></inline-graphic>
</inline-formula>
.
<xref ref-type="fig" rid="msu173-F3">Figure 3</xref>
indicates that most observed mutations are the result of the desired codon mutagenesis but that there is also a low rate of apparent mutations arising from Illumina sequencing errors and reverse transcription. The expected frequency
<italic>f
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
of mutant codon
<italic>x</italic>
at site
<italic>r</italic>
in the mutant viruses is related to the preference
<inline-formula>
<inline-graphic xlink:href="msu173i15.jpg"></inline-graphic>
</inline-formula>
for its encoded amino acid
<inline-formula>
<inline-graphic xlink:href="msu173i16.jpg"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="msu173i17.jpg"></inline-graphic>
</inline-formula>
where
<italic>μ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
is the frequency that site
<italic>r</italic>
is mutagenized to codon
<italic>x</italic>
in the plasmid mutant library,
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
is the frequency the site is erroneously identified as
<italic>x</italic>
during sequencing,
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
is the frequency the site is mutated to
<italic>x</italic>
during reverse transcription,
<italic>y</italic>
is summed over all codons, and the probability that a site experiences multiple mutations or errors in the same clone is taken to be negligibly small. The observed codon counts are multinomially distributed around these expected frequencies, so by placing a symmetric Dirichlet-distribution prior over
<inline-formula>
<inline-graphic xlink:href="msu173i18.jpg"></inline-graphic>
</inline-formula>
and jointly estimating the error (
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
and
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
) and mutation (
<italic>μ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>x</sub>
</italic>
) rates from the appropriate samples in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
, it is possible to infer the posterior mean for all amino acid preferences by Markov chain Monte Carlo (MCMC, see Materials and Methods).</p>
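<p>The likelihood part of this inference can be sketched as follows. The exact expression for the expected frequency is given by the inline formula above, which is not reproduced here; the sketch ASSUMES that it is the sum of the two error frequencies and a mutagenesis term weighted by the preference and normalized over all codons. All function names and input values are hypothetical, and the Dirichlet prior and the MCMC sampling are not shown.</p>

```python
import math


def expected_frequencies(mu, eps, rho, pref, codon_to_aa):
    """Expected codon frequencies at one site under the ASSUMED relationship.

    mu, eps, rho: dicts mapping codon to mutagenesis, sequencing-error and
    reverse-transcription-error frequencies (hypothetical inputs).
    pref: dict mapping amino acid to its preference at this site.
    """
    norm = sum(mu[y] * pref[codon_to_aa[y]] for y in mu)
    return {
        x: eps[x] + rho[x] + mu[x] * pref[codon_to_aa[x]] / norm
        for x in mu
    }


def multinomial_log_likelihood(counts, freqs):
    """Log-likelihood of observed counts, dropping the constant multinomial term."""
    return sum(n * math.log(freqs[x]) for x, n in counts.items() if n > 0)


# Toy site with three codons; AAA is treated as wild type and errors are zero.
codon_to_aa = {"AAA": "K", "AGA": "R", "GAA": "E"}
mu = {"AAA": 0.98, "AGA": 0.01, "GAA": 0.01}
zero = {codon: 0.0 for codon in mu}
pref = {"K": 0.6, "R": 0.3, "E": 0.1}

freqs = expected_frequencies(mu, zero, zero, pref, codon_to_aa)
counts = {"AAA": 9900, "AGA": 75, "GAA": 25}
print(freqs)
print(multinomial_log_likelihood(counts, freqs))
```

<p>Because the two mutant codons are mutagenized at the same frequency in this toy example, their expected frequencies in the mutant viruses differ only through the preferences of the encoded amino acids.</p>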
<p>A basic check on the consistency of the overall experimental and computational approach is to compare the amino acid preferences inferred from different replicates or different viral passages of the same replicate.
<xref ref-type="fig" rid="msu173-F5">Figure 5</xref>
<italic>A</italic>
and
<italic>B</italic>
show that the preferences inferred from the first and second viral passages within each replicate are extremely similar, indicating that most selection occurs during initial viral creation and passage and that technical variation (preparation of samples, stochasticity in sequencing, etc.) has little impact. A more crucial comparison is between the preferences inferred from the two independent experimental replicates. This comparison (
<xref ref-type="fig" rid="msu173-F5">fig. 5</xref>
<italic>C</italic>
) shows that preferences from the independent replicates are substantially but less perfectly correlated. The imperfect correlation probably arises because the mutant viruses created independently by reverse genetics in each replicate are different incomplete samples of the many clones in the plasmid mutant libraries. Nonetheless, the substantial correlation between replicates shows that the sampling is sufficient to clearly reveal inherent preferences despite these experimental imperfections. Presumably, better inferences can be made by averaging the preferences from both replicates.
<xref ref-type="fig" rid="msu173-F5">Figure 5</xref>
<italic>D</italic>
shows such average preferences from the first passage of both replicates. These preferences are consistent with existing knowledge about NP function and stability. For example, at the conserved residues in NP’s RNA binding interface (
<xref rid="msu173-B74" ref-type="bibr">Ye et al. 2006</xref>
), the amino acids found in natural sequences tend to be the ones with the highest preferences (
<xref ref-type="table" rid="msu173-T4">table 4</xref>
). Similarly, for mutations that have been experimentally characterized as having large effects on NP protein stability (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
;
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
), the stabilizing amino acid has the higher preference (
<xref ref-type="table" rid="msu173-T5">table 5</xref>
).
<fig id="msu173-F5" position="float">
<label>F
<sc>ig</sc>
. 5.</label>
<caption>
<p>Amino acid preferences. (
<italic>A</italic>
) and (
<italic>B</italic>
) Preferences inferred from passages 1 and 2 are similar within each replicate, indicating that most selection occurs during initial viral creation and passage and that technical variation is small. (
<italic>C</italic>
) Preferences from the two independent replicates are also correlated but less perfectly. The increased variation is presumably due to stochasticity during the independent viral creation from plasmids for each replicate. (
<italic>D</italic>
) Preferences for all sites in NP (the N-terminal Met was not mutagenized) inferred from passage 1 of the combined replicates. Letter heights are proportional to the preference for that amino acid, and letters are colored by hydrophobicity. Relative solvent accessibility (RSA) and secondary structure are overlaid for residues in the crystal structure. Correlation plots show Pearson’s
<italic>R</italic>
and
<italic>P</italic>
value. Numerical data for (
<italic>D</italic>
) are in
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">supplementary file S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary Material</ext-link>
online. The preferences are consistent with existing knowledge about mutations to NP (
<xref ref-type="table" rid="msu173-T4">tables 4</xref>
and
<xref ref-type="table" rid="msu173-T5">5</xref>
). The computer code used to generate this figure is at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html">http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f5p"></graphic>
</fig>
<table-wrap id="msu173-T4" position="float">
<label>Table 4.</label>
<caption>
<p>For Residues Involved in NP’s RNA-Binding Groove, the Preferences and Expected Evolutionary Equilibrium Frequencies from the Experiments Correlate Well with the Amino Acid Frequencies in Naturally Occurring Sequences.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Residue</th>
<th rowspan="1" colspan="1">Frequencies in Natural Sequences</th>
<th rowspan="1" colspan="1">Experimentally Measured Amino Acid Preferences</th>
<th rowspan="1" colspan="1">Expected Equilibrium Evolutionary Frequencies from Experiments</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">65</td>
<td align="left" rowspan="1" colspan="1">R (0.83), K (0.17)</td>
<td align="left" rowspan="1" colspan="1">R (0.40), K (0.10), N (0.06)</td>
<td align="left" rowspan="1" colspan="1">R (0.58), S (0.07)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">150</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.46), K (0.06), P (0.05), L (0.05)</td>
<td align="left" rowspan="1" colspan="1">R (0.63), L (0.07)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">152</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.52), K (0.07), Q (0.07)</td>
<td align="left" rowspan="1" colspan="1">R (0.71)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">156</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.52), Q (0.06)</td>
<td align="left" rowspan="1" colspan="1">R (0.69), S (0.06)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">174</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.58), N (0.06), T (0.05)</td>
<td align="left" rowspan="1" colspan="1">R (0.75)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">175</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.46), K (0.16)</td>
<td align="left" rowspan="1" colspan="1">R (0.66), K (0.08), S (0.05)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">195</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.51)</td>
<td align="left" rowspan="1" colspan="1">R (0.69)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">199</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.44), M (0.08), Y (0.06), V (0.05)</td>
<td align="left" rowspan="1" colspan="1">R (0.64), V (0.05)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">213</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.51), N (0.06)</td>
<td align="left" rowspan="1" colspan="1">R (0.69)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">214</td>
<td align="left" rowspan="1" colspan="1">R (0.72), K (0.28)</td>
<td align="left" rowspan="1" colspan="1">K (0.24), H (0.09), R (0.09), Q (0.08), M (0.06), N (0.06), A (0.06), I (0.06)</td>
<td align="left" rowspan="1" colspan="1">R (0.19), K (0.17), A (0.09), H (0.07), I (0.06), L (0.06), Q (0.06)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">221</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.46), E (0.07), K (0.07)</td>
<td align="left" rowspan="1" colspan="1">R (0.66), L (0.05)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">236</td>
<td align="left" rowspan="1" colspan="1">R (0.94), K (0.06)</td>
<td align="left" rowspan="1" colspan="1">K (0.32), R (0.30)</td>
<td align="left" rowspan="1" colspan="1">R (0.51), K (0.18)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">355</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.29), L (0.13), K (0.09)</td>
<td align="left" rowspan="1" colspan="1">R (0.43), L (0.19)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">357</td>
<td align="left" rowspan="1" colspan="1">K (0.56), Q (0.44)</td>
<td align="left" rowspan="1" colspan="1">K (0.38), E (0.09), N (0.07), Y (0.05)</td>
<td align="left" rowspan="1" colspan="1">K (0.31), R (0.09), E (0.08), N (0.06)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">361</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.53), V (0.13)</td>
<td align="left" rowspan="1" colspan="1">R (0.68), V (0.11)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">391</td>
<td align="left" rowspan="1" colspan="1">R (1.00)</td>
<td align="left" rowspan="1" colspan="1">R (0.59), K (0.09)</td>
<td align="left" rowspan="1" colspan="1">R (0.77)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">148</td>
<td align="left" rowspan="1" colspan="1">Y (1.00)</td>
<td align="left" rowspan="1" colspan="1">Y (0.54), I (0.06)</td>
<td align="left" rowspan="1" colspan="1">Y (0.44), I (0.07), T (0.07), P (0.06), S (0.06)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF4">
<p>N
<sc>ote</sc>
.—Shown are the 17 residues in the NP RNA-binding groove in
<xref rid="msu173-B74" ref-type="bibr">Ye et al. (2006)</xref>
. The second column gives the frequencies of amino acids in all 21,108 full-length NP sequences from influenza A (excluding bat lineages) in the Influenza Virus Resource as of January 31, 2014. The third column gives the experimentally measured amino acid preferences (
<xref ref-type="fig" rid="msu173-F5">fig. 5</xref>
<italic>D</italic>
). The fourth column gives the expected evolutionary equilibrium frequency of the amino acids (
<xref ref-type="fig" rid="msu173-F6">fig. 6</xref>
). Only residues with frequencies or preferences
<inline-formula>
<inline-graphic xlink:href="msu173i19.jpg"></inline-graphic>
</inline-formula>
are listed. In all cases, the most abundant amino acid in the natural sequences has the highest expected evolutionary equilibrium frequency. In 15 of 17 cases, the most abundant amino acid in the natural sequences has the highest experimentally measured preference—in the other two cases, the most abundant amino acid in the natural sequences is among those with the highest preference.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="msu173-T5" position="float">
<label>Table 5.</label>
<caption>
<p>For Residues Where Mutations Have Previously Been Characterized as Having Large Effects on the Stability of the A/Aichi/2/1968 NP, the More Stable Amino Acid Has a Higher Preference and Is Also More Frequent in Actual NP Sequences.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Residue</th>
<th rowspan="1" colspan="1">Stability Measurement</th>
<th rowspan="1" colspan="1">Frequencies in Natural Sequences</th>
<th rowspan="1" colspan="1">Experimentally Measured Amino Acid Preferences</th>
<th rowspan="1" colspan="1">Expected Equilibrium Evolutionary Frequencies from Experiments</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">259</td>
<td align="left" rowspan="1" colspan="1">L259S is destabilizing (
<inline-formula>
<inline-graphic xlink:href="msu173i20.jpg"></inline-graphic>
</inline-formula>
)</td>
<td align="left" rowspan="1" colspan="1">L (0.98), S (0.02)</td>
<td align="left" rowspan="1" colspan="1">L (0.23), S (0.04)</td>
<td align="left" rowspan="1" colspan="1">L (0.36), S (0.06)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">280</td>
<td align="left" rowspan="1" colspan="1">V280A is destabilizing (
<inline-formula>
<inline-graphic xlink:href="msu173i21.jpg"></inline-graphic>
</inline-formula>
)</td>
<td align="left" rowspan="1" colspan="1">V (0.89), A (0.10)</td>
<td align="left" rowspan="1" colspan="1">V (0.19), A (0.02)</td>
<td align="left" rowspan="1" colspan="1">V (0.25), A (0.03)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">334</td>
<td align="left" rowspan="1" colspan="1">N334H is stabilizing (
<inline-formula>
<inline-graphic xlink:href="msu173i22.jpg"></inline-graphic>
</inline-formula>
)</td>
<td align="left" rowspan="1" colspan="1">H (0.93), N (0.07)</td>
<td align="left" rowspan="1" colspan="1">H (0.28), N (0.12)</td>
<td align="left" rowspan="1" colspan="1">H (0.23), N (0.10)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">384</td>
<td align="left" rowspan="1" colspan="1">R384G is destabilizing (
<inline-formula>
<inline-graphic xlink:href="msu173i23.jpg"></inline-graphic>
</inline-formula>
)</td>
<td align="left" rowspan="1" colspan="1">R (0.80), G (0.17)</td>
<td align="left" rowspan="1" colspan="1">R (0.22), G (0.04)</td>
<td align="left" rowspan="1" colspan="1">R (0.39), G (0.04)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF5">
<p>N
<sc>ote</sc>
.—The second column gives the experimentally measured change in melting temperature (Δ
<italic>T</italic>
<sub>m</sub>
) induced by the mutation to the A/Aichi/2/1968 NP, as measured by
<xref rid="msu173-B21" ref-type="bibr">Gong et al. (2013)</xref>
; these mutational effects on stability are largely conserved in other NPs (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
). The third column gives the frequencies of the amino acids in all 21,108 full-length NP sequences from influenza A (excluding bat lineages) in the Influenza Virus Resource as of January 31, 2014. The fourth column gives the experimentally measured amino acid preferences (
<xref ref-type="fig" rid="msu173-F5">fig. 5</xref>
<italic>D</italic>
). The fifth column gives the expected evolutionary equilibrium frequency of the amino acids (
<xref ref-type="fig" rid="msu173-F6">fig. 6</xref>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
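<p>The two operations used in this comparison, correlating preferences between replicates and averaging them, can be sketched as below. The preference values are hypothetical and reduced to three amino acids at a single site; the function names are illustrative and are not taken from the mapmuts package.</p>

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_preferences(rep_a, rep_b):
    """Average two preference dicts for one site and renormalize to sum to one."""
    avg = {aa: (rep_a[aa] + rep_b[aa]) / 2 for aa in rep_a}
    total = sum(avg.values())
    return {aa: p / total for aa, p in avg.items()}


# Hypothetical preferences at one site, reduced to three amino acids.
rep_a = {"R": 0.6, "K": 0.3, "Q": 0.1}
rep_b = {"R": 0.5, "K": 0.3, "Q": 0.2}
keys = sorted(rep_a)
print(pearson_r([rep_a[k] for k in keys], [rep_b[k] for k in keys]))
print(average_preferences(rep_a, rep_b))
```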
</sec>
<sec>
<title>The Experimentally Determined Evolutionary Model</title>
<p>The final step is to use the amino acid preferences to estimate the fixation probabilities
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
, which can then be combined with the mutation rates to create a fully experimentally determined evolutionary model. Intuitively, it is obvious that the amino acid preferences provide information about the fixation probabilities. For instance, it seems reasonable to expect that a mutation from
<italic>x</italic>
to
<italic>y</italic>
at site
<italic>r</italic>
will be more likely to fix (relatively larger value of
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
) if amino acid
<inline-formula>
<inline-graphic xlink:href="msu173i24.jpg"></inline-graphic>
</inline-formula>
is preferred to
<inline-formula>
<inline-graphic xlink:href="msu173i25.jpg"></inline-graphic>
</inline-formula>
at this site (if
<inline-formula>
<inline-graphic xlink:href="msu173i26.jpg"></inline-graphic>
</inline-formula>
) and less likely to fix if
<inline-formula>
<inline-graphic xlink:href="msu173i27.jpg"></inline-graphic>
</inline-formula>
. However, the exact relationship between the amino acid preferences and the fixation probabilities is unclear. A rigorous derivation would require knowledge of unknown and probably unmeasurable population-genetics parameters for both the deep mutational scanning experiment and the naturally evolving populations that gave rise to the sequences being analyzed phylogenetically. Instead, I provide two heuristic relationships. Both relationships satisfy detailed balance (reversibility), such that
<inline-formula>
<inline-graphic xlink:href="msu173i28.jpg"></inline-graphic>
</inline-formula>
, meaning that
<italic>F
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>xy</sub>
</italic>
defines a Markov process with
<inline-formula>
<inline-graphic xlink:href="msu173i29.jpg"></inline-graphic>
</inline-formula>
proportional to its stationary state when all amino acid interchanges are equally probable.</p>
<p>It is helpful to first consider what the amino acid preference values actually represent. Most NP variants in the deep mutational scanning libraries contain multiple mutations, so the amino acid preferences represent the mutational effects averaged over the nearby genetic neighborhood of the parent protein. Therefore, one interpretation is that a preference is proportional to the fraction of genetic backgrounds in which a mutation is tolerated, such that a mutation from
<italic>x</italic>
to
<italic>y</italic>
is always tolerated if
<inline-formula>
<inline-graphic xlink:href="msu173i30.jpg"></inline-graphic>
</inline-formula>
but is only sometimes tolerated if
<inline-formula>
<inline-graphic xlink:href="msu173i31.jpg"></inline-graphic>
</inline-formula>
. In this interpretation, there should be strong selection during initial viral growth depending on whether the mutation is tolerated in the particular genetic background in which it occurs, and then there should be little further enrichment or depletion during subsequent viral passages—loosely consistent with
<xref ref-type="fig" rid="msu173-F5">fig. 5</xref>
<italic>A,B</italic>
, which shows that the amino acid preferences inferred after two viral passages are very similar to those inferred after one passage. Note that this interpretation can be related to the selection-threshold evolutionary dynamics described in
<xref rid="msu173-B7" ref-type="bibr">Bloom et al. (2007)</xref>
. An equation that describes this scenario is
<disp-formula id="msu173-M3">
<label>(3)</label>
<graphic xlink:href="msu173m3.jpg" position="float"></graphic>
</disp-formula>
This equation is equivalent to the Metropolis acceptance criterion (
<xref rid="msu173-B41" ref-type="bibr">Metropolis et al. 1953</xref>
).</p>
<p>An alternative interpretation is that
<inline-formula>
<inline-graphic xlink:href="msu173i32.jpg"></inline-graphic>
</inline-formula>
reflects the selection coefficient for the amino acid
<inline-formula>
<inline-graphic xlink:href="msu173i33.jpg"></inline-graphic>
</inline-formula>
at site
<italic>r</italic>
. In this case, if the
<inline-formula>
<inline-graphic xlink:href="msu173i34.jpg"></inline-graphic>
</inline-formula>
values represent the expected amino acid equilibrium frequencies in a hypothetical evolving population in which all amino acid interchanges are equally likely, and assuming (probably unrealistically) that this hypothetical population and the actual population in which NP evolves are in the weak-mutation limit (i.e., the population is mostly homogeneous, see
<xref rid="msu173-B13" ref-type="bibr">Desai and Fisher 2007</xref>
) and have identical constant effective population sizes, then
<xref rid="msu173-B24" ref-type="bibr">Halpern and Bruno (1998)</xref>
derive
<disp-formula id="msu173-M4">
<label>(4)</label>
<graphic xlink:href="msu173m4.jpg" position="float"></graphic>
</disp-formula>
</p>
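<p>Both heuristic relationships can be written as short functions and checked numerically for detailed balance. The first function follows from the stated equivalence of equation (3) to the Metropolis acceptance criterion. The second ASSUMES the usual Halpern and Bruno form for equation (4), since the equation itself is not reproduced in the text here. The function names and preference values are hypothetical.</p>

```python
import math


def fixation_metropolis(pi_x, pi_y):
    """Relative fixation probability for x to y under a Metropolis-style rule."""
    return min(1.0, pi_y / pi_x)


def fixation_halpern_bruno(pi_x, pi_y):
    """Relative fixation probability for x to y in the ASSUMED Halpern-Bruno form."""
    if math.isclose(pi_x, pi_y):
        return 1.0  # limit of the expression below as pi_y approaches pi_x
    return math.log(pi_y / pi_x) / (1.0 - pi_x / pi_y)


def satisfies_detailed_balance(fix, pi_x, pi_y):
    """Check that pi_x * F(x to y) equals pi_y * F(y to x) up to rounding."""
    return math.isclose(pi_x * fix(pi_x, pi_y), pi_y * fix(pi_y, pi_x))


for rule in (fixation_metropolis, fixation_halpern_bruno):
    print(rule.__name__, satisfies_detailed_balance(rule, 0.40, 0.10))
```

<p>Under the first rule, a change to a more preferred amino acid is always accepted, and a change to a less preferred one is accepted in proportion to the ratio of preferences. Under the second, the relative value also increases with the preference ratio for favorable changes.</p>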
<p>Given one of these definitions for the fixation probabilities and the mutation rates defined by
<xref ref-type="disp-formula" rid="msu173-M2">equation (2)</xref>
, the experimentally determined evolutionary model is defined by
<xref ref-type="disp-formula" rid="msu173-M1">equation (1)</xref>
. For the mutation rates and fixation probabilities used here, this evolutionary model defines a stochastic process with a unique stationary state for each site
<italic>r</italic>
. These stationary states give the expected amino acid frequencies at evolutionary equilibrium. These evolutionary equilibrium frequencies are shown in
<xref ref-type="fig" rid="msu173-F6">figure 6</xref>
and differ somewhat from the amino acid preferences because they also depend on the structure of the genetic code (and on the mutation rates when these are nonsymmetric). For example, if arginine and lysine have equal preferences at a site, arginine will be more abundant at evolutionary equilibrium because it is encoded by more codons.
<fig id="msu173-F6" position="float">
<label>F
<sc>ig</sc>
. 6.</label>
<caption>
<p>The expected frequencies of the amino acids at evolutionary equilibrium using the experimentally determined evolutionary model from passage 1 of the combined replicates and
<xref ref-type="disp-formula" rid="msu173-M3">equation (3)</xref>
for the fixation probabilities. Note that these expected frequencies differ slightly from the amino acid preferences in
<xref ref-type="fig" rid="msu173-F5">figure 5</xref>
<italic>D</italic>
due to the structure of the genetic code. For instance, when arginine and lysine have equal preferences at a site, arginine will tend to have a higher evolutionary equilibrium frequency because it is encoded by more codons. The numerical data are in
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">supplementary file S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary Material</ext-link>
online. The computer code used to generate this plot is at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html">http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f6p"></graphic>
</fig>
</p>
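<p>The arginine and lysine example can be made concrete with a simplified calculation. The sketch below ASSUMES symmetric mutation rates between codons, which is a simplification of the mutation model used in the text. In that case a reversible fixation rule gives each sense codon a stationary weight proportional to the preference for its encoded amino acid, so an amino acid's equilibrium frequency scales with its preference times its number of codons. The function name and the equal preferences are hypothetical.</p>

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def equilibrium_frequencies(prefs):
    """Amino acid equilibrium frequencies, ASSUMING symmetric codon mutation rates.

    Each sense codon contributes a weight equal to the preference for its
    amino acid; stop codons are excluded.
    """
    weights = {}
    for codon, aa in GENETIC_CODE.items():
        if aa == "*":
            continue
        weights[aa] = weights.get(aa, 0.0) + prefs[aa]
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Hypothetical site at which all 20 amino acids are equally preferred.
equal_prefs = {aa: 0.05 for aa in set(AMINO_ACIDS) - {"*"}}
freqs = equilibrium_frequencies(equal_prefs)
print(round(freqs["R"] / freqs["K"], 6))
```

<p>With equal preferences, arginine (six codons) comes out three times as frequent as lysine (two codons) under this simplification.</p>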
</sec>
<sec>
<title>Phylogenetic Analyses</title>
<p>The experimentally determined evolutionary model can be used to compute phylogenetic likelihoods, thereby enabling its comparison to existing models. To perform these comparisons, I first used codonPhyML (
<xref rid="msu173-B18" ref-type="bibr">Gil et al. 2013</xref>
) to infer maximum-likelihood trees (
<xref ref-type="fig" rid="msu173-F7">fig. 7</xref>
) for NP sequences from human influenza using the Goldman–Yang (GY94) (
<xref rid="msu173-B20" ref-type="bibr">Goldman and Yang 1994</xref>
) and the Kosiol et al.
(
<xref rid="msu173-B33" ref-type="bibr">2007</xref>
, KOSI07+F) codon substitution models. These tree topologies were then fixed, and the branch lengths and model parameters were optimized by maximum likelihood for each of the models.
<fig id="msu173-F7" position="float">
<label>F
<sc>ig</sc>
. 7.</label>
<caption>
<p>Phylogenetic tree of NPs from human influenza descended from a close relative of the 1918 virus. Black: H1N1 from 1918 lineage; green: seasonal H1N1; red: H2N2; blue: H3N2. Maximum-likelihood trees constructed using codonPhyML (
<xref rid="msu173-B18" ref-type="bibr">Gil et al. 2013</xref>
) with (
<italic>A</italic>
) the GY94 substitution model or (
<italic>B</italic>
) the KOSI07+F substitution model. Up to three NP sequences per year from each subtype were used to build the tree. The A/Aichi/2/1968 NP that was the subject of this experiment was not one of the NP sequences randomly subsampled for the tree, so its name is indicated close to a nearly identical sequence that is shown in the tree. The computer code used to generate this tree is at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html">http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html</ext-link>
(last accessed May 31, 2014).</p>
</caption>
<graphic xlink:href="msu173f7p"></graphic>
</fig>
</p>
<p>These models differ in their number of free parameters. A “free parameter” is any variable with a value that is determined from the same naturally occurring NP sequences that are being analyzed phylogenetically. The experimentally determined evolutionary model has no free parameters, because all of the properties of this model were determined by experiments that did not utilize information from naturally occurring NP sequences (the amino acid preferences are inferred from the experiments using a symmetric prior, so in the absence of experimental data all 20 amino acids would be inferred as equally preferable at each site). Similarly, although the KOSI07+F model has a large number of exchangeability variables that were determined empirically, these variables are not free parameters because they were specified ahead of time from analysis of a general set of gene homologs that did not include NP. However, both GY94 and KOSI07+F also contain free parameters that are estimated from the NP sequences that are being analyzed phylogenetically. In the simplest form, GY94 contains 11 such free parameters (nine equilibrium frequencies plus transition–transversion and synonymous–nonsynonymous ratios), whereas KOSI07+F contains 62 parameters (60 frequencies plus transition–transversion and synonymous–nonsynonymous ratios). More complex variants add parameters allowing variation in substitution rate (
<xref rid="msu173-B70" ref-type="bibr">Yang 1994</xref>
) or synonymous–nonsynonymous ratio among sites or lineages (
<xref rid="msu173-B71" ref-type="bibr">Yang and Nielsen 1998</xref>
;
<xref rid="msu173-B72" ref-type="bibr">Yang et al. 2000</xref>
). For all these models, HYPHY (
<xref rid="msu173-B48" ref-type="bibr">Pond et al. 2005</xref>
) was used to calculate the likelihood after optimizing branch lengths and model parameters on the fixed tree topologies.</p>
<p>Comparison of these likelihoods strikingly validates the superiority of the experimentally determined model (
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
). Adding free parameters generally improves a model’s fit to data, and this is true within GY94 and KOSI07+F. However, the parameter-free experimentally determined evolutionary model describes the sequence phylogeny with a likelihood far greater than even the most highly parameterized GY94 and KOSI07+F variants. Interpreting the amino acid preferences as the fraction of genetic backgrounds that tolerate a mutation (
<xref ref-type="disp-formula" rid="msu173-M3">eq. 3</xref>
) outperforms interpreting them as selection coefficients (
<xref ref-type="disp-formula" rid="msu173-M4">eq. 4</xref>
), although either interpretation yields evolutionary models for NP far superior to GY94 or KOSI07+F. Comparison using the Akaike information criterion (AIC) to penalize parameters (
<xref rid="msu173-B50" ref-type="bibr">Posada and Buckley 2004</xref>
) even more emphatically highlights the superiority of the experimentally determined models.
<table-wrap id="msu173-T6" position="float">
<label>Table 6.</label>
<caption>
<p>Likelihoods Computed Using Various Evolutionary Models After Optimizing the Branch Lengths for the Fixed Tree Topology Inferred Using the GY94 model (
<xref ref-type="fig" rid="msu173-F7">fig. 7</xref>
).</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Model</th>
<th rowspan="1" colspan="1">ΔAIC</th>
<th rowspan="1" colspan="1">Log Likelihood</th>
<th rowspan="1" colspan="1">Parameters (Optimized + Empirical)</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Experimental, combined replicates</td>
<td rowspan="1" colspan="1">0.0</td>
<td rowspan="1" colspan="1">−12,338.9</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experimental, replicate A</td>
<td rowspan="1" colspan="1">67.9</td>
<td rowspan="1" colspan="1">−12,372.8</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experimental, replicate B</td>
<td rowspan="1" colspan="1">106.1</td>
<td rowspan="1" colspan="1">−12,392.0</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, combined replicates</td>
<td rowspan="1" colspan="1">357.9</td>
<td rowspan="1" colspan="1">−12,517.9</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, replicate A</td>
<td rowspan="1" colspan="1">393.0</td>
<td rowspan="1" colspan="1">−12,535.4</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, replicate B</td>
<td rowspan="1" colspan="1">455.5</td>
<td rowspan="1" colspan="1">−12,566.7</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, beta
<italic>ω</italic>
plus positive, one rate (M8)</td>
<td rowspan="1" colspan="1">1,136.8</td>
<td rowspan="1" colspan="1">−12,893.3</td>
<td rowspan="1" colspan="1">14 (5 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, three-category
<italic>ω</italic>
, one rate (M2a)</td>
<td rowspan="1" colspan="1">1,209.5</td>
<td rowspan="1" colspan="1">−12,929.7</td>
<td rowspan="1" colspan="1">14 (5 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, gamma
<italic>ω</italic>
, one rate (M5)</td>
<td rowspan="1" colspan="1">1,218.0</td>
<td rowspan="1" colspan="1">−12,935.9</td>
<td rowspan="1" colspan="1">12 (3 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, one
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">1,485.7</td>
<td rowspan="1" colspan="1">−13,069.8</td>
<td rowspan="1" colspan="1">12 (3 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, three-category
<italic>ω</italic>
, one rate (M2a)</td>
<td rowspan="1" colspan="1">1,679.7</td>
<td rowspan="1" colspan="1">−13,113.8</td>
<td rowspan="1" colspan="1">65 (5 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, beta <italic>ω</italic> plus positive, one rate (M8)</td>
<td rowspan="1" colspan="1">1,680.5</td>
<td rowspan="1" colspan="1">−13,114.1</td>
<td rowspan="1" colspan="1">65 (5 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, one
<italic>ω</italic>
, one rate (M0)</td>
<td rowspan="1" colspan="1">1,754.1</td>
<td rowspan="1" colspan="1">−13,205.0</td>
<td rowspan="1" colspan="1">11 (2 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, gamma
<italic>ω</italic>
, one rate</td>
<td rowspan="1" colspan="1">1,757.7</td>
<td rowspan="1" colspan="1">−13,154.8</td>
<td rowspan="1" colspan="1">63 (3 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, one
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">1,831.1</td>
<td rowspan="1" colspan="1">−13,191.5</td>
<td rowspan="1" colspan="1">63 (3 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, branch-specific
<italic>ω</italic>
, gamma rates (M5)</td>
<td rowspan="1" colspan="1">1,972.3</td>
<td rowspan="1" colspan="1">−12,769.1</td>
<td rowspan="1" colspan="1">556 (547 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, one
<italic>ω</italic>
, one rate (M0)</td>
<td rowspan="1" colspan="1">2,254.2</td>
<td rowspan="1" colspan="1">−13,404.0</td>
<td rowspan="1" colspan="1">62 (2 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, branch-specific
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">2,319.5</td>
<td rowspan="1" colspan="1">−12,891.7</td>
<td rowspan="1" colspan="1">607 (547 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, combined replicates</td>
<td rowspan="1" colspan="1">3,741.0</td>
<td rowspan="1" colspan="1">−14,209.4</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, replicate A</td>
<td rowspan="1" colspan="1">3,809.6</td>
<td rowspan="1" colspan="1">−14,243.7</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, replicate B</td>
<td rowspan="1" colspan="1">3,840.4</td>
<td rowspan="1" colspan="1">−14,259.1</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, combined replicates</td>
<td rowspan="1" colspan="1">4,388.7</td>
<td rowspan="1" colspan="1">−14,533.3</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, replicate B</td>
<td rowspan="1" colspan="1">4,559.1</td>
<td rowspan="1" colspan="1">−14,618.5</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, replicate A</td>
<td rowspan="1" colspan="1">4,622.1</td>
<td rowspan="1" colspan="1">−14,649.9</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF6">
<p>N
<sc>ote</sc>
.—Experimentally determined models vastly outperform GY94 or KOSI07+F. Models are sorted by ΔAIC (
<xref rid="msu173-B50" ref-type="bibr">Posada and Buckley 2004</xref>
) but note that the experimentally determined models all have much higher log likelihoods even before penalizing parameters. The experimentally determined models fit best if the amino acid preferences are interpreted as the fraction of genetic backgrounds that tolerate a mutation (
<xref ref-type="disp-formula" rid="msu173-M3">eq. 3</xref>
) rather than as selection coefficients (
<xref ref-type="disp-formula" rid="msu173-M4">eq. 4</xref>
). Randomizing the experimentally determined preferences among sites makes the models far worse. All variants of GY94 and KOSI07+F contain empirical equilibrium frequencies plus a transition–transversion ratio and synonymous–nonsynonymous ratio (
<italic>ω</italic>
) optimized by likelihood. Some variants allow
<italic>ω</italic>
to vary across sites using discrete categories (M2a), a gamma distribution (M5), or a beta distribution plus a category (M8). Some variants allow a different
<italic>ω</italic>
for each branch. Some variants allow the rate of substitution to be gamma distributed.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="msu173-T7" position="float">
<label>Table 7.</label>
<caption>
<p>Likelihoods for the Various Evolutionary Models for the Tree Topology Inferred with CodonPhyML Using KOSI07+F.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Model</th>
<th rowspan="1" colspan="1">ΔAIC</th>
<th rowspan="1" colspan="1">Log Likelihood</th>
<th rowspan="1" colspan="1">Parameters (Optimized + Empirical)</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Experimental, combined replicates</td>
<td rowspan="1" colspan="1">0.0</td>
<td rowspan="1" colspan="1">−12,334.6</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experimental, replicate A</td>
<td rowspan="1" colspan="1">67.9</td>
<td rowspan="1" colspan="1">−12,368.5</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experimental, replicate B</td>
<td rowspan="1" colspan="1">106.2</td>
<td rowspan="1" colspan="1">−12,387.7</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, combined replicates</td>
<td rowspan="1" colspan="1">356.8</td>
<td rowspan="1" colspan="1">−12,513.0</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, replicate A</td>
<td rowspan="1" colspan="1">391.5</td>
<td rowspan="1" colspan="1">−12,530.3</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Halpern and Bruno, replicate B</td>
<td rowspan="1" colspan="1">454.8</td>
<td rowspan="1" colspan="1">−12,562.0</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, beta
<italic>ω</italic>
plus positive, one rate (M8)</td>
<td rowspan="1" colspan="1">1,183.4</td>
<td rowspan="1" colspan="1">−12,912.3</td>
<td rowspan="1" colspan="1">14 (5 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, three-category
<italic>ω</italic>
, one rate (M2a)</td>
<td rowspan="1" colspan="1">1,209.4</td>
<td rowspan="1" colspan="1">−12,925.3</td>
<td rowspan="1" colspan="1">14 (5 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, gamma
<italic>ω</italic>
, one rate (M5)</td>
<td rowspan="1" colspan="1">1,219.6</td>
<td rowspan="1" colspan="1">−12,932.4</td>
<td rowspan="1" colspan="1">12 (3 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, one
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">1,493.1</td>
<td rowspan="1" colspan="1">−13,069.1</td>
<td rowspan="1" colspan="1">12 (3 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, three-category
<italic>ω</italic>
, one rate (M2a)</td>
<td rowspan="1" colspan="1">1,676.0</td>
<td rowspan="1" colspan="1">−13,107.6</td>
<td rowspan="1" colspan="1">65 (5 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, beta <italic>ω</italic> plus positive, one rate (M8)</td>
<td rowspan="1" colspan="1">1,676.6</td>
<td rowspan="1" colspan="1">−13,107.9</td>
<td rowspan="1" colspan="1">65 (5 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, gamma
<italic>ω</italic>
, one rate</td>
<td rowspan="1" colspan="1">1,753.3</td>
<td rowspan="1" colspan="1">−13,148.2</td>
<td rowspan="1" colspan="1">63 (3 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, one
<italic>ω</italic>
, one rate (M0)</td>
<td rowspan="1" colspan="1">1,762.2</td>
<td rowspan="1" colspan="1">−13,204.7</td>
<td rowspan="1" colspan="1">11 (2 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, one
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">1,834.3</td>
<td rowspan="1" colspan="1">−13,188.7</td>
<td rowspan="1" colspan="1">63 (3 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GY94, branch-specific
<italic>ω</italic>
, gamma rates (M5)</td>
<td rowspan="1" colspan="1">1,980.8</td>
<td rowspan="1" colspan="1">−12,769.0</td>
<td rowspan="1" colspan="1">556 (547 + 9)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, one
<italic>ω</italic>
, one rate (M0)</td>
<td rowspan="1" colspan="1">2,256.8</td>
<td rowspan="1" colspan="1">−13,401.0</td>
<td rowspan="1" colspan="1">62 (2 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KOSI07+F, branch-specific
<italic>ω</italic>
, gamma rates</td>
<td rowspan="1" colspan="1">2,324.0</td>
<td rowspan="1" colspan="1">−12,889.6</td>
<td rowspan="1" colspan="1">607 (547 + 60)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, combined replicates</td>
<td rowspan="1" colspan="1">3,741.3</td>
<td rowspan="1" colspan="1">−14,205.2</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, replicate A</td>
<td rowspan="1" colspan="1">3,809.4</td>
<td rowspan="1" colspan="1">−14,239.3</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized experimental, replicate B</td>
<td rowspan="1" colspan="1">3,841.4</td>
<td rowspan="1" colspan="1">−14,255.3</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, combined replicates</td>
<td rowspan="1" colspan="1">4,387.6</td>
<td rowspan="1" colspan="1">−14,528.4</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, replicate B</td>
<td rowspan="1" colspan="1">4,557.9</td>
<td rowspan="1" colspan="1">−14,613.6</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Randomized Halpern and Bruno, replicate A</td>
<td rowspan="1" colspan="1">4,620.8</td>
<td rowspan="1" colspan="1">−14,645.0</td>
<td rowspan="1" colspan="1">0 (0 + 0)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msu173-TF7">
<p>N
<sc>ote</sc>
.—This table differs from
<xref ref-type="table" rid="msu173-T6">table 6</xref>
in that it optimizes the likelihoods on the tree topology inferred with KOSI07+F rather than GY94.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
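<p>As a rough illustration of how the ΔAIC column in tables 6 and 7 relates to the log likelihoods and parameter counts, the following Python sketch applies the standard definition AIC = 2k − 2 ln L to three rows of table 6. It assumes that k is the total parameter count listed in the table (optimized plus empirical), an assumption that reproduces the tabulated values to within rounding. The function and variable names are illustrative and are not taken from phyloExpCM or HYPHY.</p>

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(models):
    """Map each model name to its AIC minus the smallest AIC in the set."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# (log likelihood, total parameters) copied from three rows of table 6.
# Assumption: the penalty uses the total count, optimized plus empirical.
TABLE6_ROWS = {
    "Experimental, combined replicates": (-12338.9, 0),
    "GY94, M8": (-12893.3, 14),
    "GY94, branch-specific omega, gamma rates": (-12769.1, 556),
}

if __name__ == "__main__":
    for name, value in delta_aic(TABLE6_ROWS).items():
        print(f"{name}: {value:.1f}")
```

<p>The branch-specific GY94 variant shows why the penalty matters: it has the highest log likelihood of any GY94 or KOSI07+F variant, yet its 556 parameters place it below most of the simpler variants once AIC is applied.</p>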
<p>There is also a clear correlation between the quality and volume of experimental data and the phylogenetic fit: Models from individual experimental replicates give lower likelihoods than both replicates combined, and the technically superior replicate A (recall the comparison in
<xref ref-type="fig" rid="msu173-F4">fig. 4</xref>
) gives a better likelihood than replicate B (
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
). This suggests that methodological advances that increase the accuracy of the measured mutational effects should lead to even better experimentally determined evolutionary models.</p>
<p>In
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
, the site-specific experimentally determined model is compared with variants of two general models (GY94 and KOSI07+F) that apply broadly to all proteins. More recently, it has become possible to estimate nonsite-specific (identical across sites) codon and amino acid models using naturally occurring sequences from specific proteins or viruses (
<xref rid="msu173-B11" ref-type="bibr">Dang et al. 2010</xref>
;
<xref rid="msu173-B12" ref-type="bibr">De Maio et al. 2013</xref>
). One could therefore ask if the experimentally determined model is superior because it is site specific or simply because it is experimentally derived from deep mutational scanning of influenza. To address this question, I created “randomized” experimentally determined models in which the deep mutational scanning data were randomly shuffled among protein sites. These randomized models are still derived from deep mutational scanning of influenza but have lost their linkage to site-specific experimental information. These randomized models are greatly inferior to all of the other models considered here (
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
). Therefore, the superiority of the experimentally determined model is due to its utilization of site-specific information from the deep mutational scanning—if this site specificity is lost, the model becomes far worse than general models such as GY94 or KOSI07+F.</p>
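<p>The randomization control described above can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes that the preferences are stored as one vector per site and that each vector is moved intact to a randomly chosen site, so that the overall pool of preferences is unchanged while the link to the original site is lost. The data layout, the seed argument, and the toy values are hypothetical.</p>

```python
import random


def shuffle_preferences_among_sites(prefs_by_site, seed=0):
    """Return a copy in which whole per-site preference vectors are reassigned to random sites."""
    sites = sorted(prefs_by_site)
    vectors = [prefs_by_site[site] for site in sites]
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    rng.shuffle(vectors)
    return dict(zip(sites, vectors))


# Toy preferences over a three-letter alphabet (hypothetical values);
# the real data cover 20 amino acids at each site.
toy_prefs = {
    1: {"A": 0.8, "G": 0.1, "S": 0.1},
    2: {"A": 0.1, "G": 0.8, "S": 0.1},
    3: {"A": 0.2, "G": 0.2, "S": 0.6},
    4: {"A": 0.3, "G": 0.3, "S": 0.4},
}

if __name__ == "__main__":
    randomized = shuffle_preferences_among_sites(toy_prefs)
    for site, prefs in randomized.items():
        print(site, prefs)
```

<p>Because the shuffled model contains exactly the same set of preference vectors as the original, any loss of fit can be attributed to the loss of site specificity rather than to a change in the overall amino acid composition of the model.</p>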
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>These results establish that an experimentally determined evolutionary model is far superior to existing models for describing the phylogeny of NP gene sequences. The extent of this superiority is striking. The parameter-free evolutionary model dramatically outperforms even the most highly parameterized existing models using the parameter-penalizing metric of AIC—but more remarkably, it also outperforms these parameterized models by over 400 log-likelihood units even in the absence of parameter penalization (
<xref ref-type="table" rid="msu173-T6">table 6</xref>
). The reason for this superiority is easy to understand: Proteins have strong and fairly conserved preferences for specific amino acids at different sites (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
), but these site-specific preferences are ignored by most existing phylogenetic models. Inspection of the overlaid bars in
<xref ref-type="fig" rid="msu173-F5">figure 5</xref>
<italic>D</italic>
illustrates the inadequacy of trying to capture these preferences simply by classifying sites based on gross features of protein structure (
<xref rid="msu173-B65" ref-type="bibr">Thorne et al. 1996</xref>
;
<xref rid="msu173-B19" ref-type="bibr">Goldman et al. 1998</xref>
)—the site-specific amino acid preferences are not simply related to secondary structure or solvent accessibility. The complexity of the preferences in
<xref ref-type="fig" rid="msu173-F5">figure 5</xref>
<italic>D</italic>
also shows the limitations of attempting to infer amino acid preference parameters for a small number of site classes from sequence data (
<xref rid="msu173-B35" ref-type="bibr">Lartillot and Philippe 2004</xref>
;
<xref rid="msu173-B36" ref-type="bibr">Le et al. 2008</xref>
;
<xref rid="msu173-B68" ref-type="bibr">Wang et al. 2008</xref>
;
<xref rid="msu173-B69" ref-type="bibr">Wu et al. 2013</xref>
), as it is clear that each site is unique. Direct experimental measurement therefore represents a highly attractive method for determining the idiosyncratic constraints that affect the evolution of each site in a gene.</p>
<p>Another appealing aspect of an experimentally determined evolutionary model is interpretability. A frustrating aspect of existing evolutionary models is the inability to interpret many of their free parameters directly in evolutionary or molecular terms. For example, the equilibrium frequency parameters used by most existing models reflect some unknown combination of mutational bias and selection for specific codons or amino acids, but the relative contributions of these factors to the parameter values are unclear. In contrast, all aspects of the experimentally determined evolutionary model can be related to direct measurements, making them more amenable to direct interpretation. Even if such a model were eventually augmented with a few free parameters, this could be done in a way that allows these parameters to retain a clear connection to the molecular processes of biology and evolution.</p>
<p>The results presented here also demonstrate that phylogenetic evolutionary models can be greatly improved while retaining the assumption of independence of sites. Phylogenetic evolutionary models make two assumptions that are egregiously bad from the perspective of the protein chemist: First, these models assume that sites are identical (or at least can be described by a small number of classes), and second, they assume that sites are independent. The experimentally determined model eliminates the first assumption but does nothing to relax the second. Is this model therefore inconsistent with the idea that epistasis is common during protein evolution (
<xref rid="msu173-B38" ref-type="bibr">Lunzer et al. 2010</xref>
)? In fact, experiments show that a general conservation of site-specific amino acid preferences is entirely consistent with epistasis. For instance, there is known epistasis among some of the mutations fixed along the NP phylogenetic tree analyzed here (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
)—but the site-specific compatibilities of amino acids with the protein’s structural stability are largely conserved among homologs on this tree, even for sites involved in epistatic interactions (
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
). The reason is that evolutionarily relevant epistasis can arise from subtle and transient fluctuations in properties such as protein stability, whereas the phylogenetic improvements from a site-specific model probably come mostly from capturing basic information about the compatibility of amino acids with a protein’s evolutionarily conserved structure. Models that assume independence among sites can therefore still lead to major improvements if the site-specific amino acid preferences are accurately represented.</p>
<p>The major drawback of the experimentally determined evolutionary model is its lack of generality. Although this model is clearly superior for influenza NP, it is entirely unsuitable for other genes. At first blush, it might seem that the arduous experiments described here provide data that is unlikely to ever become available for most situations of interest. However, it is worth remembering that today’s arduous experiment frequently becomes routine in a few years. For example, the very gene sequences that are the subjects of molecular phylogenetics were once rare pieces of data—now such sequences are so abundant that they easily overwhelm modern computers. The experimental ease of the deep mutational scanning approach used here is on a comparable trajectory: Similar approaches have already been applied to several proteins (
<xref rid="msu173-B17" ref-type="bibr">Fowler et al. 2010</xref>
;
<xref rid="msu173-B67" ref-type="bibr">Traxlmayr et al. 2012</xref>
;
<xref rid="msu173-B40" ref-type="bibr">Melamed et al. 2013</xref>
;
<xref rid="msu173-B56" ref-type="bibr">Roscoe et al. 2013</xref>
;
<xref rid="msu173-B61" ref-type="bibr">Starita et al. 2013</xref>
), and there continue to be rapid improvements in techniques for mutagenesis (
<xref rid="msu173-B16" ref-type="bibr">Firnberg and Ostermeier 2012</xref>
;
<xref rid="msu173-B29" ref-type="bibr">Jain and Varadarajan 2014</xref>
) and sequencing (
<xref rid="msu173-B25" ref-type="bibr">Hiatt et al. 2010</xref>
;
<xref rid="msu173-B57" ref-type="bibr">Schmitt et al. 2012</xref>
;
<xref rid="msu173-B37" ref-type="bibr">Lou et al. 2013</xref>
). Given these prospects for technical improvements in deep mutational scanning, it is therefore especially encouraging that the phylogenetic fit of the NP evolutionary model improves with the quality and volume of experimental data from which it is derived (
<xref ref-type="table" rid="msu173-T6">table 6</xref>
). The increasing availability of similar high-throughput data for a vast range of proteins has the potential to transform phylogenetic analyses by greatly increasing the accuracy of evolutionary models, while at the same time replacing a plethora of free parameters with experimentally measured quantities that can be given clear biological and evolutionary interpretations.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Availability of Data and Computer Code</title>
<p>Illumina sequencing data are available at the Sequence Read Archive (SRA) (accession SRP036064,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra/?term=SRP036064">http://www.ncbi.nlm.nih.gov/sra/?term=SRP036064</ext-link>
, last accessed May 31, 2014). A description of, and links to, the source code used to analyze the sequencing data and infer the amino acid preferences are at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html">http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html</ext-link>
(last accessed May 31, 2014). A description of, and links to, the source code used for the phylogenetic analyses are at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html">http://jbloom.github.io/phyloExpCM/example_2013Analysis_Influenza_NP_Human_1918_Descended.html</ext-link>
(last accessed May 31, 2014).</p>
</sec>
<sec>
<title>Experimental Measurement of Mutation Rates</title>
<p>To measure mutation rates, I generated GFP-carrying viruses with all genes derived from A/WSN/1933 (H1N1) by reverse genetics as described previously (
<xref rid="msu173-B5" ref-type="bibr">Bloom et al. 2010</xref>
). These viruses were repeatedly passaged at limiting dilution in MDCK-SIAT1-CMV-PB1 cells (
<xref rid="msu173-B5" ref-type="bibr">Bloom et al. 2010</xref>
) using low serum media (Opti-MEM I with 0.5% heat-inactivated fetal bovine serum, 0.3% BSA, 100 U/ml penicillin, 100 μg/ml streptomycin, and 100 μg/ml calcium chloride)—a moderate serum concentration was retained and no trypsin was added because viruses with the WSN HA and NA are trypsin independent (
<xref rid="msu173-B22" ref-type="bibr">Goto and Kawaoka 1998</xref>
). These passages were performed for 27 replicate populations. For each passage, 100 μl containing the equivalent of 2 μl of virus collection was added to the first row of a 96-well plate. The virus was serially diluted 1:5 down the plate, such that at the conclusion of the dilutions, each well contained 80 μl of virus dilution. MDCK-SIAT1-CMV-PB1 cells were then added to each well in a 50 μl volume containing 2.5 × 10
<sup>3</sup>
cells. The plates were grown for approximately 80 h, and wells were examined for cytopathic effect indicative of viral growth. The last well with cytopathic effect was collected and used as the parent population for the next round of limiting-dilution passage.</p>
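<p>The volumes in the limiting-dilution scheme can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that the text does not state: that each 1:5 step transfers 20 μl into 80 μl of medium in the next row, that 20 μl is discarded from the last row so that every well ends with 80 μl, and that the dilution runs down the eight rows of the plate.</p>

```python
def dilution_series(first_well_virus_ul=2.0, first_well_volume_ul=100.0,
                    transfer_ul=20.0, diluent_ul=80.0, n_rows=8):
    """Virus-collection equivalent (in microliters) left in each row after the series.

    Hypothetical pipetting scheme: each step moves transfer_ul from one row into
    diluent_ul of medium in the next row, and transfer_ul is discarded from the
    last row so that every well ends with the same volume.
    """
    remaining = []
    virus, volume = first_well_virus_ul, first_well_volume_ul
    for _ in range(n_rows):
        concentration = virus / volume
        # Virus left behind once transfer_ul has been removed from this row.
        remaining.append(concentration * (volume - transfer_ul))
        # The next row receives transfer_ul of this dilution plus fresh medium.
        virus = concentration * transfer_ul
        volume = transfer_ul + diluent_ul
    return remaining


if __name__ == "__main__":
    for row, amount in enumerate(dilution_series(), start=1):
        print(f"row {row}: {amount:.6f} ul virus equivalent in 80 ul")
```

<p>Under these assumptions, each row holds one-fifth of the virus of the row above it, so the last well showing cytopathic effect brackets the endpoint to within a factor of five.</p>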
<p>After 25 limiting-dilution passages, 10 of the 27 viral populations no longer caused any visible GFP expression in the cells in which they caused cytopathic effect, indicating fixation of a mutation that ablated GFP fluorescence. The 17 remaining populations all caused fluorescence in infected cells, although in some cases the intensity was visibly reduced—these populations therefore must have retained at least a partially functional GFP. Total RNA was purified from each viral population, the PB1 segment was reverse transcribed using the primers
<monospace>CATGATCGTCTCGTATTAGTAGAAACAAGGCATTTTTTCATGAAGGACAAGC</monospace>
and
<monospace>CATGATCG</monospace>
<monospace>TCTCAGGGAGCGAAAGCAGGCAAACCATTTGATTGG</monospace>
, and the reverse-transcribed cDNA was amplified by conventional PCR using the same primers. For 22 of the 27 replicate viral populations, this process amplified an insert with the length expected for the full GFP-carrying PB1 segment. For two of the replicates, this amplified inserts between 0.4 and 0.5 kb shorter than the expected length, suggesting an internal deletion in part of the segment. For three replicates, this failed to amplify any insert, suggesting total loss of the GFP-carrying PB1 segment, a very large internal deletion, or rearrangement that rendered the reverse-transcription primers ineffective. For the 24 replicates from which an insert could be amplified, the GFP coding region was Sanger sequenced to determine the consensus sequence. The results are in
<xref ref-type="table" rid="msu173-T1">tables 1</xref>
and
<xref ref-type="table" rid="msu173-T2">2</xref>
.</p>
<p>To estimate
<italic>R
<sub>m</sub>
</italic>
<sub>→</sub>
<italic>
<sub>n</sub>
</italic>
, it is necessary to normalize by the nucleotide composition of the GFP gene. The numbers of each nucleotide in this gene are
<inline-formula>
<inline-graphic xlink:href="msu173i35.jpg"></inline-graphic>
</inline-formula>
, and
<italic>N</italic>
<sub>G</sub>
= 201. Given that the counts in
<xref ref-type="table" rid="msu173-T2">table 2</xref>
come after 25 passages of 24 replicates:
<disp-formula id="msu173-M5">
<label>(5)</label>
<graphic xlink:href="msu173m5.jpg" position="float"></graphic>
</disp-formula>
where
<italic>N
<sub>m</sub>
</italic>
<sub>→</sub>
<italic>
<sub>n</sub>
</italic>
is the number of observed mutations from
<italic>m</italic>
to
<italic>n</italic>
in
<xref ref-type="table" rid="msu173-T2">table 2</xref>
,
<italic>m
<sub>c</sub>
</italic>
indicates the complement of DNA nucleotide
<italic>m</italic>
(e.g.,
<italic>A
<sub>c</sub>
</italic>
=
<italic>T</italic>
). The one in the numerator is a pseudocount added to the observed counts of each type of mutation to avoid estimating rates of zero. The values of
<italic>R
<sub>m</sub>
</italic>
<sub>→</sub>
<italic>
<sub>n</sub>
</italic>
estimated from
<xref ref-type="disp-formula" rid="msu173-M5">equation (5)</xref>
give the probability that a nucleotide that is already
<italic>m</italic>
will mutate to
<italic>n</italic>
in a single tissue-culture generation.</p>
</sec>
<sec>
<title>Construction of NP Codon-Mutant Libraries</title>
<p>The goal was to construct a mutant library with an average of two to three random codon mutations per gene. Most techniques for creating mutant libraries of full-length genes, such as error-prone PCR (
<xref rid="msu173-B9" ref-type="bibr">Cirino et al. 2003</xref>
) and chemical mutagenesis (
<xref rid="msu173-B43" ref-type="bibr">Neylon 2004</xref>
), introduce mutations at the nucleotide level, meaning that codon substitutions involving multiple nucleotide changes occur at a negligible rate. Recently, several groups have developed strategies for introducing codon mutations along the lengths of entire genes (
<xref rid="msu173-B16" ref-type="bibr">Firnberg and Ostermeier 2012</xref>
;
<xref rid="msu173-B29" ref-type="bibr">Jain and Varadarajan 2014</xref>
; Kitzman J and Shendure J, personal communication). Most of these strategies are designed to create exactly one codon mutation per gene. For my experiments, it was desirable to introduce a distribution of around one to four codon mutations per gene to examine the average effects of mutations in a variety of closely related genetic backgrounds. Therefore, I devised a codon-mutagenesis protocol specifically for this purpose.</p>
<p>This technique involved iterative rounds of low-cycle PCR with pools of mutagenic synthetic oligonucleotides that each contain a randomized
<monospace>NNN</monospace>
triplet at a specific codon site. Two replicate libraries each of the WT and N334H variants of the Aichi/1968 NP were prepared in full biological duplicate, beginning each with independent preps of the plasmid templates pHWAichi68-NP and pHWAichi68-NP-N334H. The sequences of the NP genes in these plasmids are provided in
<xref rid="msu173-B21" ref-type="bibr">Gong et al. (2013)</xref>
. To avoid cross-contamination, all purification steps used an independent gel for each sample, with the relevant equipment thoroughly washed to remove residual DNA.</p>
<p>First, for each codon except for that encoding the initiating methionine in the 498-residue NP gene, I designed an oligonucleotide that contained a randomized
<monospace>NNN</monospace>
nucleotide triplet preceded by the 16 nucleotides upstream of that codon in the NP gene and followed by the 16 nucleotides downstream of that codon in the NP gene. I ordered these oligonucleotides in 96-well plate format from Integrated DNA Technologies and combined them in equimolar quantities to create the forward-mutagenesis primer pool. I also designed and ordered the reverse complement of each of these oligonucleotides and combined them in equimolar quantities to create the reverse-mutagenesis pool. The primers for the N334H variants differed only for those that overlapped the N334H codon. I also designed end primers that annealed to the termini of the NP sequence and contained sites appropriate for BsmBI cloning into the influenza reverse-genetics plasmid pHW2000 (
<xref rid="msu173-B26" ref-type="bibr">Hoffmann et al. 2000</xref>
). These primers are 5’-BsmBI-Aichi68-NP (
<monospace>catgatcgtctcagggagcaaaagcagggtagataatcactcacag</monospace>
) and 3’-BsmBI-Aichi68-NP (
<monospace>catgatcgtctcgtattagtagaaacaagggtatttttcttta</monospace>
).</p>
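<p>As an illustrative sketch (not the script used in this work), the mutagenic oligonucleotide pools described above could be enumerated as follows. The function names, the 0-based <monospace>cds_start</monospace> argument, and the requirement that the supplied sequence include enough flanking sequence on both sides of every mutated codon are assumptions made for this example.</p>
<preformat>
```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def design_nnn_primers(gene, cds_start, n_codons, flank=16):
    """Build forward and reverse NNN primers for codons 2..n_codons.

    gene      -- amplicon sequence (coding sequence plus flanking termini)
    cds_start -- 0-based index of the first nucleotide of codon 1 (assumed)
    n_codons  -- number of codons in the coding sequence
    flank     -- nucleotides kept on each side of the randomized codon
    """
    primers = {}
    for codon in range(2, n_codons + 1):  # skip the initiating methionine
        i = cds_start + 3 * (codon - 1)   # first nucleotide of this codon
        if 0 > i - flank or i + 3 + flank > len(gene):
            raise ValueError(f"not enough flanking sequence at codon {codon}")
        forward = gene[i - flank:i] + "NNN" + gene[i + 3:i + 3 + flank]
        # The reverse pool is the reverse complement of each forward oligo.
        primers[codon] = (forward, reverse_complement(forward))
    return primers
```
</preformat>
<p>For a 498-codon gene this yields 497 forward and 497 reverse oligonucleotides, each 35 nt long (16 + 3 + 16).</p>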
<p>I set up PCR reactions that contained 1 μl of 10 ng/μl template pHWAichi68-NP plasmid (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
), 25 μl of 2× KOD Hot Start Master Mix (product number 71842, EMD Millipore), 1.5 μl each of 10 μM solutions of the end primers 5’-BsmBI-Aichi68-NP and 3’-BsmBI-Aichi68-NP, and 21 μl of water. I used the following PCR program (referred to as the amplicon PCR program in the remainder of this article):
<list list-type="order">
<list-item>
<p>95 °C for 2 min.</p>
</list-item>
<list-item>
<p>95 °C for 20 s.</p>
</list-item>
<list-item>
<p>70 °C for 1 s.</p>
</list-item>
<list-item>
<p>Cool to 50 °C at 0.5 °C/s, then hold at 50 °C for 30 s.</p>
</list-item>
<list-item>
<p>70 °C for 40 s.</p>
</list-item>
<list-item>
<p>Repeat steps 2 through 5 for 24 additional cycles.</p>
</list-item>
<list-item>
<p>Hold 4 °C.</p>
</list-item>
</list>
The PCR products were purified over agarose gels using ZymoClean columns (product number D4002, Zymo Research) and used as templates for the initial codon mutagenesis fragment PCR.</p>
<p>Two fragment PCR reactions were run for each template. The forward-fragment reactions contained 15 μl of 2× KOD Hot Start Master Mix, 2 μl of the forward mutagenesis primer pool at a total oligonucleotide concentration of 4.5 μM, 2 μl of 4.5 μM 3’-BsmBI-Aichi68-NP, 4 μl of 3 ng/μl of the aforementioned gel-purified linear PCR product template, and 7 μl of water. The reverse-fragment reactions were identical except that the reverse mutagenesis pool was substituted for the forward mutagenesis pool and that 5’-BsmBI-Aichi68-NP was substituted for 3’-BsmBI-Aichi68-NP. The PCR program for these fragment reactions was identical to the amplicon PCR program except that it utilized a total of 7 rather than 25 thermal cycles.</p>
<p>The products from the fragment PCR reactions were diluted 1:4 in water. These dilutions were then used for the joining PCR reactions, which contained 15 μl of 2× KOD Hot Start Master Mix, 4 μl of the 1:4 dilution of the forward-fragment reaction, 4 μl of the 1:4 dilution of the reverse-fragment reaction, 2 μl of 4.5 μM 5’-BsmBI-Aichi68-NP, 2 μl of 4.5 μM 3’-BsmBI-Aichi68-NP, and 3 μl of water. The PCR program for these joining reactions was identical to the amplicon PCR program except that it utilized a total of 20 rather than 25 thermal cycles. The products from these joining PCRs were purified over agarose gels.</p>
<p>The purified products of the first joining PCR reactions were used as templates for a second round of fragment reactions followed by joining PCRs. These second-round products were used as templates for a third round. The third-round products were purified over agarose gels, digested with BsmBI (product number R0580L, New England Biolabs), and ligated into a dephosphorylated (Antarctic Phosphatase, product number M0289L, New England Biolabs) BsmBI digest of pHW2000 (
<xref rid="msu173-B26" ref-type="bibr">Hoffmann et al. 2000</xref>
) using T4 DNA ligase. The ligations were purified using ZymoClean columns, electroporated into ElectroMAX DH10B T1 phage-resistant competent cells (product number 12033-015, Invitrogen), and plated on LB plates supplemented with 100 μg/ml of ampicillin. These transformations yielded between 400,000 and 800,000 unique transformants per plate, as judged by plating a 1:4,000 dilution of the transformations on a second set of plates. Transformation of a parallel no-insert control ligation yielded approximately 50-fold fewer colonies, indicating that self-ligation of pHW2000 accounts for only a small fraction of the transformants. For each library, I performed three transformations, grew the plates overnight, and then scraped the colonies into liquid LB supplemented with ampicillin and mini-prepped several hours later to yield the plasmid mutant libraries. These libraries each contained in excess of 10
<sup>6</sup>
unique transformants, most of which will be unique codon mutants of the NP gene.</p>
<p>I sequenced the NP gene for 30 individual clones drawn from the four mutant libraries. As shown in
<xref ref-type="fig" rid="msu173-F1">figure 1</xref>
, the number of mutations per clone was approximately Poisson distributed and the mutations occurred uniformly along the primary sequence. If all codon mutations are made with equal probability, 9/63 of the mutations should be single-nucleotide changes, 27/63 should be two-nucleotide changes, and 27/63 should be three-nucleotide changes. This is approximately what was observed in the Sanger-sequenced clones. The nucleotide composition of the mutated codons was roughly uniform, and there was no tendency for clustering of multiple mutations in primary sequence. The results of this Sanger sequencing are compatible with the mutation frequencies obtained from deep sequencing the “mutDNA” samples after subtracting off the sequencing error rate estimated from the DNA samples (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
), especially considering that the statistics from the Sanger sequencing are subject to sampling error due to the limited number of clones analyzed.</p>
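<p>The expected fractions quoted above (9/63, 27/63, and 27/63) can be checked by direct enumeration. The short sketch below is only an illustration; the function name and the default WT codon are arbitrary choices, and the result is the same for any WT codon.</p>
<preformat>
```python
from collections import Counter
from fractions import Fraction
from itertools import product


def mutation_class_fractions(wt_codon="ATG"):
    """Fraction of the 63 non-WT codons differing at 1, 2, or 3 positions."""
    counts = Counter()
    for codon in map("".join, product("ACGT", repeat=3)):
        ndiff = sum(a != b for a, b in zip(codon, wt_codon))
        if ndiff:  # skip the WT codon itself
            counts[ndiff] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```
</preformat>
<p>Each position can change to three other nucleotides, giving 3 × 3 = 9 single, 3 × 9 = 27 double, and 27 triple changes among the 63 mutant codons.</p>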
</sec>
<sec>
<title>Viral Growth and Passage</title>
<p>Two independent replicates of viral growth and passage were performed (replicates A and B). The procedures were similar between replicates, but there were a few small differences. In the actual experimental chronology, replicate B was performed first, and the modifications in replicate A were designed to improve the sampling of the mutations by the created mutant viruses. These modifications may be the reason why replicate A slightly outperforms replicate B by two objective measures: The viruses more completely sample the codon mutations (
<xref ref-type="fig" rid="msu173-F4">fig. 4</xref>
), and the evolutionary model derived solely from replicate A gives a higher likelihood than the evolutionary model derived solely from replicate B (
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
).</p>
<p>For replicate B, I used reverse genetics to rescue viruses carrying the Aichi/1968 NP or one of its derivatives, PB2 and PA from the A/Nanchang/933/1995 (H3N2) strain, a PB1 gene segment encoding GFP, and HA/NA/M/NS from the A/WSN/1933 (H1N1) strain. With the exception of the variants of NP used, these viruses are identical to those described in
<xref rid="msu173-B21" ref-type="bibr">Gong et al. (2013)</xref>
and were rescued by reverse genetics in 293T-CMV-Nan95-PB1 and MDCK-SIAT1-CMV-Nan95-PB1 cells as described in that reference. The previous section describes four NP codon-mutant libraries, two of the WT Aichi/1968 gene (WT-1 and WT-2) and two of the N334H variant (N334H-1 and N334H-2). I grew mutant viruses from all four mutant libraries and four paired unmutated viruses from independent preps of the parent plasmids. A major goal was to maintain diversity during viral creation by reverse genetics, because the experiment would be undermined if most of the rescued viruses derived from a small number of transfected plasmids. I therefore performed the reverse genetics in 15-cm tissue culture dishes to maximize the number of transfected cells. Specifically, 15-cm dishes were seeded with 10
<sup>7</sup>
293T-CMV-Nan95-PB1 cells in D10 media (DMEM with 10% heat-inactivated fetal bovine serum, 2 mM
<sc>l</sc>
-glutamine, 100 U/ml penicillin, and 100 μg/ml streptomycin). At 20 h postseeding, the dishes were transfected with 2.8 μg of each of the eight reverse-genetics plasmids. At 20 h posttransfection, about 20% of the cells expressed GFP (indicating transcription by the viral polymerase of the GFP encoded by pHH-PB1flank-eGFP), suggesting that many unique cells were transfected. At 20 h posttransfection, the media was changed to the low serum media described above. At 78 h posttransfection, the viral supernatants were collected, clarified by centrifugation at 2,000 × g for 5 min, and stored at 4 °C. The viruses were titered by flow cytometry as described previously (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
). A control lacking the NP gene yielded no infectious virus as expected.</p>
<p>The virus was then passaged in MDCK-SIAT1-CMV-Nan95-PB1 cells. These cells were seeded into 15 cm dishes, and when they had reached a density of 10
<sup>7</sup>
per plate, they were infected with 10
<sup>6</sup>
infectious particles (multiplicity of infection (MOI) of 0.1) of the transfectant viruses in low serum media. After 18 h, 30–50% of the cells were green as judged by microscopy, indicating viral spread. At 40 h postinfection, 100% of the cells were green, and many showed clear signs of cytopathic effect. At this time, the viral supernatants were again collected, clarified, and stored at 4 °C. NP cDNA isolated from these viruses was the source of the deep-sequencing samples “virus-p1” and “mutvirus-p1” in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
. The virus was then passaged a second time exactly as before (again using an MOI of 0.1). NP cDNA from these twice-passaged viruses constituted the source for the samples “virus-p2” and “mutvirus-p2” in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
.</p>
<p>For replicate A, all viruses (both the four mutant viruses and the paired unmutated controls) were regrown independently from the same plasmid preps used for replicate B. The experimental process was identical to that used for replicate B except for the following: Standard influenza viruses (rather than the GFP-carrying variants) were used, so plasmid pHW-Nan95-PB1 (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
) was substituted for pHH-PB1flank-eGFP during reverse genetics, and 293T and MDCK-SIAT1 cells were substituted for the PB1-expressing variants. Rather than creating the viruses by transfecting a single 15-cm dish, each sample was created by transfecting two 12-well dishes, with the dishes seeded at 3 × 10
<sup>5</sup>
293T and 5 × 10
<sup>4</sup>
MDCK-SIAT1 cells prior to transfection. The passaging was then done in four 10 cm dishes for each sample, with the dishes seeded at 4 × 10
<sup>6</sup>
MDCK-SIAT1 cells 12–14 h prior to infection. The passaging was still done at an MOI of 0.1. These modifications were designed to increase diversity in the viral population. These viruses were titered by TCID50 rather than flow cytometry.</p>
</sec>
<sec>
<title>Sample Preparation and Illumina Sequencing</title>
<p>For each sample, a PCR amplicon was created to serve as the template for Illumina sequencing. The steps used to generate the PCR amplicon for each of the seven sample types (
<xref ref-type="fig" rid="msu173-F2">fig. 2</xref>
) are listed below. Once the PCR template was generated, for all samples the PCR amplicon was created using the amplicon PCR program described above in 50 μl reactions consisting of 25 μl of 2× KOD Hot Start Master Mix, 1.5 μl each of 10 μM of 5’-BsmBI-Aichi68-NP and 3’-BsmBI-Aichi68-NP, the indicated template, and ultrapure water. A small amount of each PCR reaction was run on an analytical agarose gel to confirm the desired band. The remainder was then run on its own agarose gel without any ladder (to avoid contamination) after carefully cleaning the gel rig and all related equipment. The amplicons were excised from the gels, purified over ZymoClean columns, and analyzed using a NanoDrop to ensure that the absorbance at 260 nm was at least 1.8 times that at 230 nm and 280 nm. The templates were as follows:
<list list-type="bullet">
<list-item>
<p>DNA: The templates for these amplicons were 10 ng of the unmutated independent plasmid preps used to create the codon mutant libraries.</p>
</list-item>
<list-item>
<p>mutDNA: The templates for these amplicons were 10 ng of the plasmid mutant libraries.</p>
</list-item>
<list-item>
<p>RNA: This amplicon quantifies the net error rate of transcription and reverse transcription. Because the viral RNA is initially transcribed from the reverse-genetics plasmids by RNA polymerase I, but the bidirectional reverse-genetics plasmids direct transcription of RNA by both RNA polymerases I and II (
<xref rid="msu173-B26" ref-type="bibr">Hoffmann et al. 2000</xref>
), the RNA templates for these amplicons were transcribed from plasmids derived from pHH21 (
<xref rid="msu173-B42" ref-type="bibr">Neumann et al. 1999</xref>
), which only directs transcription by RNA polymerase I. The unmutated WT and N334H NP genes were cloned into this plasmid to create pHH-Aichi68-NP and pHH-Aichi68-NP-N334H. Independent preparations of these plasmids were transfected into 293T cells, transfecting 2 μg of plasmid into 5 × 10
<sup>5</sup>
cells in six-well dishes. After 32 h, total RNA was isolated using Qiagen RNeasy columns and treated with the Ambion TURBO DNA-free kit (Applied Biosystems AM1907) to remove residual plasmid DNA. This RNA was used as a template for reverse transcription with AccuScript (Agilent 200820) using the primers 5’-BsmBI-Aichi68-NP and 3’-BsmBI-Aichi68-NP. The resulting cDNA was quantified by quantitative PCR (qPCR) specific for NP (see below), which showed high levels of NP cDNA in the reverse-transcription reactions but undetectable levels in control reactions lacking the reverse transcriptase, indicating that residual plasmid DNA had been successfully removed. A volume of cDNA that contained at least 2 × 10
<sup>6</sup>
NP cDNA molecules (as quantified by qPCR) was used as template for the amplicon PCR reaction. Control PCR reactions using equivalent volumes of template from the no reverse-transcriptase control reactions yielded no product.</p>
</list-item>
<list-item>
<p>virus-p1: This amplicon was derived from virus created from the unmutated plasmid and collected at the end of the first passage. Clarified virus supernatant was ultracentrifuged at 64,000 × g for 1.5 h at 4 °C, and the supernatant was decanted. Total RNA was then isolated from the viral pellet using a Qiagen RNeasy kit. This RNA was used as a template for reverse transcription with AccuScript using the primers 5’-BsmBI-Aichi68-NP and 3’-BsmBI-Aichi68-NP. The resulting cDNA was quantified by qPCR, which showed high levels of NP cDNA in the reverse-transcription reactions but undetectable levels in control reactions lacking the reverse transcriptase. A volume of cDNA that contained at least 10
<sup>7</sup>
NP cDNA molecules (as quantified by qPCR) was used as template for the amplicon PCR reaction. Control PCR reactions using equivalent volumes of template from the no reverse-transcriptase control reactions yielded no product.</p>
</list-item>
<list-item>
<p>virus-p2, mutvirus-p1, and mutvirus-p2: These amplicons were created as for the virus-p1 amplicons but used the appropriate virus as the initial template as outlined in
<xref ref-type="fig" rid="msu173-F2">figure 2</xref>
.</p>
</list-item>
</list>
</p>
<p>An important note: It was found that the use of relatively new RNeasy kits with β-mercaptoethanol (a reducing agent), freshly added per the manufacturer’s instructions, was necessary to avoid what appeared to be oxidative damage to purified RNA.</p>
<p>The overall experiment only makes sense if the sequenced NP genes derive from a large diversity of initial template molecules. Therefore, qPCR was used to quantify the molecules produced by reverse transcription to ensure that a sufficiently large number were used as PCR templates to create the amplicons. The qPCR primers were 5’-Aichi68-NP-for (
<monospace>gcaacagctggtctgactcaca</monospace>
) and 3’-Aichi68-NP-rev (
<monospace>tccatgccggtgcgaacaag</monospace>
). The qPCR reactions were performed using the SYBR Green PCR Master Mix (Applied Biosystems 4309155) following the manufacturer’s instructions. Linear NP amplified by PCR from the pHWAichi68-NP plasmid was used as a quantification standard; the use of a linear standard is important because amplification efficiencies differ for linear and circular templates (
<xref rid="msu173-B27" ref-type="bibr">Hou et al. 2010</xref>
). The standard curves were linear with respect to the amount of NP standard over the range from 10
<sup>2</sup>
to 10
<sup>9</sup>
NP molecules. These standard curves were used to determine the absolute number of NP cDNA molecules after reverse transcription. Note that the use of only 25 thermal cycles in the amplicon PCR program provides a second check that there are a substantial number of template molecules, as this moderate number of thermal cycles will not lead to sufficient product if there are only a few template molecules.</p>
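<p>A minimal sketch of how a standard curve of this kind can be inverted to give absolute template numbers is shown below. It assumes the usual qPCR convention that the threshold cycle (Ct) is linear in the base-10 logarithm of the template copy number; the function names are hypothetical, and any Ct values supplied to it are the user's own measurements rather than data from this work.</p>
<preformat>
```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)
```
</preformat>
<p>The estimated copy number per reaction volume can then be compared with the minimum template numbers stated above before a cDNA sample is used for the amplicon PCR.</p>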
<p>To allow the Illumina sequencing inserts to be read in both directions by paired-end 50 nt reads (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">supplementary fig. S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary Material</ext-link>
online), it was necessary to use an Illumina library-prep protocol that created NP inserts that were roughly 50 nt in length. This was done via a modification of the Illumina Nextera protocol. First, concentrations of the PCR amplicons were determined using PicoGreen (Invitrogen P7859). These amplicons were used as input to the Illumina Nextera DNA Sample Preparation kit (Illumina FC-121-1031). The manufacturer’s protocol for the tagmentation step was modified to use 5-fold less input DNA (10 ng rather than 50 ng) and 2-fold more tagmentation enzyme (10 μl rather than 5 μl), and the incubation at 55 °C was doubled from 5 to 10 min. Samples were barcoded using the Nextera Index Kit for 96 indices (Illumina FC-121-1012). For index 1, the barcoding was DNA with N701, RNA with N702, mutDNA with N703, virus-p1 with N704, mutvirus-p1 with N705, virus-p2 with N706, and mutvirus-p2 with N707. After completion of the Nextera PCR, the samples were subjected to a ZymoClean purification rather than the bead cleanup step specified in the Nextera protocol. The size distribution of these purified PCR products was analyzed using an Agilent 200 TapeStation Instrument. If the NP sequencing insert is exactly 50 nt in size, then the product of the Nextera PCR should be 186 nt in length after accounting for the addition of the Nextera adaptors. The actual size distribution was peaked close to this value. The ZymoClean-purified PCR products were quantified using PicoGreen and combined in equal amounts into pools: A WT-1 pool of the seven samples for that library, a WT-2 pool of the seven samples for that library, etc. These pools were subjected to further size selection by running them on a 4% agarose gel versus a custom ladder containing 171 and 196 nt bands created by PCR from a GFP template using the forward primer
<monospace>gcacggggccgtcgccg</monospace>
and the reverse primers
<monospace>tggggcacaagctggagtacaac</monospace>
(for the 171 nt band) and
<monospace>gacttcaaggaggacggcaacatcc</monospace>
(for the 196 nt band). The gel slice for the sample pools corresponding to sizes between 171 and 196 nt was excised and purified using a ZymoClean column. A separate clean gel was run for each pool to avoid cross-contamination.</p>
<p>Library QC and cluster optimization were performed using the Agilent Technologies qPCR NGS Library Quantification Kit (Agilent Technologies, Santa Clara, CA). Libraries were introduced onto the flow cell using an Illumina cBot (Illumina, Inc., San Diego, CA) and a TruSeq Rapid Duo cBot Sample Loading Kit. Cluster generation and deep sequencing were performed on an Illumina HiSeq 2500 using an Illumina TruSeq Rapid PE Cluster Kit and TruSeq Rapid SBS Kit. A paired-end, 50 nt read-length (PE50) sequencing strategy was used in rapid run mode. Image analysis and base calling were performed using Illumina’s Real Time Analysis v1.17.20.0 software, followed by demultiplexing of indexed reads and generation of FASTQ files using Illumina’s CASAVA v1.8.2 software (
<ext-link ext-link-type="uri" xlink:href="http://www.illumina.com/software.ilmn">http://www.illumina.com/software.ilmn</ext-link>
, last accessed May 31, 2014). These FASTQ files were uploaded to the Sequence Read Archive (SRA) under accession SRP036064 (see
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra/?term=SRP036064">http://www.ncbi.nlm.nih.gov/sra/?term=SRP036064</ext-link>
, last accessed May 31, 2014).</p>
</sec>
<sec>
<title>Read Alignment and Quantification of Mutation Frequencies</title>
<p>A custom Python software package, mapmuts, was created to quantify the frequencies of mutations from the Illumina sequencing. A description of the software as utilized in this work is available at
<ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html">http://jbloom.github.io/mapmuts/example_2013Analysis_Influenza_NP_Aichi68.html</ext-link>
(last accessed May 31, 2014). Briefly:
<list list-type="order">
<list-item>
<p>Read pairs were discarded if either read in the pair failed the Illumina chastity filter, had a mean Q-score less than 25, or had more than two ambiguous (
<monospace>N</monospace>
) nucleotides.</p>
</list-item>
<list-item>
<p>The remaining paired reads were aligned to each other and retained only if they shared at least 30 nt of overlap, disagreed at no more than one site, and matched the expected terminal Illumina adaptors with no more than one mismatch.</p>
</list-item>
<list-item>
<p>The overlap of the paired reads was aligned to NP, disallowing alignments with gaps or more than six nucleotide mismatches. A small fraction of alignments corresponded exclusively to the noncoding termini of the viral RNA; the rest contained portions of the NP coding sequence.</p>
</list-item>
<list-item>
<p>For every paired read that aligned with NP, the codon identity was called if both reads concurred for all three nucleotides in the codon. If the reads disagreed or contained an ambiguity in that codon, the identity was not called.</p>
</list-item>
</list>
</p>
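<p>The read-level thresholds in step 1 and the codon-calling rule in step 4 can be sketched as follows. This is a simplified illustration and not the mapmuts implementation: the function names are hypothetical, the chastity filter is not modeled, and the read-to-read and read-to-gene alignments of steps 2 and 3 are assumed to have been done already.</p>
<preformat>
```python
def read_passes_filter(seq, quals, min_mean_q=25, max_n=2):
    """Apply the per-read thresholds from step 1 (chastity filter not modeled).

    A read is kept if its mean Q-score is at least min_mean_q and it has
    no more than max_n ambiguous (N) nucleotides.
    """
    mean_q = sum(quals) / len(quals)
    return mean_q >= min_mean_q and max_n >= seq.upper().count("N")


def pair_passes_filter(read1, read2):
    """Keep a pair only if both (seq, quals) reads pass the filter."""
    return read_passes_filter(*read1) and read_passes_filter(*read2)


def call_codon(codon_r1, codon_r2):
    """Step 4: return the codon if both reads agree at all three positions.

    Returns None when the reads disagree or either contains an ambiguity.
    """
    c1, c2 = codon_r1.upper(), codon_r2.upper()
    if len(c1) != 3 or len(c2) != 3:
        return None
    if "N" in c1 or "N" in c2 or c1 != c2:
        return None
    return c1
```
</preformat>
<p>Requiring agreement between both reads of a pair means that a sequencing error must occur at the same position in both reads to produce a spurious codon call.</p>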
</sec>
<sec>
<title>Inference of the Amino Acid Preferences</title>
<p>The approach described here is based on the assumption that there is an inherent preference for each amino acid at each site in the protein. This assumption is clearly not completely accurate, as the effect of a mutation at one site can be influenced by the identities of other sites. However, experimental work with NP (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
) and other proteins (
<xref rid="msu173-B59" ref-type="bibr">Serrano et al. 1993</xref>
;
<xref rid="msu173-B8" ref-type="bibr">Bloom et al. 2005</xref>
,
<xref rid="msu173-B6" ref-type="bibr">2006</xref>
;
<xref rid="msu173-B4" ref-type="bibr">Bershtein et al. 2006</xref>
) suggests that at an evolutionary level, sites interact mostly through generic effects on stability and folding. Furthermore, the effects of mutations on stability and folding tend to be conserved during evolution (
<xref rid="msu173-B59" ref-type="bibr">Serrano et al. 1993</xref>
;
<xref rid="msu173-B2" ref-type="bibr">Ashenberg et al. 2013</xref>
). So one justification for assuming site-specific but site-independent preferences is that selection on a mutation is mostly determined by whether the protein can tolerate its effect on stability or folding, so stabilizing amino acids will be tolerated in most genetic backgrounds, whereas destabilizing amino acids will only be tolerated in some backgrounds, as has been described experimentally (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
) and theoretically (
<xref rid="msu173-B7" ref-type="bibr">Bloom et al. 2007</xref>
). A more pragmatic justification is that the work here builds off this assumption to create evolutionary models that are much better than existing alternatives.</p>
<p>Assume that the preferences are entirely at the amino acid level and are indifferent to the specific codon (the study of preferences for synonymous codons is an interesting area for future work). Denote the preference of site
<italic>r</italic>
for amino acid
<italic>a</italic>
as
<inline-formula>
<inline-graphic xlink:href="msu173i36.jpg"></inline-graphic>
</inline-formula>
, where
<disp-formula id="msu173-M6">
<label>(6)</label>
<graphic xlink:href="msu173m6.jpg" position="float"></graphic>
</disp-formula>
Define
<inline-formula>
<inline-graphic xlink:href="msu173i37.jpg"></inline-graphic>
</inline-formula>
as the expected ratio of amino acid
<italic>a</italic>
to
<inline-formula>
<inline-graphic xlink:href="msu173i38.jpg"></inline-graphic>
</inline-formula>
after viral growth if both are initially introduced into the mutant library at equal frequency. Mutations that enhance viral growth will have larger values of
<inline-formula>
<inline-graphic xlink:href="msu173i39.jpg"></inline-graphic>
</inline-formula>
, whereas mutations that hamper growth will have lower values of
<inline-formula>
<inline-graphic xlink:href="msu173i40.jpg"></inline-graphic>
</inline-formula>
. However,
<inline-formula>
<inline-graphic xlink:href="msu173i41.jpg"></inline-graphic>
</inline-formula>
cannot be simply interpreted as the fitness effect of mutating site
<italic>r</italic>
from
<italic>a</italic>
to
<inline-formula>
<inline-graphic xlink:href="msu173i42.jpg"></inline-graphic>
</inline-formula>
: Because most clones have multiple mutations, this ratio summarizes the effect of a mutation in a variety of related genetic backgrounds. A mutation can therefore have a ratio greater than one due to its inherent effect on viral growth or its effect on the tolerance for other mutations (or both). This analysis does not separate these factors, but experimental work (
<xref rid="msu173-B21" ref-type="bibr">Gong et al. 2013</xref>
) has shown that it is fairly common for one mutation to NP to alter the tolerance to a subsequent one.</p>
<p>The most naive approach is to set
<inline-formula>
<inline-graphic xlink:href="msu173i43.jpg"></inline-graphic>
</inline-formula>
proportional to the frequency of amino acid
<italic>a</italic>
in mutvirus-p1 divided by its frequency in mutDNA and then apply the normalization in
<xref ref-type="disp-formula" rid="msu173-M6">equation (6)</xref>
. However, such an approach is problematic for several reasons. First, it fails to account for errors (PCR, reverse transcription) that inflate the observed frequencies of some mutations. Second, estimating ratios by dividing finite counts is notoriously statistically biased (
<xref rid="msu173-B46" ref-type="bibr">Pearson 1910</xref>
;
<xref rid="msu173-B44" ref-type="bibr">Ogliore et al. 2011</xref>
). For example, in the limiting case where a mutation is counted once in mutvirus-p1 and not at all in mutDNA, the ratio is infinity—yet in practice such low counts give little confidence that enough variants have been assayed to estimate the true effect of the mutation.</p>
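<p>The naive estimate and its failure in the limiting case can be illustrated with a short sketch. It assumes that the normalization in equation (6) makes the preferences at a site sum to one; the function names and any input frequencies are hypothetical and serve only to show the behavior of the ratio.</p>
<preformat>
```python
import math


def naive_ratios(freq_selected, freq_library):
    """Ratio of post-selection to pre-selection frequency per amino acid."""
    ratios = {}
    for aa, f_sel in freq_selected.items():
        f_lib = freq_library.get(aa, 0.0)
        if f_lib > 0:
            ratios[aa] = f_sel / f_lib
        else:
            # Seen after selection but never in the library: the ratio diverges.
            ratios[aa] = math.inf if f_sel > 0 else math.nan
    return ratios


def normalize(ratios):
    """Scale the ratios so that they sum to one (assumed form of eq. 6)."""
    total = sum(ratios.values())
    return {aa: r / total for aa, r in ratios.items()}
```
</preformat>
<p>A single infinite ratio makes the normalized values undefined, which is one practical symptom of the statistical problem described above.</p>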
<p>To circumvent these problems, I used an approach that explicitly accounts for the sampling statistics. The approach begins with prior estimates that the
<inline-formula>
<inline-graphic xlink:href="msu173i44.jpg"></inline-graphic>
</inline-formula>
values are all equal and that the error and mutation rates for each site are equal to the library averages. Multinomial likelihood functions give the probability of observing a set of counts given the
<inline-formula>
<inline-graphic xlink:href="msu173i45.jpg"></inline-graphic>
</inline-formula>
values and the various error and mutation rates. The posterior mean of the
<inline-formula>
<inline-graphic xlink:href="msu173i46.jpg"></inline-graphic>
</inline-formula>
values is estimated by MCMC.</p>
<p>Use the counts in DNA to quantify errors due to PCR and sequencing. Use the counts in RNA to quantify errors due to reverse transcription. Assume that transcription of the viral genes from the reverse-genetics plasmids and subsequent replication of these genes by the influenza polymerase introduce a negligible number of new mutations. The second of these assumptions (negligible errors during replication) is supported by the fact that the mutation frequency in virus-p1 is close to that in RNA (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
). The first of these assumptions is supported by the fact that stop codons are no more frequent in RNA than in virus-p1 (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
)—deleterious stop codons arising during transcription will be purged during viral growth, while those arising from reverse transcription and sequencing errors will not.</p>
<p>At each site
<italic>r</italic>
, there are
<italic>n</italic>
<sub>codon</sub>
codons, indexed by
<italic>i</italic>
= 1, 2, …,
<italic>n</italic>
<sub>codon</sub>
. Let
<inline-formula>
<inline-graphic xlink:href="msu173i47.jpg"></inline-graphic>
</inline-formula>
denote the WT codon at site
<italic>r</italic>
. Let
<inline-formula>
<inline-graphic xlink:href="msu173i48.jpg"></inline-graphic>
</inline-formula>
be the total number of sequencing reads at site
<italic>r</italic>
in DNA, and let
<inline-formula>
<inline-graphic xlink:href="msu173i49.jpg"></inline-graphic>
</inline-formula>
be the number of these reads that report codon
<italic>i</italic>
at site
<italic>r</italic>
, so that
<inline-formula>
<inline-graphic xlink:href="msu173i50.jpg"></inline-graphic>
</inline-formula>
. Similarly, let
<inline-formula>
<inline-graphic xlink:href="msu173i51.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i52.jpg"></inline-graphic>
</inline-formula>
be the total number of reads at site
<italic>r</italic>
and let
<inline-formula>
<inline-graphic xlink:href="msu173i53.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i54.jpg"></inline-graphic>
</inline-formula>
be the total number of these reads that report codon
<italic>i</italic>
at site
<italic>r</italic>
in mutDNA, RNA, and mutvirus-p1, respectively.</p>
<p>First consider the rate at which site
<italic>r</italic>
is erroneously read as some incorrect identity due to PCR or sequencing errors. Such errors are the only source of non-WT reads in the sequencing of DNA. For all
<inline-formula>
<inline-graphic xlink:href="msu173i55.jpg"></inline-graphic>
</inline-formula>
, define
<italic>ε<sub>r,i</sub></italic>
as the rate at which site
<italic>r</italic>
is erroneously read as codon
<italic>i</italic>
in DNA. Define
<inline-formula>
<inline-graphic xlink:href="msu173i56.jpg"></inline-graphic>
</inline-formula>
to be the rate at which site
<italic>r</italic>
is correctly read as its WT identity of
<inline-formula>
<inline-graphic xlink:href="msu173i57.jpg"></inline-graphic>
</inline-formula>
in DNA. Then
<inline-formula>
<inline-graphic xlink:href="msu173i58.jpg"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i59.jpg"></inline-graphic>
</inline-formula>
denotes the expectation value. Define
<inline-formula>
<inline-graphic xlink:href="msu173i60.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="msu173i61.jpg"></inline-graphic>
</inline-formula>
as vectors of the
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
and
<inline-formula>
<inline-graphic xlink:href="msu173i62.jpg"></inline-graphic>
</inline-formula>
values, so the likelihood of observing
<inline-formula>
<inline-graphic xlink:href="msu173i63.jpg"></inline-graphic>
</inline-formula>
given
<inline-formula>
<inline-graphic xlink:href="msu173i64.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="msu173i65.jpg"></inline-graphic>
</inline-formula>
is
<disp-formula id="msu173-M7">
<label>(7)</label>
<graphic xlink:href="msu173m7.jpg" position="float"></graphic>
</disp-formula>
where Mult denotes the multinomial distribution.</p>
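<p>As an illustration of the multinomial likelihood in equation (7), the following minimal Python sketch evaluates the log-likelihood of a vector of codon counts given a vector of read probabilities. The function name and the toy four-identity site (wild-type identity at index 0, with invented rates and counts) are assumptions made for illustration and are not taken from the analysis code used for this work.</p>

```python
import math

def multinomial_log_likelihood(counts, probs):
    """Return log Mult(counts; N, probs), where N = sum(counts)."""
    total = sum(counts)
    # log of the multinomial coefficient N! / (n_1! n_2! ...)
    log_coeff = math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in counts)
    # sum_i n_i log p_i, skipping empty categories
    log_terms = sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
    return log_coeff + log_terms

# Hypothetical site: index 0 is the wild-type identity, the rest are errors.
error_rates = [0.997, 0.001, 0.001, 0.001]
dna_counts = [997, 1, 1, 1]
log_lik = multinomial_log_likelihood(dna_counts, error_rates)
```

<p>Working with log-likelihoods rather than raw probabilities avoids numerical underflow when the read depth is large.</p>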
<p>Next consider the rate at which site
<italic>r</italic>
is erroneously copied during reverse transcription. These reverse-transcription errors combine with the PCR/sequencing errors defined by
<inline-formula>
<inline-graphic xlink:href="msu173i66.jpg"></inline-graphic>
</inline-formula>
to create non-WT reads in RNA. For all
<inline-formula>
<inline-graphic xlink:href="msu173i67.jpg"></inline-graphic>
</inline-formula>
, define
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
as the rate at which site
<italic>r</italic>
is miscopied to
<italic>i</italic>
during reverse transcription. Define
<inline-formula>
<inline-graphic xlink:href="msu173i68.jpg"></inline-graphic>
</inline-formula>
as the rate at which site
<italic>r</italic>
is correctly reverse transcribed. Ignore as negligibly rare the possibility that a site is subject to both a reverse-transcription error and a sequencing/PCR error within the same clone (a reasonable assumption because both
<inline-formula>
<inline-graphic xlink:href="msu173i69.jpg"></inline-graphic>
</inline-formula>
and
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
are very small for
<inline-formula>
<inline-graphic xlink:href="msu173i70.jpg"></inline-graphic>
</inline-formula>
). Then
<inline-formula>
<inline-graphic xlink:href="msu173i71.jpg"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i72.jpg"></inline-graphic>
</inline-formula>
is the Kronecker delta (equal to one if
<inline-formula>
<inline-graphic xlink:href="msu173i73.jpg"></inline-graphic>
</inline-formula>
and zero otherwise). The likelihood of observing
<inline-formula>
<inline-graphic xlink:href="msu173i74.jpg"></inline-graphic>
</inline-formula>
given
<inline-formula>
<inline-graphic xlink:href="msu173i75.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i76.jpg"></inline-graphic>
</inline-formula>
is
<disp-formula id="msu173-M8">
<label>(8)</label>
<graphic xlink:href="msu173m8.jpg" position="float"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i77.jpg"></inline-graphic>
</inline-formula>
is a vector that is all zeros except for the element
<inline-formula>
<inline-graphic xlink:href="msu173i78.jpg"></inline-graphic>
</inline-formula>
.</p>
<p>Next consider the rate at which site
<italic>r</italic>
is mutated to some other codon in the plasmid mutant library. These mutations combine with the PCR/sequencing errors defined by
<inline-formula>
<inline-graphic xlink:href="msu173i79.jpg"></inline-graphic>
</inline-formula>
to create non-WT reads in mutDNA. For all
<inline-formula>
<inline-graphic xlink:href="msu173i80.jpg"></inline-graphic>
</inline-formula>
, define
<italic>μ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
as the rate at which site
<italic>r</italic>
is mutated to codon
<italic>i</italic>
in the mutant library. Define
<inline-formula>
<inline-graphic xlink:href="msu173i81.jpg"></inline-graphic>
</inline-formula>
as the rate at which site
<italic>r</italic>
is not mutated. Ignore as negligibly rare the possibility that a site is subject to both a mutation and a sequencing/PCR error within the same clone. Then
<inline-formula>
<inline-graphic xlink:href="msu173i82.jpg"></inline-graphic>
</inline-formula>
. The likelihood of observing
<inline-formula>
<inline-graphic xlink:href="msu173i83.jpg"></inline-graphic>
</inline-formula>
given
<inline-formula>
<inline-graphic xlink:href="msu173i84.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="msu173i85.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i86.jpg"></inline-graphic>
</inline-formula>
is
<disp-formula id="msu173-M9">
<label>(9)</label>
<graphic xlink:href="msu173m9.jpg" position="float"></graphic>
</disp-formula>
</p>
<p>Finally, consider the effect of the preferences of each site
<italic>r</italic>
for different amino acids, as denoted by the
<inline-formula>
<inline-graphic xlink:href="msu173i87.jpg"></inline-graphic>
</inline-formula>
values. Selection due to these preferences is manifested in “mutvirus.” This selection acts on the mutations in the mutant library (
<italic>μ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
), although the actual counts in mutvirus are also affected by the sequencing/PCR errors (
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
) and reverse-transcription errors (
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
). Again ignore as negligibly rare the possibility that a site is subject to more than one of these sources of mutation and error within a single clone. Let
<inline-formula>
<inline-graphic xlink:href="msu173i88.jpg"></inline-graphic>
</inline-formula>
denote the amino acid encoded by codon
<italic>i</italic>
. Let
<inline-formula>
<inline-graphic xlink:href="msu173i89.jpg"></inline-graphic>
</inline-formula>
be the vector of
<inline-formula>
<inline-graphic xlink:href="msu173i90.jpg"></inline-graphic>
</inline-formula>
values. Define the vector-valued function
<inline-formula>
<inline-graphic xlink:href="msu173i91.jpg"></inline-graphic>
</inline-formula>
as
<disp-formula id="msu173-M10">
<label>(10)</label>
<graphic xlink:href="msu173m10.jpg" position="float"></graphic>
</disp-formula>
so that this function returns an
<italic>n</italic>
<sub>codon</sub>
-element vector constructed from
<inline-formula>
<inline-graphic xlink:href="msu173i92.jpg"></inline-graphic>
</inline-formula>
. Because the selection in mutvirus due to the preferences
<inline-formula>
<inline-graphic xlink:href="msu173i93.jpg"></inline-graphic>
</inline-formula>
occurs after the mutagenesis
<italic>μ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
but before the reverse-transcription errors
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
and the sequencing/PCR errors
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
, then
<inline-formula>
<inline-graphic xlink:href="msu173i94.jpg"></inline-graphic>
</inline-formula>
<inline-formula>
<inline-graphic xlink:href="msu173i94a.jpg"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i95.jpg"></inline-graphic>
</inline-formula>
(where · denotes the dot product) is a normalization factor that accounts for the fact that changes in the frequency of one variant due to selection will influence the observed frequency of other variants. The likelihood of observing
<inline-formula>
<inline-graphic xlink:href="msu173i96.jpg"></inline-graphic>
</inline-formula>
given
<inline-formula>
<inline-graphic xlink:href="msu173i97.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i98.jpg"></inline-graphic>
</inline-formula>
is therefore
<disp-formula id="msu173-M11">
<label>(11)</label>
<graphic xlink:href="msu173m11.jpg" position="float"></graphic>
</disp-formula>
where ○ is the Hadamard (entrywise) product.</p>
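<p>The expected read frequencies described in the preceding paragraphs can be sketched in a few lines of Python. Because the equations in this record are available only as images, the forms used below are a reading of the surrounding text and should be treated as assumptions: errors and mutations are added, the wild-type indicator is subtracted once for each additional source so that the frequencies still sum to one, and selection rescales the mutation rates by the preferences before renormalising. All names, the three-codon alphabet, and the numerical values are hypothetical.</p>

```python
def expected_frequencies(eps, rho, mu, pi, codon_to_aa, wt):
    """Expected per-codon read frequencies for one site in each library.

    eps, rho, mu: per-codon rate vectors, each summing to one
    pi: dict mapping amino acid to its preference at this site
    codon_to_aa: list giving the amino acid encoded by each codon index
    wt: index of the wild-type codon
    """
    n = len(eps)
    delta = [1.0 if i == wt else 0.0 for i in range(n)]
    g = [pi[codon_to_aa[i]] for i in range(n)]            # codon-level preferences
    norm = sum(gi * mi for gi, mi in zip(g, mu))          # dot product g . mu
    selected = [gi * mi / norm for gi, mi in zip(g, mu)]  # entrywise product, renormalised
    dna = list(eps)
    rna = [e + r - d for e, r, d in zip(eps, rho, delta)]
    mutdna = [e + m - d for e, m, d in zip(eps, mu, delta)]
    # Assumed form: three sources of non-WT reads, so the indicator is subtracted twice.
    mutvirus = [e + r + s - 2 * d for e, r, s, d in zip(eps, rho, selected, delta)]
    return dna, rna, mutdna, mutvirus

# Hypothetical site: codons 0 and 1 encode "A", codon 2 encodes "B"; codon 0 is WT.
eps = [0.998, 0.001, 0.001]
rho = [0.996, 0.002, 0.002]
mu = [0.90, 0.05, 0.05]
pi = {"A": 0.8, "B": 0.2}
codon_to_aa = ["A", "A", "B"]
dna, rna, mutdna, mutvirus = expected_frequencies(eps, rho, mu, pi, codon_to_aa, 0)
```

<p>In this toy example the disfavoured amino acid "B" is depleted in the mutvirus library relative to mutDNA, which is the signal from which the preferences are inferred.</p>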
<p>Specify priors over
<inline-formula>
<inline-graphic xlink:href="msu173i99.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i100.jpg"></inline-graphic>
</inline-formula>
in the form of Dirichlet distributions (denoted here by Dir). For the priors over the mutation rates
<inline-formula>
<inline-graphic xlink:href="msu173i101.jpg"></inline-graphic>
</inline-formula>
, I choose Dirichlet-distribution parameters such that the prior mean of the mutation rate at each site
<italic>r</italic>
and codon
<italic>i</italic>
is equal to the average value for all sites, estimated as the frequency in mutDNA minus the frequency in DNA (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
), denoted by
<inline-formula>
<inline-graphic xlink:href="msu173i102.jpg"></inline-graphic>
</inline-formula>
. So the prior is
<disp-formula id="msu173-M12">
<label>(12)</label>
<graphic xlink:href="msu173m12.jpg" position="float"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i103.jpg"></inline-graphic>
</inline-formula>
is the
<italic>n</italic>
<sub>codon</sub>
-element vector with elements
<inline-formula>
<inline-graphic xlink:href="msu173i104.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="msu173i105.jpg"></inline-graphic>
</inline-formula>
is the scalar concentration parameter.</p>
<p>For the priors over
<italic>ε
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
and
<italic>ρ
<sub>r</sub>
</italic>
<sub>,</sub>
<italic>
<sub>i</sub>
</italic>
, the Dirichlet-distribution parameters again represent the average value for all sites but now also depend on the number of nucleotide changes in the codon mutation because sequencing/PCR and reverse-transcription errors are far more likely to lead to single-nucleotide codon changes than multiple-nucleotide codon changes (
<xref ref-type="fig" rid="msu173-F3">fig. 3</xref>
). Let
<inline-formula>
<inline-graphic xlink:href="msu173i106.jpg"></inline-graphic>
</inline-formula>
be the number of nucleotide changes in the mutation from codon
<inline-formula>
<inline-graphic xlink:href="msu173i107.jpg"></inline-graphic>
</inline-formula>
to codon
<italic>i</italic>
. For example,
<inline-formula>
<inline-graphic xlink:href="msu173i108.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="msu173i109.jpg"></inline-graphic>
</inline-formula>
. Let
<inline-formula>
<inline-graphic xlink:href="msu173i110.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i111.jpg"></inline-graphic>
</inline-formula>
be the average error rates for one-, two-, and three-nucleotide codon mutations, respectively—these are estimated as the frequencies in DNA. So the prior is
<disp-formula id="msu173-M13">
<label>(13)</label>
<graphic xlink:href="msu173m13.jpg" position="float"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i112.jpg"></inline-graphic>
</inline-formula>
is the
<italic>n</italic>
<sub>codon</sub>
-element vector with elements
<inline-formula>
<inline-graphic xlink:href="msu173i113.jpg"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i114.jpg"></inline-graphic>
</inline-formula>
<inline-formula>
<inline-graphic xlink:href="msu173i114a.jpg"></inline-graphic>
</inline-formula>
, and where
<inline-formula>
<inline-graphic xlink:href="msu173i115.jpg"></inline-graphic>
</inline-formula>
is the scalar concentration parameter.</p>
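<p>The dependence of the prior on the number of nucleotide changes can be illustrated with a short Python sketch. It counts the nucleotide differences between the wild-type codon and every other codon and assigns each non-wild-type codon the average rate for its class, with the wild-type element taking the remainder. The 64-codon alphabet, the treatment of the average rates as per-codon values, the example codon GCA, and the numerical rates are all assumptions made for illustration.</p>

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def nucleotide_differences(codon_a, codon_b):
    """Number of positions (0 to 3) at which two codons differ."""
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)

def prior_mean_vector(wt_codon, avg_rates):
    """Prior mean for every codon; avg_rates maps 1, 2, 3 to a per-codon rate."""
    mean = {}
    for codon in CODONS:
        m = nucleotide_differences(wt_codon, codon)
        if m > 0:
            mean[codon] = avg_rates[m]
    # The wild-type element is whatever probability mass remains.
    mean[wt_codon] = 1.0 - sum(mean.values())
    return mean

# Hypothetical average error rates for one-, two-, and three-nucleotide changes.
avg_error = {1: 1e-4, 2: 1e-6, 3: 1e-7}
prior_mean = prior_mean_vector("GCA", avg_error)
```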
<p>Similarly, let
<inline-formula>
<inline-graphic xlink:href="msu173i116.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="msu173i117.jpg"></inline-graphic>
</inline-formula>
be the average reverse-transcription error rates for one-, two-, and three-nucleotide codon mutations, respectively—these are estimated as the frequencies in RNA minus those in DNA. So the prior is
<disp-formula id="msu173-M14">
<label>(14)</label>
<graphic xlink:href="msu173m14.jpg" position="float"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i118.jpg"></inline-graphic>
</inline-formula>
is the
<italic>n</italic>
<sub>codon</sub>
-element vector with elements
<inline-formula>
<inline-graphic xlink:href="msu173i119.jpg"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i120.jpg"></inline-graphic>
</inline-formula>
<inline-formula>
<inline-graphic xlink:href="msu173i120a.jpg"></inline-graphic>
</inline-formula>
, and where
<italic>σ
<sub>ρ</sub>
</italic>
is the scalar concentration parameter.</p>
<p>Specify a symmetric Dirichlet-distribution prior over
<inline-formula>
<inline-graphic xlink:href="msu173i121.jpg"></inline-graphic>
</inline-formula>
(note that any other prior, such as one that favored WT, would implicitly favor certain identities based empirically on the WT sequence, and so would not be in the spirit of the parameter-free derivation of the
<inline-formula>
<inline-graphic xlink:href="msu173i122.jpg"></inline-graphic>
</inline-formula>
values employed here). Specifically, use a prior of
<disp-formula id="msu173-M15">
<label>(15)</label>
<graphic xlink:href="msu173m15.jpg" position="float"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="msu173i123.jpg"></inline-graphic>
</inline-formula>
is the
<italic>n</italic>
<sub>aa</sub>
-element vector that is all ones, and
<inline-formula>
<inline-graphic xlink:href="msu173i124.jpg"></inline-graphic>
</inline-formula>
is the scalar concentration parameter.</p>
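<p>To make the symmetric prior concrete, the sketch below draws one preference vector from a symmetric Dirichlet distribution using the standard construction of normalising independent gamma variates. The alphabet size of 20, the concentration value of 1.0, and the function name are illustrative assumptions rather than the settings used in this work.</p>

```python
import random

def sample_symmetric_dirichlet(n, concentration, rng):
    """Draw one sample from Dir(concentration * ones(n)) via gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pi_r = sample_symmetric_dirichlet(20, 1.0, rng)
```

<p>Because every element has the same concentration, no identity (including the wild-type one) is favoured before the counts are observed.</p>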
<p>It is now possible to write expressions for the likelihoods and posterior probabilities. Let
<inline-formula>
<inline-graphic xlink:href="msu173i125.jpg"></inline-graphic>
</inline-formula>
<inline-formula>
<inline-graphic xlink:href="msu173i125a.jpg"></inline-graphic>
</inline-formula>
denote the full set of counts for site
<italic>r</italic>
. The likelihood of
<inline-formula>
<inline-graphic xlink:href="msu173i126.jpg"></inline-graphic>
</inline-formula>
given values for the preferences and mutation/error rates is
<disp-formula id="msu173-M16">
<label>(16)</label>
<graphic xlink:href="msu173m16.jpg" position="float"></graphic>
</disp-formula>
where the likelihoods that compose
<xref ref-type="disp-formula" rid="msu173-M16">equation (16)</xref>
are defined by
<xref ref-type="disp-formula" rid="msu173-M7 msu173-M8 msu173-M9 msu173-M10 msu173-M11">equations (7–11)</xref>
. The posterior probability of a specific value for the preferences and mutation/error rates is
<disp-formula id="msu173-M17">
<label>(17)</label>
<graphic xlink:href="msu173m17.jpg" position="float"></graphic>
</disp-formula>
<disp-formula id="msu173-M18">
<label>(18)</label>
<graphic xlink:href="msu173m18.jpg" position="float"></graphic>
</disp-formula>
where
<italic>C
<sub>r</sub>
</italic>
is a normalization constant that does not need to be explicitly calculated in the MCMC approach used here. The posterior over the preferences
<inline-formula>
<inline-graphic xlink:href="msu173i127.jpg"></inline-graphic>
</inline-formula>
can be calculated by integrating over
<xref ref-type="disp-formula" rid="msu173-M17">equation (17)</xref>
to give
<disp-formula id="msu173-M19">
<label>(19)</label>
<graphic xlink:href="msu173m19.jpg" position="float"></graphic>
</disp-formula>
where the integration is performed by MCMC. The posterior is summarized by its mean,
<disp-formula id="msu173-M20">
<label>(20)</label>
<graphic xlink:href="msu173m20.jpg" position="float"></graphic>
</disp-formula>
In practice, each replicate consists of four libraries (WT-1, WT-2, N334H-1, and N334H-2)—the posterior mean preferences inferred for each library within a replicate are averaged to give the estimated preferences for that replicate. The preferences within each replicate are highly correlated regardless of whether mutvirus-p1 or mutvirus-p2 is used as the mutvirus data set (
<xref ref-type="fig" rid="msu173-F5">fig. 5</xref>
<italic>A</italic>
and
<italic>B</italic>
). This correlation between passages is consistent with the interpretation of the preferences as the fraction of genetic backgrounds that tolerate a mutation (if the preferences were instead selection coefficients, there should be further enrichment upon further passage). The preferences averaged over both replicates serve as the “best” estimate and are displayed in
<xref ref-type="fig" rid="msu173-F5">figure 5</xref>
<italic>D</italic>
. This figure was created using the WebLogo 3 program (
<xref rid="msu173-B58" ref-type="bibr">Schneider and Stephens 1990</xref>
;
<xref rid="msu173-B10" ref-type="bibr">Crooks et al. 2004</xref>
).</p>
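<p>The averaging of posterior mean preferences across the libraries of a replicate can be sketched as follows. The two-letter alphabet and the preference values are invented for illustration; only the library names come from the text above.</p>

```python
def average_preferences(per_library):
    """Average preference vectors (dicts keyed by amino acid) across libraries."""
    n_libraries = len(per_library)
    amino_acids = list(per_library[0])
    return {
        aa: sum(library[aa] for library in per_library) / n_libraries
        for aa in amino_acids
    }

# Invented posterior means for one site, one dict per library.
libraries = {
    "WT-1": {"A": 0.70, "G": 0.30},
    "WT-2": {"A": 0.60, "G": 0.40},
    "N334H-1": {"A": 0.80, "G": 0.20},
    "N334H-2": {"A": 0.70, "G": 0.30},
}
replicate_preferences = average_preferences(list(libraries.values()))
```

<p>Averaging vectors that each sum to one yields a vector that also sums to one, so the result can be used directly as a set of preferences.</p>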
<p>
<xref ref-type="fig" rid="msu173-F5">Figure 5</xref>
<italic>D</italic>
also shows relative solvent accessibility (RSA) and secondary structure for residues present in chain C of NP crystal structure PDB 2IQH (
<xref rid="msu173-B74" ref-type="bibr">Ye et al. 2006</xref>
). The total accessible surface area (ASA) and the secondary structure for each residue in this monomer alone were calculated using DSSP (
<xref rid="msu173-B31" ref-type="bibr">Kabsch and Sander 1983</xref>
;
<xref rid="msu173-B30" ref-type="bibr">Joosten et al. 2011</xref>
). The RSAs are the total ASA divided by the maximum ASA defined in
<xref rid="msu173-B66" ref-type="bibr">Tien et al. (2013)</xref>
. The secondary structure codes returned by DSSP were grouped into three classes: Helix (DSSP codes G, H, or I), strand (DSSP codes B or E), and loop (any other DSSP code).</p>
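<p>The grouping of DSSP codes and the RSA calculation described above amount to the following minimal sketch. The function names are illustrative, and the maximum ASA is passed in as an argument because the reference values are not reproduced here; the numbers in the example call are placeholders.</p>

```python
HELIX_CODES = {"G", "H", "I"}
STRAND_CODES = {"B", "E"}

def secondary_structure_class(dssp_code):
    """Map a one-letter DSSP code to helix, strand, or loop."""
    if dssp_code in HELIX_CODES:
        return "helix"
    if dssp_code in STRAND_CODES:
        return "strand"
    return "loop"  # any other DSSP code, including a blank

def relative_solvent_accessibility(total_asa, max_asa):
    """RSA is the residue's total ASA divided by the maximum ASA for that amino acid."""
    return total_asa / max_asa

# Placeholder values, not measurements from PDB 2IQH.
example_class = secondary_structure_class("H")
example_rsa = relative_solvent_accessibility(50.0, 200.0)
```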
</sec>
<sec>
<title>Phylogenetic Analyses</title>
<p>A set of NP coding sequences was assembled for human influenza lineages descended from a close relative of the 1918 virus (H1N1 from 1918 to 1957, H2N2 from 1957 to 1968, H3N2 from 1968 to 2013, and seasonal H1N1 from 1977 to 2008). All full-length NP sequences from the Influenza Virus Resource (
<xref rid="msu173-B3" ref-type="bibr">Bao et al. 2008</xref>
) were downloaded, and up to three unique sequences per year from each of the four lineages described above were retained. These sequences were aligned using EMBOSS needle (
<xref rid="msu173-B52" ref-type="bibr">Rice et al. 2000</xref>
). Outlier sequences that correspond to heavily lab-adapted strains, lab recombinants, misannotated sequences, or zoonotic transfers (e.g., a small number of human H3N2 strains are from zoonotic swine variant H3N2 rather than the main human H3N2 lineage) were removed. This was done by first removing known outliers in the influenza databases (
<xref rid="msu173-B34" ref-type="bibr">Krasnitz et al. 2008</xref>
) and then using an analysis with RAxML (
<xref rid="msu173-B60" ref-type="bibr">Stamatakis 2006</xref>
) and Path-O-Gen (
<ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/pathogen/">http://tree.bio.ed.ac.uk/software/pathogen/</ext-link>
, last accessed May 31, 2014) to remove remaining sequences that were extreme outliers from the molecular clock. The final alignment after removing outliers consisted of 274 unique NP sequences.</p>
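<p>The subsampling step (up to three unique sequences per year from each lineage) can be sketched as below. The text does not state how the retained sequences were chosen within a year, so this sketch simply keeps the first unique sequences encountered, which is an assumption. The record layout and the short toy sequences are also invented for illustration.</p>

```python
from collections import defaultdict

def subsample(records, max_per_group=3):
    """Keep up to max_per_group unique sequences per (lineage, year).

    records: iterable of (lineage, year, sequence) tuples.
    """
    kept = defaultdict(list)
    for lineage, year, sequence in records:
        group = kept[(lineage, year)]
        # Skip exact duplicates and stop adding once the group is full.
        if sequence not in group and max_per_group > len(group):
            group.append(sequence)
    return kept

# Toy records; real inputs would be full-length NP coding sequences.
records = [
    ("H3N2", 1968, "ATG"),
    ("H3N2", 1968, "ATG"),
    ("H3N2", 1968, "ATA"),
    ("H3N2", 1968, "ATC"),
    ("H3N2", 1968, "ATT"),
    ("H1N1", 1918, "ATG"),
]
kept = subsample(records)
```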
<p>Maximum-likelihood phylogenetic trees were constructed using codonPhyML (
<xref rid="msu173-B18" ref-type="bibr">Gil et al. 2013</xref>
). Two substitution models were used. The first was GY94 (
<xref rid="msu173-B20" ref-type="bibr">Goldman and Yang 1994</xref>
) using
<italic>CF3x4</italic>
equilibrium frequencies (
<xref rid="msu173-B47" ref-type="bibr">Pond et al. 2010</xref>
), a single transition–transversion ratio optimized by maximum likelihood, and a synonymous–nonsynonymous ratio drawn from four discrete gamma-distributed categories with mean and shape parameter optimized by maximum likelihood (
<xref rid="msu173-B72" ref-type="bibr">Yang et al. 2000</xref>
). The second was KOSI07+F (
<xref rid="msu173-B33" ref-type="bibr">Kosiol et al. 2007</xref>
), optimizing the relative transversion–transition ratio by maximum likelihood, and letting the relative synonymous–nonsynonymous ratio again be drawn from four gamma-distributed categories with mean and shape parameter optimized by maximum likelihood. The trees produced by codonPhyML are unrooted. These trees were rooted using Path-O-Gen (
<ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/pathogen/">http://tree.bio.ed.ac.uk/software/pathogen/</ext-link>
, last accessed May 31, 2014) and visualized with FigTree (
<ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/figtree/">http://tree.bio.ed.ac.uk/software/figtree/</ext-link>
, last accessed May 31, 2014) to create the images in
<xref ref-type="fig" rid="msu173-F7">figure 7</xref>
. The tree topologies are extremely similar for both models.</p>
<p>The evolutionary models were compared by using them to optimize the branch lengths of the fixed tree topologies in
<xref ref-type="fig" rid="msu173-F7">figure 7</xref>
so as to maximize the likelihood using HYPHY (
<xref rid="msu173-B48" ref-type="bibr">Pond et al. 2005</xref>
) for sites 2–498 (site 1 was not included, because the N-terminal methionine is conserved and was not mutated in the plasmid mutant libraries). HYPHY was used to calculate all likelihoods (even for models that could be handled by codonPhyML) for consistency in case these programs differ slightly in numerical accuracy. The results are shown in
<xref ref-type="table" rid="msu173-T6">tables 6</xref>
and
<xref ref-type="table" rid="msu173-T7">7</xref>
. Regardless of which tree topology was used, the experimentally determined evolutionary models outperformed all variants of GY94 and KOSI07+F. The experimentally determined evolutionary models performed best when using the preferences determined from the combined data from both replicates and using
<xref ref-type="disp-formula" rid="msu173-M3">equation (3)</xref>
to compute the fixation probabilities. Using the data from just one replicate also outperforms GY94 and KOSI07+F, although the likelihoods are slightly worse. In terms of the completeness with which mutations are sampled in the mutant viruses, replicate A is superior to replicate B as discussed above—and the former replicate gives higher likelihoods. If the fixation probabilities are instead determined using the method of Halpern and Bruno (
<xref rid="msu173-B24" ref-type="bibr">Halpern and Bruno 1998</xref>
) as in
<xref ref-type="disp-formula" rid="msu173-M4">equation (4)</xref>
, the experimentally determined models still outperform GY94 and KOSI07+F—but the likelihoods are substantially worse. To check that the experimentally determined models really do utilize the site-specific preferences information, the preferences were randomized among sites and likelihoods were computed. These randomized models perform vastly worse than any of the alternatives.</p>
<p>The variants of GY94 and KOSI07+F tested are listed in
<xref ref-type="table" rid="msu173-T6">table 6</xref>
. Various methods were used to estimate the nonsynonymous–synonymous ratio (
<italic>ω</italic>
): A single
<italic>ω</italic>
optimized by maximum likelihood; three discrete categories of
<italic>ω</italic>
&lt; 1,
<italic>ω</italic>
= 1, and
<italic>ω</italic>
> 1 with the proportions and the
<italic>ω</italic>
values
<inline-formula>
<inline-graphic xlink:href="msu173i128.jpg"></inline-graphic>
</inline-formula>
estimated by maximum likelihood;
<italic>ω</italic>
drawn from four gamma-distributed categories with mean and shape estimated by maximum likelihood; and a beta distribution (ten categories) plus an additional category of
<italic>ω</italic>
> 1 with the shape parameters,
<italic>ω</italic>
> 1 value, and proportion in the final category estimated by maximum likelihood. These models are referred to as M0, M2a, M5, and M7 in the literature (
<xref rid="msu173-B73" ref-type="bibr">Yang et al. 2005</xref>
). Another model optimized a different
<italic>ω</italic>
for each branch. Another model optimized a single
<italic>ω</italic>
but allowed the rates to be drawn from four gamma-distributed categories. Parameters were counted as follows. All variants contain equilibrium frequency parameters that were empirically estimated from the sequences under analysis: there are nine such parameters for GY94 using
<italic>CF3x4</italic>
(
<xref rid="msu173-B20" ref-type="bibr">Goldman and Yang 1994</xref>
;
<xref rid="msu173-B47" ref-type="bibr">Pond et al. 2010</xref>
) and 60 such parameters for KOSI07+F (
<xref rid="msu173-B33" ref-type="bibr">Kosiol et al. 2007</xref>
). In addition, all variants contain a transition–transversion ratio optimized by likelihood. Finally, all variants contain one or more
<italic>ω</italic>
parameters as described above.</p>
</sec>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">Supplementary figures S1</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">S2</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">files S1</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msu173/-/DC1">S2</ext-link>
are available at
<italic>Molecular Biology and Evolution</italic>
online (
<ext-link ext-link-type="uri" xlink:href="http://www.mbe.oxfordjournals.org/">http://www.mbe.oxfordjournals.org/</ext-link>
).</p>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_31_8_1956__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_msu173_Supplementary_figure_1.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_msu173_Supplementary_figure_2.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_msu173_Supplementary_file_1.xls"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_msu173_Supplementary_file_2.xls"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The author thanks D. Fowler, J. Kitzman, A. Adey, O. Ashenberg, and T. Bedford for helpful discussions. This work was supported by the
<funding-source>National Institute of General Medical Sciences of the National Institutes of Health</funding-source>
(grant number
<award-id>R01 GM102198</award-id>
).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="msu173-B1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Araya</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Fowler</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Deep mutational scanning: assessing protein function on a massive scale</article-title>
<source>Trends Biotechnol.</source>
<year>2011</year>
<volume>29</volume>
<fpage>435</fpage>
<lpage>442</lpage>
<pub-id pub-id-type="pmid">21561674</pub-id>
</element-citation>
</ref>
<ref id="msu173-B2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashenberg</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>LI</given-names>
</name>
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>Mutational effects on stability are largely conserved during protein evolution</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2013</year>
<volume>110</volume>
<fpage>21071</fpage>
<lpage>21076</lpage>
<pub-id pub-id-type="pmid">24324165</pub-id>
</element-citation>
</ref>
<ref id="msu173-B3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bolotov</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dernovoy</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kiryutin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zaslavsky</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tatusova</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ostell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The influenza virus resource at the National Center for Biotechnology Information</article-title>
<source>J Virol.</source>
<year>2008</year>
<volume>82</volume>
<fpage>596</fpage>
<lpage>601</lpage>
<pub-id pub-id-type="pmid">17942553</pub-id>
</element-citation>
</ref>
<ref id="msu173-B4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bershtein</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Segal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bekerman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tokuriki</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tawfik</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein</article-title>
<source>Nature</source>
<year>2006</year>
<volume>444</volume>
<fpage>929</fpage>
<lpage>932</lpage>
<pub-id pub-id-type="pmid">17122770</pub-id>
</element-citation>
</ref>
<ref id="msu173-B5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>LI</given-names>
</name>
<name>
<surname>Baltimore</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Permissive secondary mutations enable the evolution of influenza oseltamivir resistance</article-title>
<source>Science</source>
<year>2010</year>
<volume>328</volume>
<fpage>1272</fpage>
<lpage>1275</lpage>
<pub-id pub-id-type="pmid">20522774</pub-id>
</element-citation>
</ref>
<ref id="msu173-B6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Labthavikul</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Otey</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Arnold</surname>
<given-names>FH</given-names>
</name>
</person-group>
<article-title>Protein stability promotes evolvability</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2006</year>
<volume>103</volume>
<fpage>5869</fpage>
<lpage>5874</lpage>
<pub-id pub-id-type="pmid">16581913</pub-id>
</element-citation>
</ref>
<ref id="msu173-B7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Raval</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>CO</given-names>
</name>
</person-group>
<article-title>Thermodynamics of neutral protein evolution</article-title>
<source>Genetics</source>
<year>2007</year>
<volume>175</volume>
<fpage>255</fpage>
<lpage>266</lpage>
<pub-id pub-id-type="pmid">17110496</pub-id>
</element-citation>
</ref>
<ref id="msu173-B8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Silberg</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>CO</given-names>
</name>
<name>
<surname>Drummond</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Adami</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Arnold</surname>
<given-names>FH</given-names>
</name>
</person-group>
<article-title>Thermodynamic prediction of protein neutrality</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2005</year>
<volume>102</volume>
<fpage>606</fpage>
<lpage>611</lpage>
<pub-id pub-id-type="pmid">15644440</pub-id>
</element-citation>
</ref>
<ref id="msu173-B9">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Cirino</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Mayer</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Umeno</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Generating mutant libraries using error-prone PCR</article-title>
<source>Directed evolution library creation: methods and protocols</source>
<year>2003</year>
<publisher-name>Humana Press</publisher-name>
<fpage>3</fpage>
<lpage>9</lpage>
</element-citation>
</ref>
<ref id="msu173-B10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crooks</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>WebLogo: a sequence logo generator</article-title>
<source>Genome Res.</source>
<year>2004</year>
<volume>14</volume>
<fpage>1188</fpage>
<lpage>1190</lpage>
<pub-id pub-id-type="pmid">15173120</pub-id>
</element-citation>
</ref>
<ref id="msu173-B11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dang</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>QS</given-names>
</name>
<name>
<surname>Gascuel</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>VS</given-names>
</name>
</person-group>
<article-title>FLU, an amino acid substitution model for influenza proteins</article-title>
<source>BMC Evol Biol.</source>
<year>2010</year>
<volume>10</volume>
<fpage>99</fpage>
<pub-id pub-id-type="pmid">20384985</pub-id>
</element-citation>
</ref>
<ref id="msu173-B12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Maio</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Schlötterer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kosiol</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Estimating empirical codon hidden Markov models</article-title>
<source>Mol Biol Evol.</source>
<year>2013</year>
<volume>30</volume>
<fpage>725</fpage>
<lpage>736</lpage>
<pub-id pub-id-type="pmid">23188590</pub-id>
</element-citation>
</ref>
<ref id="msu173-B13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Desai</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Fisher</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>Beneficial mutation–selection balance and the effect of linkage on positive selection</article-title>
<source>Genetics</source>
<year>2007</year>
<volume>176</volume>
<fpage>1759</fpage>
<lpage>1798</lpage>
<pub-id pub-id-type="pmid">17483432</pub-id>
</element-citation>
</ref>
<ref id="msu173-B14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Maximum likelihood and minimum-step methods for estimating evolutionary trees from data on discrete characters</article-title>
<source>Syst Zool.</source>
<year>1973</year>
<volume>22</volume>
<fpage>240</fpage>
<lpage>249</lpage>
</element-citation>
</ref>
<ref id="msu173-B15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Evolutionary trees from DNA sequences: a maximum likelihood approach</article-title>
<source>J Mol Evol.</source>
<year>1981</year>
<volume>17</volume>
<fpage>368</fpage>
<lpage>376</lpage>
<pub-id pub-id-type="pmid">7288891</pub-id>
</element-citation>
</ref>
<ref id="msu173-B16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Firnberg</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ostermeier</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>PFunkel: efficient, expansive, user-defined mutagenesis</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e52031</fpage>
<pub-id pub-id-type="pmid">23284860</pub-id>
</element-citation>
</ref>
<ref id="msu173-B17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fowler</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Araya</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Fleishman</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Stephany</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Fields</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>High-resolution mapping of protein sequence-function relationships</article-title>
<source>Nat Methods.</source>
<year>2010</year>
<volume>7</volume>
<fpage>741</fpage>
<lpage>746</lpage>
<pub-id pub-id-type="pmid">20711194</pub-id>
</element-citation>
</ref>
<ref id="msu173-B18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gil</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zanetti</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Zoller</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Anisimova</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models</article-title>
<source>Mol Biol Evol.</source>
<year>2013</year>
<volume>30</volume>
<fpage>1270</fpage>
<lpage>1280</lpage>
<pub-id pub-id-type="pmid">23436912</pub-id>
</element-citation>
</ref>
<ref id="msu173-B19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Thorne</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>DT</given-names>
</name>
</person-group>
<article-title>Assessing the impact of secondary structure and solvent accessibility on protein evolution</article-title>
<source>Genetics</source>
<year>1998</year>
<volume>149</volume>
<fpage>445</fpage>
<lpage>458</lpage>
<pub-id pub-id-type="pmid">9584116</pub-id>
</element-citation>
</ref>
<ref id="msu173-B20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>A codon-based model of nucleotide substitution probabilities for protein-coding DNA sequences</article-title>
<source>Mol Biol Evol.</source>
<year>1994</year>
<volume>11</volume>
<fpage>725</fpage>
<lpage>736</lpage>
<pub-id pub-id-type="pmid">7968486</pub-id>
</element-citation>
</ref>
<ref id="msu173-B21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gong</surname>
<given-names>LI</given-names>
</name>
<name>
<surname>Suchard</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bloom</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>Stability-mediated epistasis constrains the evolution of an influenza protein</article-title>
<source>eLife</source>
<year>2013</year>
<volume>2</volume>
<fpage>e00631</fpage>
<pub-id pub-id-type="pmid">23682315</pub-id>
</element-citation>
</ref>
<ref id="msu173-B22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goto</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>A novel mechanism for the acquisition of virulence by a human influenza A virus</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>1998</year>
<volume>95</volume>
<fpage>10224</fpage>
<lpage>10228</lpage>
<pub-id pub-id-type="pmid">9707628</pub-id>
</element-citation>
</ref>
<ref id="msu173-B23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halligan</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Keightley</surname>
<given-names>PD</given-names>
</name>
</person-group>
<article-title>Spontaneous mutation accumulation studies in evolutionary genetics</article-title>
<source>Annu Rev Ecol Evol Syst.</source>
<year>2009</year>
<volume>40</volume>
<fpage>151</fpage>
<lpage>172</lpage>
</element-citation>
</ref>
<ref id="msu173-B24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Bruno</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<article-title>Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies</article-title>
<source>Mol Biol Evol.</source>
<year>1998</year>
<volume>15</volume>
<fpage>910</fpage>
<lpage>917</lpage>
<pub-id pub-id-type="pmid">9656490</pub-id>
</element-citation>
</ref>
<ref id="msu173-B25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hiatt</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Patwardhan</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Parallel, tag-directed assembly of locally derived short sequence reads</article-title>
<source>Nat Methods.</source>
<year>2010</year>
<volume>7</volume>
<fpage>119</fpage>
<lpage>122</lpage>
<pub-id pub-id-type="pmid">20081835</pub-id>
</element-citation>
</ref>
<ref id="msu173-B26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Neumann</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Hobom</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Webster</surname>
<given-names>RG</given-names>
</name>
</person-group>
<article-title>A DNA transfection system for generation of influenza A virus from eight plasmids</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2000</year>
<volume>97</volume>
<fpage>6108</fpage>
<lpage>6113</lpage>
<pub-id pub-id-type="pmid">10801978</pub-id>
</element-citation>
</ref>
<ref id="msu173-B27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hou</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Miranda</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Serious overestimation in quantitative PCR by circular (supercoiled) plasmid standard: microalgal pcna as the model gene</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e9545</fpage>
<pub-id pub-id-type="pmid">20221433</pub-id>
</element-citation>
</ref>
<ref id="msu173-B28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huelsenbeck</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Ronquist</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bollback</surname>
<given-names>JP</given-names>
</name>
</person-group>
<article-title>Bayesian inference of phylogeny and its impact on evolutionary biology</article-title>
<source>Science</source>
<year>2001</year>
<volume>294</volume>
<fpage>2310</fpage>
<lpage>2314</lpage>
<pub-id pub-id-type="pmid">11743192</pub-id>
</element-citation>
</ref>
<ref id="msu173-B29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jain</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Varadarajan</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library</article-title>
<source>Anal Biochem.</source>
<year>2014</year>
<volume>449</volume>
<fpage>90</fpage>
<lpage>98</lpage>
<pub-id pub-id-type="pmid">24333246</pub-id>
</element-citation>
</ref>
<ref id="msu173-B30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Joosten</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>te Beek</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Krieger</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hekkelman</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Hooft</surname>
<given-names>RW</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Vriend</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>A series of PDB related databases for everyday needs</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D411</fpage>
<lpage>D419</lpage>
<pub-id pub-id-type="pmid">21071423</pub-id>
</element-citation>
</ref>
<ref id="msu173-B31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kabsch</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features</article-title>
<source>Biopolymers</source>
<year>1983</year>
<volume>22</volume>
<fpage>2577</fpage>
<lpage>2637</lpage>
<pub-id pub-id-type="pmid">6667333</pub-id>
</element-citation>
</ref>
<ref id="msu173-B32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kleinman</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Rodrigue</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lartillot</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Statistical potentials for improved structurally constrained evolutionary models</article-title>
<source>Mol Biol Evol.</source>
<year>2010</year>
<volume>27</volume>
<fpage>1546</fpage>
<lpage>1560</lpage>
<pub-id pub-id-type="pmid">20159780</pub-id>
</element-citation>
</ref>
<ref id="msu173-B33">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kosiol</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>An empirical codon model for protein sequence evolution</article-title>
<source>Mol Biol Evol.</source>
<year>2007</year>
<volume>24</volume>
<fpage>1464</fpage>
<lpage>1479</lpage>
<pub-id pub-id-type="pmid">17400572</pub-id>
</element-citation>
</ref>
<ref id="msu173-B34">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krasnitz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Rabadan</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Anomalies in the influenza virus genome database: new biology or laboratory errors?</article-title>
<source>J Virol.</source>
<year>2008</year>
<volume>82</volume>
<fpage>8947</fpage>
<lpage>8950</lpage>
<pub-id pub-id-type="pmid">18579605</pub-id>
</element-citation>
</ref>
<ref id="msu173-B35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lartillot</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process</article-title>
<source>Mol Biol Evol.</source>
<year>2004</year>
<volume>21</volume>
<fpage>1095</fpage>
<lpage>1109</lpage>
<pub-id pub-id-type="pmid">15014145</pub-id>
</element-citation>
</ref>
<ref id="msu173-B36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>SQ</given-names>
</name>
<name>
<surname>Lartillot</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gascuel</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Phylogenetic mixture models for proteins</article-title>
<source>Philos Trans R Soc B.</source>
<year>2008</year>
<volume>363</volume>
<fpage>3965</fpage>
<lpage>3976</lpage>
</element-citation>
</ref>
<ref id="msu173-B37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lou</surname>
<given-names>DI</given-names>
</name>
<name>
<surname>Hussmann</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>McBee</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Acevedo</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Andino</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Press</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Sawyer</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2013</year>
<volume>110</volume>
<fpage>19872</fpage>
<lpage>19877</lpage>
<pub-id pub-id-type="pmid">24243955</pub-id>
</element-citation>
</ref>
<ref id="msu173-B38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lunzer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Golding</surname>
<given-names>GB</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Pervasive cryptic epistasis in molecular evolution</article-title>
<source>PLoS Genet.</source>
<year>2010</year>
<volume>6</volume>
<fpage>e1001162</fpage>
<pub-id pub-id-type="pmid">20975933</pub-id>
</element-citation>
</ref>
<ref id="msu173-B39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marsh</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Rabadán</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Palese</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Highly conserved regions of influenza A virus polymerase gene segments are critical for efficient viral RNA packaging</article-title>
<source>J Virol.</source>
<year>2008</year>
<volume>82</volume>
<fpage>2295</fpage>
<lpage>2304</lpage>
<pub-id pub-id-type="pmid">18094182</pub-id>
</element-citation>
</ref>
<ref id="msu173-B40">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Melamed</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Gamble</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Fields</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Deep mutational scanning of an RRM domain of the
<italic>Saccharomyces cerevisiae</italic>
poly(A)-binding protein</article-title>
<source>RNA</source>
<year>2013</year>
<volume>19</volume>
<fpage>1537</fpage>
<lpage>1551</lpage>
<pub-id pub-id-type="pmid">24064791</pub-id>
</element-citation>
</ref>
<ref id="msu173-B41">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Metropolis</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rosenbluth</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Rosenbluth</surname>
<given-names>MN</given-names>
</name>
<name>
<surname>Teller</surname>
<given-names>AH</given-names>
</name>
<name>
<surname>Teller</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Equation of state calculations by fast computing machines</article-title>
<source>J Chem Phys.</source>
<year>1953</year>
<volume>21</volume>
<fpage>1087</fpage>
<lpage>1092</lpage>
</element-citation>
</ref>
<ref id="msu173-B42">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neumann</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ito</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Perez</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Donis</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hoffmann</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Generation of influenza A viruses entirely from cloned cDNAs</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>1999</year>
<volume>96</volume>
<fpage>9345</fpage>
<lpage>9350</lpage>
<pub-id pub-id-type="pmid">10430945</pub-id>
</element-citation>
</ref>
<ref id="msu173-B43">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neylon</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Chemical and biochemical strategies for the randomization of protein encoding DNA sequences: library construction methods for directed evolution</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>1448</fpage>
<lpage>1459</lpage>
<pub-id pub-id-type="pmid">14990750</pub-id>
</element-citation>
</ref>
<ref id="msu173-B44">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ogliore</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Huss</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nagashima</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Ratio estimation in SIMS analysis</article-title>
<source>Nucl Instr Meth Phys Res B.</source>
<year>2011</year>
<volume>269</volume>
<fpage>1910</fpage>
<lpage>1918</lpage>
</element-citation>
</ref>
<ref id="msu173-B45">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parvin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Moscona</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Leider</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Palese</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Measurement of the mutation rates of animal viruses: influenza A virus and poliovirus type 1</article-title>
<source>J Virol.</source>
<year>1986</year>
<volume>59</volume>
<fpage>377</fpage>
<lpage>383</lpage>
<pub-id pub-id-type="pmid">3016304</pub-id>
</element-citation>
</ref>
<ref id="msu173-B46">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearson</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>On the constants of index-distributions as deduced from the like constants for the components of the ratio, with special reference to the opsonic index</article-title>
<source>Biometrika</source>
<year>1910</year>
<volume>7</volume>
<fpage>531</fpage>
<lpage>541</lpage>
</element-citation>
</ref>
<ref id="msu173-B47">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pond</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Delport</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Muse</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Scheffler</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Correcting the bias of empirical frequency parameter estimators in codon models</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e11230</fpage>
<pub-id pub-id-type="pmid">20689581</pub-id>
</element-citation>
</ref>
<ref id="msu173-B48">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pond</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Frost</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Muse</surname>
<given-names>SV</given-names>
</name>
</person-group>
<article-title>HyPhy: hypothesis testing using phylogenies</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>676</fpage>
<lpage>679</lpage>
<pub-id pub-id-type="pmid">15509596</pub-id>
</element-citation>
</ref>
<ref id="msu173-B49">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Portela</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Digard</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>The influenza virus nucleoprotein: a multifunctional RNA-binding protein pivotal to virus replication</article-title>
<source>J Gen Virol.</source>
<year>2002</year>
<volume>83</volume>
<fpage>723</fpage>
<lpage>734</lpage>
<pub-id pub-id-type="pmid">11907320</pub-id>
</element-citation>
</ref>
<ref id="msu173-B50">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Posada</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>TR</given-names>
</name>
</person-group>
<article-title>Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests</article-title>
<source>Syst Biol.</source>
<year>2004</year>
<volume>53</volume>
<fpage>793</fpage>
<lpage>808</lpage>
<pub-id pub-id-type="pmid">15545256</pub-id>
</element-citation>
</ref>
<ref id="msu173-B51">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Potapov</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details</article-title>
<source>Protein Eng Des Sel.</source>
<year>2009</year>
<volume>22</volume>
<fpage>553</fpage>
<lpage>560</lpage>
</element-citation>
</ref>
<ref id="msu173-B52">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rice</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Longden</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Bleasby</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>EMBOSS: the European molecular biology open software suite</article-title>
<source>Trends Genet.</source>
<year>2000</year>
<volume>16</volume>
<fpage>276</fpage>
<lpage>277</lpage>
<pub-id pub-id-type="pmid">10827456</pub-id>
</element-citation>
</ref>
<ref id="msu173-B53">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodrigue</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>On the statistical interpretation of site-specific variables in phylogeny-based substitution models</article-title>
<source>Genetics</source>
<year>2013</year>
<volume>193</volume>
<fpage>557</fpage>
<lpage>564</lpage>
<pub-id pub-id-type="pmid">23222651</pub-id>
</element-citation>
</ref>
<ref id="msu173-B54">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodrigue</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kleinman</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lartillot</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Computational methods for evaluating phylogenetic models of coding sequence evolution with dependence between codons</article-title>
<source>Mol Biol Evol.</source>
<year>2009</year>
<volume>26</volume>
<fpage>1663</fpage>
<lpage>1676</lpage>
<pub-id pub-id-type="pmid">19383983</pub-id>
</element-citation>
</ref>
<ref id="msu173-B55">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodrigue</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lartillot</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2010</year>
<volume>107</volume>
<fpage>4629</fpage>
<lpage>4634</lpage>
<pub-id pub-id-type="pmid">20176949</pub-id>
</element-citation>
</ref>
<ref id="msu173-B56">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roscoe</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Thayer</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Zeldovich</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Fushman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bolon</surname>
<given-names>DN</given-names>
</name>
</person-group>
<article-title>Analyses of the effects of all ubiquitin point mutants on yeast growth rate</article-title>
<source>J Mol Biol.</source>
<year>2013</year>
<volume>425</volume>
<issue>8</issue>
<fpage>1363</fpage>
<lpage>1377</lpage>
<pub-id pub-id-type="pmid">23376099</pub-id>
</element-citation>
</ref>
<ref id="msu173-B57">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schmitt</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Kennedy</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Salk</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Hiatt</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Loeb</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>Detection of ultra-rare mutations by next-generation sequencing</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2012</year>
<volume>109</volume>
<fpage>14508</fpage>
<lpage>14513</lpage>
<pub-id pub-id-type="pmid">22853953</pub-id>
</element-citation>
</ref>
<ref id="msu173-B58">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneider</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>RM</given-names>
</name>
</person-group>
<article-title>Sequence logos: a new way to display consensus sequences</article-title>
<source>Nucleic Acids Res.</source>
<year>1990</year>
<volume>18</volume>
<fpage>6097</fpage>
<lpage>6100</lpage>
<pub-id pub-id-type="pmid">2172928</pub-id>
</element-citation>
</ref>
<ref id="msu173-B59">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Serrano</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Fersht</surname>
<given-names>AR</given-names>
</name>
</person-group>
<article-title>Step-wise mutation of barnase to binase: a procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability</article-title>
<source>J Mol Biol.</source>
<year>1993</year>
<volume>233</volume>
<fpage>305</fpage>
<lpage>312</lpage>
<pub-id pub-id-type="pmid">8377205</pub-id>
</element-citation>
</ref>
<ref id="msu173-B60">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>2688</fpage>
<lpage>2690</lpage>
<pub-id pub-id-type="pmid">16928733</pub-id>
</element-citation>
</ref>
<ref id="msu173-B61">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Starita</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Pruneda</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Lo</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Fowler</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Hiatt</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Brzovic</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Fields</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Klevit</surname>
<given-names>RE</given-names>
</name>
</person-group>
<article-title>Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<year>2013</year>
<volume>110</volume>
<fpage>E1263</fpage>
<lpage>E1272</lpage>
<pub-id pub-id-type="pmid">23509263</pub-id>
</element-citation>
</ref>
<ref id="msu173-B62">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tamuri</surname>
<given-names>AU</given-names>
</name>
<name>
<surname>dos Reis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Goldstein</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>Estimating the distribution of selection coefficients from phylogenetic data using sitewise mutation-selection models</article-title>
<source>Genetics</source>
<year>2012</year>
<volume>190</volume>
<fpage>1101</fpage>
<lpage>1115</lpage>
<pub-id pub-id-type="pmid">22209901</pub-id>
</element-citation>
</ref>
<ref id="msu173-B63">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tamuri</surname>
<given-names>AU</given-names>
</name>
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>dos Reis</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>A penalized likelihood method for estimating the distribution of selection coefficients from phylogenetic data</article-title>
<source>Genetics</source>
<year>2014</year>
<volume>197</volume>
<fpage>257</fpage>
<lpage>271</lpage>
<pub-id pub-id-type="pmid">24532780</pub-id>
</element-citation>
</ref>
<ref id="msu173-B64">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thorne</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Higgs</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Kishino</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Population genetics without intraspecific data</article-title>
<source>Mol Biol Evol.</source>
<year>2007</year>
<volume>24</volume>
<fpage>1667</fpage>
<lpage>1677</lpage>
<pub-id pub-id-type="pmid">17470435</pub-id>
</element-citation>
</ref>
<ref id="msu173-B65">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thorne</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>DT</given-names>
</name>
</person-group>
<article-title>Combining protein evolution and secondary structure</article-title>
<source>Mol Biol Evol.</source>
<year>1996</year>
<volume>13</volume>
<fpage>666</fpage>
<lpage>673</lpage>
<pub-id pub-id-type="pmid">8676741</pub-id>
</element-citation>
</ref>
<ref id="msu173-B66">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tien</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Spielman</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>CO</given-names>
</name>
</person-group>
<article-title>Maximum allowed solvent accessibilites of residues in proteins</article-title>
<source>PLoS One</source>
<year>2013</year>
<volume>8</volume>
<fpage>e80635</fpage>
<pub-id pub-id-type="pmid">24278298</pub-id>
</element-citation>
</ref>
<ref id="msu173-B67">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Traxlmayr</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Hasenhindl</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hackl</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stadlmayr</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rybka</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Borth</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Grillari</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rüker</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Obinger</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Construction of a stability landscape of the CH3 domain of human IgG1 by combining directed evolution with high throughput sequencing</article-title>
<source>J Mol Biol.</source>
<year>2012</year>
<volume>423</volume>
<fpage>397</fpage>
<lpage>412</lpage>
<pub-id pub-id-type="pmid">22846908</pub-id>
</element-citation>
</ref>
<ref id="msu173-B68">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Susko</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Roger</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny</article-title>
<source>BMC Evol Biol.</source>
<year>2008</year>
<volume>8</volume>
<fpage>331</fpage>
<pub-id pub-id-type="pmid">19087270</pub-id>
</element-citation>
</ref>
<ref id="msu173-B69">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Suchard</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Drummond</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>Bayesian selection of nucleotide substitution models and their site assignments</article-title>
<source>Mol Biol Evol.</source>
<year>2013</year>
<volume>30</volume>
<fpage>669</fpage>
<lpage>688</lpage>
<pub-id pub-id-type="pmid">23233462</pub-id>
</element-citation>
</ref>
<ref id="msu173-B70">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods</article-title>
<source>J Mol Evol.</source>
<year>1994</year>
<volume>39</volume>
<fpage>306</fpage>
<lpage>314</lpage>
<pub-id pub-id-type="pmid">7932792</pub-id>
</element-citation>
</ref>
<ref id="msu173-B71">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Synonymous and nonsynonymous rate variation in nuclear genes of mammals</article-title>
<source>J Mol Evol.</source>
<year>1998</year>
<volume>46</volume>
<fpage>409</fpage>
<lpage>418</lpage>
<pub-id pub-id-type="pmid">9541535</pub-id>
</element-citation>
</ref>
<ref id="msu173-B72">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>AMK</given-names>
</name>
</person-group>
<article-title>Codon-substitution models for heterogeneous selection pressure at amino acid sites</article-title>
<source>Genetics</source>
<year>2000</year>
<volume>155</volume>
<fpage>431</fpage>
<lpage>449</lpage>
<pub-id pub-id-type="pmid">10790415</pub-id>
</element-citation>
</ref>
<ref id="msu173-B73">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>WS</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Bayes empirical Bayes inference of amino acid sites under positive selection</article-title>
<source>Mol Biol Evol.</source>
<year>2005</year>
<volume>22</volume>
<fpage>1107</fpage>
<lpage>1118</lpage>
<pub-id pub-id-type="pmid">15689528</pub-id>
</element-citation>
</ref>
<ref id="msu173-B74">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ye</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Krug</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Tao</surname>
<given-names>YJ</given-names>
</name>
</person-group>
<article-title>The mechanism by which influenza A virus nucleoprotein forms oligomers and binds RNA</article-title>
<source>Nature</source>
<year>2006</year>
<volume>444</volume>
<fpage>1078</fpage>
<lpage>1082</lpage>
<pub-id pub-id-type="pmid">17151603</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A28  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000A28  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021