Orange Tree Exploration Server

Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)

Internal identifier: 001262 (Pmc/Corpus); previous: 001261; next: 001263

Authors: Martin Mascher; Gary J. Muehlbauer; Daniel S. Rokhsar; Jarrod Chapman; Jeremy Schmutz; Kerrie Barry; María Muñoz-Amatriaín; Timothy J. Close; Roger P. Wise; Alan H. Schulman; Axel Himmelbach; Klaus FX Mayer; Uwe Scholz; Jesse A. Poland; Nils Stein; Robbie Waugh

Source :

RBID : PMC:4298792

Abstract

Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species.


Url:
DOI: 10.1111/tpj.12319
PubMed: 23998490
PubMed Central: 4298792

Links to Exploration step

PMC:4298792

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)</title>
<author>
<name sortKey="Mascher, Martin" sort="Mascher, Martin" uniqKey="Mascher M" first="Martin" last="Mascher">Martin Mascher</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Muehlbauer, Gary J" sort="Muehlbauer, Gary J" uniqKey="Muehlbauer G" first="Gary J" last="Muehlbauer">Gary J. Muehlbauer</name>
<affiliation>
<nlm:aff id="au2">
<institution>University of Minnesota, Department of Agronomy and Plant Genetics</institution>
<addr-line>St Paul, MN, 55108, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au3">
<institution>University of Minnesota, Department of Plant Biology</institution>
<addr-line>St Paul, MN 55108, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rokhsar, Daniel S" sort="Rokhsar, Daniel S" uniqKey="Rokhsar D" first="Daniel S" last="Rokhsar">Daniel S. Rokhsar</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au5">
<institution>Department of Molecular and Cell Biology, University of California</institution>
<addr-line>Berkeley, CA, 94720, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chapman, Jarrod" sort="Chapman, Jarrod" uniqKey="Chapman J" first="Jarrod" last="Chapman">Jarrod Chapman</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schmutz, Jeremy" sort="Schmutz, Jeremy" uniqKey="Schmutz J" first="Jeremy" last="Schmutz">Jeremy Schmutz</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au6">
<institution>HudsonAlpha Institute of Biotechnology</institution>
<addr-line>Huntsville, AL, 35806, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Barry, Kerrie" sort="Barry, Kerrie" uniqKey="Barry K" first="Kerrie" last="Barry">Kerrie Barry</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mu Oz Amatriain, Maria" sort="Mu Oz Amatriain, Maria" uniqKey="Mu Oz Amatriain M" first="María" last="Muñoz-Amatriaín">María Muñoz-Amatriaín</name>
<affiliation>
<nlm:aff id="au2">
<institution>University of Minnesota, Department of Agronomy and Plant Genetics</institution>
<addr-line>St Paul, MN, 55108, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Close, Timothy J" sort="Close, Timothy J" uniqKey="Close T" first="Timothy J" last="Close">Timothy J. Close</name>
<affiliation>
<nlm:aff id="au7">
<institution>Department of Botany &amp; Plant Sciences, University of California</institution>
<addr-line>Riverside, CA, 92521, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wise, Roger P" sort="Wise, Roger P" uniqKey="Wise R" first="Roger P" last="Wise">Roger P. Wise</name>
<affiliation>
<nlm:aff id="au8">
<institution>US Department of Agriculture/Agricultural Research Service, Department of Plant Pathology &amp; Microbiology, Iowa State University</institution>
<addr-line>Ames, IA, 50011–1020, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schulman, Alan H" sort="Schulman, Alan H" uniqKey="Schulman A" first="Alan H" last="Schulman">Alan H. Schulman</name>
<affiliation>
<nlm:aff id="au9">
<institution>Institute of Biotechnology, University of Helsinki/MTT Agrifood Research</institution>
<addr-line>PO Box 65, 00014, Helsinki, Finland</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Himmelbach, Axel" sort="Himmelbach, Axel" uniqKey="Himmelbach A" first="Axel" last="Himmelbach">Axel Himmelbach</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mayer, Klaus Fx" sort="Mayer, Klaus Fx" uniqKey="Mayer K" first="Klaus FX" last="Mayer">Klaus FX Mayer</name>
<affiliation>
<nlm:aff id="au10">
<institution>Munich Information Center for Protein Sequences/Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München</institution>
<addr-line>D–85764, Neuherberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Scholz, Uwe" sort="Scholz, Uwe" uniqKey="Scholz U" first="Uwe" last="Scholz">Uwe Scholz</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Poland, Jesse A" sort="Poland, Jesse A" uniqKey="Poland J" first="Jesse A" last="Poland">Jesse A. Poland</name>
<affiliation>
<nlm:aff id="au11">
<institution>US Department of Agriculture/Agricultural Research Service, Hard Winter Wheat Genetics Research Unit and Department of Agronomy, Kansas State University</institution>
<addr-line>Manhattan, KS, 65506, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Nils" sort="Stein, Nils" uniqKey="Stein N" first="Nils" last="Stein">Nils Stein</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Waugh, Robbie" sort="Waugh, Robbie" uniqKey="Waugh R" first="Robbie" last="Waugh">Robbie Waugh</name>
<affiliation>
<nlm:aff id="au12">
<institution>Division of Plant Sciences, University of Dundee at the James Hutton Institute</institution>
<addr-line>Invergowrie, Dundee, DD2 5DA, UK</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23998490</idno>
<idno type="pmc">4298792</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298792</idno>
<idno type="RBID">PMC:4298792</idno>
<idno type="doi">10.1111/tpj.12319</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">001262</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)</title>
<author>
<name sortKey="Mascher, Martin" sort="Mascher, Martin" uniqKey="Mascher M" first="Martin" last="Mascher">Martin Mascher</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Muehlbauer, Gary J" sort="Muehlbauer, Gary J" uniqKey="Muehlbauer G" first="Gary J" last="Muehlbauer">Gary J. Muehlbauer</name>
<affiliation>
<nlm:aff id="au2">
<institution>University of Minnesota, Department of Agronomy and Plant Genetics</institution>
<addr-line>St Paul, MN, 55108, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au3">
<institution>University of Minnesota, Department of Plant Biology</institution>
<addr-line>St Paul, MN 55108, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rokhsar, Daniel S" sort="Rokhsar, Daniel S" uniqKey="Rokhsar D" first="Daniel S" last="Rokhsar">Daniel S. Rokhsar</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au5">
<institution>Department of Molecular and Cell Biology, University of California</institution>
<addr-line>Berkeley, CA, 94720, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chapman, Jarrod" sort="Chapman, Jarrod" uniqKey="Chapman J" first="Jarrod" last="Chapman">Jarrod Chapman</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schmutz, Jeremy" sort="Schmutz, Jeremy" uniqKey="Schmutz J" first="Jeremy" last="Schmutz">Jeremy Schmutz</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="au6">
<institution>HudsonAlpha Institute of Biotechnology</institution>
<addr-line>Huntsville, AL, 35806, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Barry, Kerrie" sort="Barry, Kerrie" uniqKey="Barry K" first="Kerrie" last="Barry">Kerrie Barry</name>
<affiliation>
<nlm:aff id="au4">
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mu Oz Amatriain, Maria" sort="Mu Oz Amatriain, Maria" uniqKey="Mu Oz Amatriain M" first="María" last="Muñoz-Amatriaín">María Muñoz-Amatriaín</name>
<affiliation>
<nlm:aff id="au2">
<institution>University of Minnesota, Department of Agronomy and Plant Genetics</institution>
<addr-line>St Paul, MN, 55108, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Close, Timothy J" sort="Close, Timothy J" uniqKey="Close T" first="Timothy J" last="Close">Timothy J. Close</name>
<affiliation>
<nlm:aff id="au7">
<institution>Department of Botany &amp; Plant Sciences, University of California</institution>
<addr-line>Riverside, CA, 92521, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wise, Roger P" sort="Wise, Roger P" uniqKey="Wise R" first="Roger P" last="Wise">Roger P. Wise</name>
<affiliation>
<nlm:aff id="au8">
<institution>US Department of Agriculture/Agricultural Research Service, Department of Plant Pathology &amp; Microbiology, Iowa State University</institution>
<addr-line>Ames, IA, 50011–1020, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schulman, Alan H" sort="Schulman, Alan H" uniqKey="Schulman A" first="Alan H" last="Schulman">Alan H. Schulman</name>
<affiliation>
<nlm:aff id="au9">
<institution>Institute of Biotechnology, University of Helsinki/MTT Agrifood Research</institution>
<addr-line>PO Box 65, 00014, Helsinki, Finland</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Himmelbach, Axel" sort="Himmelbach, Axel" uniqKey="Himmelbach A" first="Axel" last="Himmelbach">Axel Himmelbach</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mayer, Klaus Fx" sort="Mayer, Klaus Fx" uniqKey="Mayer K" first="Klaus FX" last="Mayer">Klaus FX Mayer</name>
<affiliation>
<nlm:aff id="au10">
<institution>Munich Information Center for Protein Sequences/Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München</institution>
<addr-line>D–85764, Neuherberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Scholz, Uwe" sort="Scholz, Uwe" uniqKey="Scholz U" first="Uwe" last="Scholz">Uwe Scholz</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Poland, Jesse A" sort="Poland, Jesse A" uniqKey="Poland J" first="Jesse A" last="Poland">Jesse A. Poland</name>
<affiliation>
<nlm:aff id="au11">
<institution>US Department of Agriculture/Agricultural Research Service, Hard Winter Wheat Genetics Research Unit and Department of Agronomy, Kansas State University</institution>
<addr-line>Manhattan, KS, 65506, USA</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Nils" sort="Stein, Nils" uniqKey="Stein N" first="Nils" last="Stein">Nils Stein</name>
<affiliation>
<nlm:aff id="au1">
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Waugh, Robbie" sort="Waugh, Robbie" uniqKey="Waugh R" first="Robbie" last="Waugh">Robbie Waugh</name>
<affiliation>
<nlm:aff id="au12">
<institution>Division of Plant Sciences, University of Dundee at the James Hutton Institute</institution>
<addr-line>Invergowrie, Dundee, DD2 5DA, UK</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">The Plant Journal</title>
<idno type="ISSN">0960-7412</idno>
<idno type="eISSN">1365-313X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows
<italic>de novo</italic>
production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Andolfatto, P" uniqKey="Andolfatto P">P Andolfatto</name>
</author>
<author>
<name sortKey="Davison, D" uniqKey="Davison D">D Davison</name>
</author>
<author>
<name sortKey="Erezyilmaz, D" uniqKey="Erezyilmaz D">D Erezyilmaz</name>
</author>
<author>
<name sortKey="Hu, Tt" uniqKey="Hu T">TT Hu</name>
</author>
<author>
<name sortKey="Mast, J" uniqKey="Mast J">J Mast</name>
</author>
<author>
<name sortKey="Sunayama Morita, T" uniqKey="Sunayama Morita T">T Sunayama-Morita</name>
</author>
<author>
<name sortKey="Stern, Dl" uniqKey="Stern D">DL Stern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brenchley, R" uniqKey="Brenchley R">R Brenchley</name>
</author>
<author>
<name sortKey="Spannagl, M" uniqKey="Spannagl M">M Spannagl</name>
</author>
<author>
<name sortKey="Pfeifer, M" uniqKey="Pfeifer M">M Pfeifer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chandler, Vl" uniqKey="Chandler V">VL Chandler</name>
</author>
<author>
<name sortKey="Brendel, V" uniqKey="Brendel V">V Brendel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, M" uniqKey="Chen M">M Chen</name>
</author>
<author>
<name sortKey="Presting, G" uniqKey="Presting G">G Presting</name>
</author>
<author>
<name sortKey="Barbazuk, Wb" uniqKey="Barbazuk W">WB Barbazuk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Comadran, J" uniqKey="Comadran J">J Comadran</name>
</author>
<author>
<name sortKey="Kilian, B" uniqKey="Kilian B">B Kilian</name>
</author>
<author>
<name sortKey="Russell, J" uniqKey="Russell J">J Russell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feuillet, C" uniqKey="Feuillet C">C Feuillet</name>
</author>
<author>
<name sortKey="Stein, N" uniqKey="Stein N">N Stein</name>
</author>
<author>
<name sortKey="Rossini, L" uniqKey="Rossini L">L Rossini</name>
</author>
<author>
<name sortKey="Praud, S" uniqKey="Praud S">S Praud</name>
</author>
<author>
<name sortKey="Mayer, K" uniqKey="Mayer K">K Mayer</name>
</author>
<author>
<name sortKey="Schulman, A" uniqKey="Schulman A">A Schulman</name>
</author>
<author>
<name sortKey="Eversole, K" uniqKey="Eversole K">K Eversole</name>
</author>
<author>
<name sortKey="Appels, R" uniqKey="Appels R">R Appels</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I Maccallum</name>
</author>
<author>
<name sortKey="Przybylski, D" uniqKey="Przybylski D">D Przybylski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guo, S" uniqKey="Guo S">S Guo</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Sun, H" uniqKey="Sun H">H Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hernandez, P" uniqKey="Hernandez P">P Hernandez</name>
</author>
<author>
<name sortKey="Martis, M" uniqKey="Martis M">M Martis</name>
</author>
<author>
<name sortKey="Dorado, G" uniqKey="Dorado G">G Dorado</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Feng, Q" uniqKey="Feng Q">Q Feng</name>
</author>
<author>
<name sortKey="Qian, Q" uniqKey="Qian Q">Q Qian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Wei, X" uniqKey="Wei X">X Wei</name>
</author>
<author>
<name sortKey="Sang, T" uniqKey="Sang T">T Sang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Iqbal, Z" uniqKey="Iqbal Z">Z Iqbal</name>
</author>
<author>
<name sortKey="Caccamo, M" uniqKey="Caccamo M">M Caccamo</name>
</author>
<author>
<name sortKey="Turner, I" uniqKey="Turner I">I Turner</name>
</author>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author>
<name sortKey="Mcvean, G" uniqKey="Mcvean G">G McVean</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lam, Et" uniqKey="Lam E">ET Lam</name>
</author>
<author>
<name sortKey="Hastie, A" uniqKey="Hastie A">A Hastie</name>
</author>
<author>
<name sortKey="Lin, C" uniqKey="Lin C">C Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Manickavelu, A" uniqKey="Manickavelu A">A Manickavelu</name>
</author>
<author>
<name sortKey="Kawaura, K" uniqKey="Kawaura K">K Kawaura</name>
</author>
<author>
<name sortKey="Imamura, H" uniqKey="Imamura H">H Imamura</name>
</author>
<author>
<name sortKey="Mori, M" uniqKey="Mori M">M Mori</name>
</author>
<author>
<name sortKey="Ogihara, Y" uniqKey="Ogihara Y">Y Ogihara</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martienssen, Ra" uniqKey="Martienssen R">RA Martienssen</name>
</author>
<author>
<name sortKey="Rabinowicz, Pd" uniqKey="Rabinowicz P">PD Rabinowicz</name>
</author>
<author>
<name sortKey="O Haughnessy, A" uniqKey="O Haughnessy A">A O’Shaughnessy</name>
</author>
<author>
<name sortKey="Mccombie, Wr" uniqKey="Mccombie W">WR McCombie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Jc" uniqKey="Nelson J">JC Nelson</name>
</author>
<author>
<name sortKey="Deynze, Ae" uniqKey="Deynze A">AE Deynze</name>
</author>
<author>
<name sortKey="Sorrells, Me" uniqKey="Sorrells M">ME Sorrells</name>
</author>
<author>
<name sortKey="Autrique, E" uniqKey="Autrique E">E Autrique</name>
</author>
<author>
<name sortKey="Lu, Yh" uniqKey="Lu Y">YH Lu</name>
</author>
<author>
<name sortKey="Negre, S" uniqKey="Negre S">S Negre</name>
</author>
<author>
<name sortKey="Bernard, M" uniqKey="Bernard M">M Bernard</name>
</author>
<author>
<name sortKey="Leroy, P" uniqKey="Leroy P">P Leroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paux, E" uniqKey="Paux E">E Paux</name>
</author>
<author>
<name sortKey="Sourdille, P" uniqKey="Sourdille P">P Sourdille</name>
</author>
<author>
<name sortKey="Salse, J" uniqKey="Salse J">J Salse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Poland, Ja" uniqKey="Poland J">JA Poland</name>
</author>
<author>
<name sortKey="Brown, Pj" uniqKey="Brown P">PJ Brown</name>
</author>
<author>
<name sortKey="Sorrells, Me" uniqKey="Sorrells M">ME Sorrells</name>
</author>
<author>
<name sortKey="Jannink, Jl" uniqKey="Jannink J">JL Jannink</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author>
<name sortKey="Turner, S" uniqKey="Turner S">S Turner</name>
</author>
<author>
<name sortKey="Kasarskis, A" uniqKey="Kasarskis A">A Kasarskis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schnable, Ps" uniqKey="Schnable P">PS Schnable</name>
</author>
<author>
<name sortKey="Ware, D" uniqKey="Ware D">D Ware</name>
</author>
<author>
<name sortKey="Fulton, Rs" uniqKey="Fulton R">RS Fulton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, F" uniqKey="Wei F">F Wei</name>
</author>
<author>
<name sortKey="Coe, E" uniqKey="Coe E">E Coe</name>
</author>
<author>
<name sortKey="Nelson, W" uniqKey="Nelson W">W Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Bhat, Pr" uniqKey="Bhat P">PR Bhat</name>
</author>
<author>
<name sortKey="Close, Tj" uniqKey="Close T">TJ Close</name>
</author>
<author>
<name sortKey="Lonardi, S" uniqKey="Lonardi S">S Lonardi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xie, W" uniqKey="Xie W">W Xie</name>
</author>
<author>
<name sortKey="Feng, Q" uniqKey="Feng Q">Q Feng</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author>
<name sortKey="Zhao, Q" uniqKey="Zhao Q">Q Zhao</name>
</author>
<author>
<name sortKey="Xing, Y" uniqKey="Xing Y">Y Xing</name>
</author>
<author>
<name sortKey="Yu, S" uniqKey="Yu S">S Yu</name>
</author>
<author>
<name sortKey="Han, B" uniqKey="Han B">B Han</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, Q" uniqKey="Xu Q">Q Xu</name>
</author>
<author>
<name sortKey="Chen, Ll" uniqKey="Chen L">LL Chen</name>
</author>
<author>
<name sortKey="Ruan, X" uniqKey="Ruan X">X Ruan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author>
<name sortKey="Wagner, L" uniqKey="Wagner L">L Wagner</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Plant J</journal-id>
<journal-id journal-id-type="iso-abbrev">Plant J</journal-id>
<journal-id journal-id-type="publisher-id">tpj</journal-id>
<journal-title-group>
<journal-title>The Plant Journal</journal-title>
</journal-title-group>
<issn pub-type="ppub">0960-7412</issn>
<issn pub-type="epub">1365-313X</issn>
<publisher>
<publisher-name>Blackwell Publishing Ltd</publisher-name>
<publisher-loc>Oxford, UK</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23998490</article-id>
<article-id pub-id-type="pmc">4298792</article-id>
<article-id pub-id-type="doi">10.1111/tpj.12319</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Technical Advance</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Mascher</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
<xref ref-type="corresp" rid="cor1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Muehlbauer</surname>
<given-names>Gary J</given-names>
</name>
<xref ref-type="aff" rid="au2">2</xref>
<xref ref-type="aff" rid="au3">3</xref>
<xref ref-type="corresp" rid="cor1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rokhsar</surname>
<given-names>Daniel S</given-names>
</name>
<xref ref-type="aff" rid="au4">4</xref>
<xref ref-type="aff" rid="au5">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chapman</surname>
<given-names>Jarrod</given-names>
</name>
<xref ref-type="aff" rid="au4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schmutz</surname>
<given-names>Jeremy</given-names>
</name>
<xref ref-type="aff" rid="au4">4</xref>
<xref ref-type="aff" rid="au6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Barry</surname>
<given-names>Kerrie</given-names>
</name>
<xref ref-type="aff" rid="au4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Muñoz-Amatriaín</surname>
<given-names>María</given-names>
</name>
<xref ref-type="aff" rid="au2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Close</surname>
<given-names>Timothy J</given-names>
</name>
<xref ref-type="aff" rid="au7">7</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wise</surname>
<given-names>Roger P</given-names>
</name>
<xref ref-type="aff" rid="au8">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schulman</surname>
<given-names>Alan H</given-names>
</name>
<xref ref-type="aff" rid="au9">9</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Himmelbach</surname>
<given-names>Axel</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mayer</surname>
<given-names>Klaus FX</given-names>
</name>
<xref ref-type="aff" rid="au10">10</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Scholz</surname>
<given-names>Uwe</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Poland</surname>
<given-names>Jesse A</given-names>
</name>
<xref ref-type="aff" rid="au11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Stein</surname>
<given-names>Nils</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
<xref ref-type="author-notes" rid="fn1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Waugh</surname>
<given-names>Robbie</given-names>
</name>
<xref ref-type="aff" rid="au12">12</xref>
<xref ref-type="author-notes" rid="fn1">*</xref>
</contrib>
<aff id="au1">
<label>1</label>
<institution>Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)</institution>
<addr-line>D–06466 Seeland OT, Gatersleben, Germany</addr-line>
</aff>
<aff id="au2">
<label>2</label>
<institution>University of Minnesota, Department of Agronomy and Plant Genetics</institution>
<addr-line>St Paul, MN, 55108, USA</addr-line>
</aff>
<aff id="au3">
<label>3</label>
<institution>University of Minnesota, Department of Plant Biology</institution>
<addr-line>St Paul, MN 55108, USA</addr-line>
</aff>
<aff id="au4">
<label>4</label>
<institution>Department of Energy Joint Genome Institute</institution>
<addr-line>2800 Mitchell Drive, Walnut Creek, CA, 94598, USA</addr-line>
</aff>
<aff id="au5">
<label>5</label>
<institution>Department of Molecular and Cell Biology, University of California</institution>
<addr-line>Berkeley, CA, 94720, USA</addr-line>
</aff>
<aff id="au6">
<label>6</label>
<institution>HudsonAlpha Institute of Biotechnology</institution>
<addr-line>Huntsville, AL, 35806, USA</addr-line>
</aff>
<aff id="au7">
<label>7</label>
<institution>Department of Botany &amp; Plant Sciences, University of California</institution>
<addr-line>Riverside, CA, 92521, USA</addr-line>
</aff>
<aff id="au8">
<label>8</label>
<institution>US Department of Agriculture/Agricultural Research Service, Department of Plant Pathology & Microbiology, Iowa State University</institution>
<addr-line>Ames, IA, 50011–1020, USA</addr-line>
</aff>
<aff id="au9">
<label>9</label>
<institution>Institute of Biotechnology, University of Helsinki/MTT Agrifood Research</institution>
<addr-line>PO Box 65, 00014, Helsinki, Finland</addr-line>
</aff>
<aff id="au10">
<label>10</label>
<institution>Munich Information Center for Protein Sequences/Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München</institution>
<addr-line>D–85764, Neuherberg, Germany</addr-line>
</aff>
<aff id="au11">
<label>11</label>
<institution>US Department of Agriculture/Agricultural Research Service, Hard Winter Wheat Genetics Research Unit and Department of Agronomy, Kansas State University</institution>
<addr-line>Manhattan, KS, 65506, USA</addr-line>
</aff>
<aff id="au12">
<label>12</label>
<institution>Division of Plant Sciences, University of Dundee at the James Hutton Institute</institution>
<addr-line>Invergowrie, Dundee, DD2 5DA, UK</addr-line>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor1">For correspondence (e-mails
<email>stein@ipk-gatersleben.de</email>
;
<email>robbie.waugh@hutton.ac.uk</email>
). </corresp>
<fn id="fn1">
<p>These authors contributed equally to this work.</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>11</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>10</day>
<month>10</month>
<year>2013</year>
</pub-date>
<volume>76</volume>
<issue>4</issue>
<fpage>718</fpage>
<lpage>727</lpage>
<history>
<date date-type="received">
<day>25</day>
<month>6</month>
<year>2013</year>
</date>
<date date-type="rev-recd">
<day>07</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>9</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>© 2013 The Authors The Plant Journal © 2013 John Wiley & Sons Ltd</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<license-p>This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows
<italic>de novo</italic>
production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species.</p>
</abstract>
<kwd-group>
<kwd>next-generation sequencing</kwd>
<kwd>genome assembly</kwd>
<kwd>genetic mapping</kwd>
<kwd>barley</kwd>
<kwd>
<italic>Hordeum vulgare</italic>
</kwd>
<kwd>population sequencing</kwd>
<kwd>technical advance</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Next-generation sequencing provides the opportunity to rapidly establish gene space assemblies for virtually any species at relatively low cost. These assemblies consist of tens to hundreds of thousands of short contiguous pieces of DNA sequence (contigs), and often represent only the low-copy portion of the genome. Despite the limitations of such assemblies, they have been widely proposed as surrogates for draft genome sequences for the purposes of gene isolation, genomics-assisted breeding and the assessment of diversity within and between species (Brenchley
<xref rid="b2" ref-type="bibr">
<italic>et al</italic>
., 2012</xref>
;
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
;
<xref rid="b8" ref-type="bibr">Guo
<italic>et al</italic>
., 2013</xref>
;
<xref rid="b27" ref-type="bibr">Xu
<italic>et al</italic>
., 2013</xref>
). However, in most cases, particularly those concerning large and complex genomes, they remain disconnected collections of short sequence contigs that are not embedded in a genomic context. Bringing these together into a tentative linear order, or even associating contigs with individual chromosomes or chromosome arms, has been a major and costly undertaking. In a recent example, the International Barley Genome Sequencing Consortium reported the development and use of a BAC-based physical map, BAC end sequences, survey sequences of flow-sorted chromosome arms, fully sequenced BAC clones and conserved synteny to fully contextualize only 410 Mb of genomic sequence from the 5.1 Gb barley genome (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). These genomic resources provide an established path towards a reference sequence by sequencing a minimum tiling path of overlapping BAC clones hierarchically (
<xref rid="b6" ref-type="bibr">Feuillet
<italic>et al</italic>
., 2012</xref>
). Development of the necessary resources requires a substantial amount of time, labor and money, which makes this strategy prohibitive for smaller and more poorly resourced research communities, e.g. those working on non-model organisms or orphan crops. The establishment of a BAC-based reference sequence of the maize genome took approximately 7 years, required the coordinated effort of several laboratories, and cost approximately US $50 million (
<xref rid="b3" ref-type="bibr">Chandler and Brendel, 2002</xref>
;
<xref rid="b18" ref-type="bibr">Martienssen
<italic>et al</italic>
., 2004</xref>
;
<xref rid="b23" ref-type="bibr">Schnable
<italic>et al</italic>
., 2009</xref>
). Similarly, the reference sequence of a single 1 Gb chromosome of hexaploid wheat (
<italic>Triticum aestivum</italic>
) has not been completed 5 years after publication of a physical map (
<xref rid="b20" ref-type="bibr">Paux
<italic>et al</italic>
., 2008</xref>
).</p>
<p>Emerging technologies such as longer sequence reads (
<xref rid="b22" ref-type="bibr">Schadt
<italic>et al</italic>
., 2010</xref>
), optical mapping (
<xref rid="b14" ref-type="bibr">Lam
<italic>et al</italic>
., 2012</xref>
) and novel assembly algorithms (such as ALLPATHS–LG,
<xref rid="b7" ref-type="bibr">Gnerre
<italic>et al</italic>
., 2011</xref>
) may speed up the process of data collection and analysis, and increase the contiguity and completeness of whole-genome shotgun (WGS) assemblies, but their applicability to large genomes with abundant sequence repeats (the bane of any assembler), arising from paralogous duplications, repetitive elements, ancestral duplications and polyploidy, remains to be assessed.</p>
<p>It has been common practice to associate mapped genetic markers with sequence resources based on sequence similarity in order to link genetic and physical maps (
<xref rid="b4" ref-type="bibr">Chen
<italic>et al</italic>
., 2002</xref>
;
<xref rid="b24" ref-type="bibr">Wei
<italic>et al</italic>
., 2007</xref>
). While the number of BAC contigs on a physical map is in the order of thousands, next-generation sequencing (NGS) technology produces hundreds of thousands of sequence contigs. For example, the
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
reported an assembly that consists of more than 350 000 contigs longer than 1 kb. The number of markers afforded by conventional genotyping strategies is simply not commensurate with the large number of short sequence contigs.</p>
<p>Several methods for high-throughput genotyping of genetic mapping populations using next-generation sequencing technology have been developed. Genotyping by shallow survey sequencing (0.05–0.1×) in the model species rice (
<italic>Oryza sativa</italic>
) has been shown to yield genetic maps of unprecedented density (
<xref rid="b10" ref-type="bibr">Huang
<italic>et al</italic>
., 2009</xref>
;
<xref rid="b26" ref-type="bibr">Xie
<italic>et al</italic>
., 2010</xref>
). However, the high resolution of recombination breakpoints (approximately 40 kb) was provided by inferring marker order from a high-quality reference sequence. This approach cannot be applied to species with genomes of draft or even pre-draft quality as sequence contigs are not organized in pseudo-molecules representing the linear chromosomes.</p>
<p>The question of how several millions of markers provided by NGS technology may be used to bring contigs into a linear order (a procedure commonly referred to as anchoring) has only tentatively been raised.
<xref rid="b1" ref-type="bibr">Andolfatto
<italic>et al</italic>
. (2011)</xref>
used digestion with a frequently cutting restriction enzyme and subsequent multiplexed sequencing of a population of 94 individuals to assign 8 Mb of unassembled contigs to linkage groups. Similarly, a reduced-representation genotyping-by-sequencing method (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
) has been instrumental in anchoring the barley physical map to a genetic map (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). However, genotyping by WGS has not been used as a primary tool in
<italic>de novo</italic>
development of linearly ordered draft genome assemblies.</p>
<p>In the absence of an appropriate molecular or analytical method to establish short-range connectivity (i.e. to link physically close sequence contigs), we used the power of genetic segregation to directly and linearly arrange sequence contigs into closely associated recombination bins along a target genome. We show that whole-genome survey sequencing of a small experimental segregating population and genetic mapping of the millions of observed single nucleotide polymorphisms (SNPs) detected therein (Figure
<xref ref-type="fig" rid="fig01">1</xref>
) vastly improves the quality and utility of highly fragmented NGS shotgun assemblies. We illustrate the approach using the complex 5.1 Gb genome of cultivated barley (
<italic>Hordeum vulgare</italic>
L.) by comparing the output with a gene space assembly that has been partially ordered using extensive physical and genetic mapping resources (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). Our results are congruent with the current sequence assembly (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
) but increase the amount of genetically anchored contig sequences by a factor of three. Most importantly, the whole effort cost less than US $100 000 and was completed in a matter of months. This new assembly has greater value for comparative genetic studies, gene isolation and genomics-assisted breeding than the previous anchoring effort (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
) as more WGS contigs are positioned genetically. In principle, the approach, which we term POPSEQ, may be used for any species for which a segregating population may be derived and maintained.</p>
<fig id="fig01" position="float">
<label>Figure 1</label>
<caption>
<p>Schematic representation of POPSEQ. (a) A segregating population (80–100 individuals) is constructed from a bi-parental cross. (b) A whole-genome shotgun is generated for one parent, and used to construct a gene space assembly (alternatively, the POPSEQ data itself may be used for this purpose). On this assembly, gene models (green arrows) are defined using RNA–seq. In parallel, POPSEQ, and, if necessary, genotyping-by-sequencing (GBS), is performed on the population, and a medium-density framework genetic map is calculated (thousands to tens of thousands of loci). (c) SNPs detected and typed by POPSEQ along with associated WGS contigs are integrated into the framework map through nearest-neighbor search. (d) The result of POPSEQ is a sequence assembly in linear order that contains comprehensive information on the gene space. It may be enhanced by performing POPSEQ on additional populations.</p>
</caption>
<graphic xlink:href="tpj0076-0718-f1"></graphic>
</fig>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Whole-genome survey sequencing of genetic populations</title>
<p>We generated survey sequences from 90 individuals (Table
<xref ref-type="table" rid="tbl1">1</xref>
) of a population of recombinant inbred lines (RILs) from a cross between barley cultivars Morex and Barke (M × B). DNA from individual plants was fragmented and bar-coded, and eight samples per lane were sequenced on an Illumina HiSeq 2000 instrument (yielding approximately 1× coverage per line). We de-convoluted and mapped the output reads against a 50× WGS sequence assembly of the barley cultivar Morex (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
) using BWA software (
<xref rid="b16" ref-type="bibr">Li and Durbin, 2009</xref>
), and performed
<italic>in silico</italic>
variant calling using SAMtools (
<xref rid="b15" ref-type="bibr">Li, 2011</xref>
) (see Experimental procedures). This resulted in a set of SNP positions on the Morex WGS assembly, and genotype calls (i.e. homozygous for one parent or heterozygous) for each individual at each SNP. After discarding variant positions with low quality or too much missing data (Figure S1), 5.1 million SNPs with a mean of 33 unambiguous genotypic calls across the population were considered for integration into a high-density SNP-based genetic map of the same population constructed by array-based genotyping (
<xref rid="b5" ref-type="bibr">Comadran
<italic>et al</italic>
., 2012</xref>
). We then used a heuristic algorithm to place the newly discovered SNPs into this existing genetic framework. Briefly, we performed a nearest-neighbor search, querying the set of framework markers for elements with minimal Hamming distance to a given SNP (i.e. the minimum number of alternative SNP alleles required to change an observed segregation pattern into the reference). If several framework markers exhibited identical minimal distances, we imposed a cutoff whereby ≥80% of these framework markers had to lie on the same chromosome and the median absolute deviation of their genetic positions had to be less than five centiMorgans (cM). Using these thresholds, 4.4 million SNPs (85.5% of all detected SNPs) could be placed into the genetic map with fewer than two genotype calls differing from their closest framework marker.</p>
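<p>The placement heuristic described above may be sketched as follows. This is a minimal illustration rather than the code used in this study: the function names, the use of '-' for a missing genotype call, the decision to ignore missing calls when counting mismatches, and the use of the median position of tied framework markers as the reported position are all assumptions made for the example.</p>
<preformat><![CDATA[
```python
from statistics import median

MISSING = "-"  # hypothetical encoding of an uncalled genotype


def hamming(a, b):
    """Count individuals whose genotypes are both called and differ."""
    return sum(1 for x, y in zip(a, b)
               if x != MISSING and y != MISSING and x != y)


def place_snp(snp, framework, min_same_chrom=0.8, max_mad_cm=5.0,
              max_mismatch=2):
    """Place one SNP into a framework map by nearest-neighbor search.

    framework: list of (chromosome, cM, genotype_string) tuples.
    Returns (chromosome, cM) or None if the SNP cannot be placed.
    """
    dists = [hamming(snp, geno) for _, _, geno in framework]
    best = min(dists)
    if best >= max_mismatch:  # require fewer than two differing calls
        return None
    hits = [(c, cm) for (c, cm, _), d in zip(framework, dists) if d == best]
    chroms = [c for c, _ in hits]
    top = max(set(chroms), key=chroms.count)
    if chroms.count(top) / len(chroms) < min_same_chrom:
        return None  # tied markers are spread over several chromosomes
    positions = [cm for c, cm in hits if c == top]
    med = median(positions)
    mad = median(abs(p - med) for p in positions)
    if mad >= max_mad_cm:
        return None  # tied markers are too far apart on the map
    return top, med


framework = [("1H", 10.0, "AABBA"), ("1H", 12.0, "AABBA"),
             ("2H", 50.0, "BBAAB")]
print(place_snp("AAB-A", framework))  # ('1H', 11.0)
print(place_snp("ABABA", framework))  # None (two mismatches to nearest)
```
]]></preformat>
<p>With millions of SNPs, a vectorized or indexed search would be preferable to this exhaustive comparison; the sketch only shows the decision rules.</p>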
<table-wrap id="tbl1" position="float">
<label>Table 1</label>
<caption>
<p>Sequence data generated in this study</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">M × B WGS</th>
<th align="left" rowspan="1" colspan="1">OWB WGS</th>
<th align="left" rowspan="1" colspan="1">M × B GBS</th>
<th align="left" rowspan="1" colspan="1">Morex</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Population</td>
<td align="left" rowspan="1" colspan="1">Morex × Barke RIL F
<sub>8</sub>
</td>
<td align="left" rowspan="1" colspan="1">Oregon Wolfe Barleys DH</td>
<td align="left" rowspan="1" colspan="1">Morex × Barke RIL F
<sub>8</sub>
</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sequencing technology</td>
<td align="left" rowspan="1" colspan="1">Whole-genome shotgun; HiSeq 2000</td>
<td align="left" rowspan="1" colspan="1">Whole-genome shotgun; HiSeq 2000</td>
<td align="left" rowspan="1" colspan="1">Genotyping-by-sequencing; HiSeq 2000</td>
<td align="left" rowspan="1" colspan="1">Whole-genome shotgun; HiSeq 2000</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of sequencing lanes</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of sequenced individuals</td>
<td align="left" rowspan="1" colspan="1">90 (+ parents)</td>
<td align="left" rowspan="1" colspan="1">82 (+ parents)</td>
<td align="left" rowspan="1" colspan="1">92 (+ parents)</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Approximate coverage per sample</td>
<td align="left" rowspan="1" colspan="1">1×</td>
<td align="left" rowspan="1" colspan="1">1×</td>
<td align="left" rowspan="1" colspan="1">1× (10 Mb represented)</td>
<td align="left" rowspan="1" colspan="1">15×</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of SNPs detected</td>
<td align="left" rowspan="1" colspan="1">5 123 696</td>
<td align="left" rowspan="1" colspan="1">6 543 684</td>
<td align="left" rowspan="1" colspan="1">21 397</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Mean number of present genotype calls per marker</td>
<td align="left" rowspan="1" colspan="1">33</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="left" rowspan="1" colspan="1">58</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We then assigned the WGS sequence contigs that harbored mapped polymorphisms to their defined genetic positions. As with positioning SNPs in a genetic map, we imposed a rule that multiple SNPs found on the same sequence contig were required to have concordant genetic positions. Overall, 498 856 contigs with a cumulative length of 927 Mb (49.5% of the total cv Morex WGS sequence assembly) could be ordered along the genetic map (Table
<xref ref-type="table" rid="tbl2">2</xref>
), more than doubling the 410 Mb that was anchored with the help of a genome-wide physical map to the same genetic framework. Tables containing the anchoring results are available for download from
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ipk-gatersleben.de/barley-popseq/">ftp://ftp.ipk-gatersleben.de/barley-popseq/</ext-link>
</p>
<table-wrap id="tbl2" position="float">
<label>Table 2</label>
<caption>
<p>Anchoring statistics</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">M x B (iSelect)
<xref ref-type="table-fn" rid="tf2-1"></xref>
</th>
<th align="left" rowspan="1" colspan="1">OWB</th>
<th align="left" rowspan="1" colspan="1">M x B (GBS map)</th>
<th align="left" rowspan="1" colspan="1">M x B  +  OWB</th>
<th align="left" rowspan="1" colspan="1">IBSC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Number of SNPs used for anchoring</td>
<td align="left" rowspan="1" colspan="1">4 381 020</td>
<td align="left" rowspan="1" colspan="1">6 117 837</td>
<td align="left" rowspan="1" colspan="1">4 429 475</td>
<td align="left" rowspan="1" colspan="1">11 229 709</td>
<td align="left" rowspan="1" colspan="1">498 165</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Framework map</td>
<td align="left" rowspan="1" colspan="1">iSelect</td>
<td align="left" rowspan="1" colspan="1">OWB GBS</td>
<td align="left" rowspan="1" colspan="1">M x B GBS</td>
<td align="left" rowspan="1" colspan="1">iSelect/OWB GBS</td>
<td align="left" rowspan="1" colspan="1">iSelect</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of anchored contigs</td>
<td align="left" rowspan="1" colspan="1">   498 856</td>
<td align="left" rowspan="1" colspan="1">   591 779</td>
<td align="left" rowspan="1" colspan="1">   512 293</td>
<td align="left" rowspan="1" colspan="1">   747 077</td>
<td align="left" rowspan="1" colspan="1">   138 443</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Size of anchored contigs (Mb)</td>
<td align="left" rowspan="1" colspan="1">           927 (50%)</td>
<td align="left" rowspan="1" colspan="1">          1000 (53%)</td>
<td align="left" rowspan="1" colspan="1">           934 (50%)</td>
<td align="left" rowspan="1" colspan="1">          1222 (65%)</td>
<td align="left" rowspan="1" colspan="1">           410 (16%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Median length of anchored contigs (bp)</td>
<td align="left" rowspan="1" colspan="1">         1006</td>
<td align="left" rowspan="1" colspan="1">            973</td>
<td align="left" rowspan="1" colspan="1">           977</td>
<td align="left" rowspan="1" colspan="1">            891</td>
<td align="left" rowspan="1" colspan="1">         1431</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of anchored HC genes
<xref ref-type="table-fn" rid="tf2-2"></xref>
</td>
<td align="left" rowspan="1" colspan="1">     16 682 (64%)</td>
<td align="left" rowspan="1" colspan="1">      15 743 (60%)</td>
<td align="left" rowspan="1" colspan="1">     16 729 (64%)</td>
<td align="left" rowspan="1" colspan="1">       20 932 (80%)</td>
<td align="left" rowspan="1" colspan="1">     15 719 (60%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of anchored LC genes
<xref ref-type="table-fn" rid="tf2-3"></xref>
</td>
<td align="left" rowspan="1" colspan="1">     28 337 (56%)</td>
<td align="left" rowspan="1" colspan="1">      29 033 (55%)</td>
<td align="left" rowspan="1" colspan="1">     28 559 (56%)</td>
<td align="left" rowspan="1" colspan="1">       37 609 (71%)</td>
<td align="left" rowspan="1" colspan="1">     19 415 (36%)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tf2-1">
<p>The Morex × Barke iSelect framework map is described in
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
and
<xref rid="b5" ref-type="bibr">Comadran
<italic>et al</italic>
. (2012)</xref>
.</p>
</fn>
<fn id="tf2-2">
<p>High-confidence genes as described in
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
.</p>
</fn>
<fn id="tf2-3">
<p>Low-confidence genes as described in
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Validation of population sequencing</title>
<p>We checked whether the genetic anchoring generated by POPSEQ was consistent with available short-range connectivity information. The
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
had sequenced 6278 bacterial artificial chromosomes (BACs). Individual BACs were sequenced to ‘Phase 1’ quality and consisted on average of five to ten sequence contigs. From this set, we identified 3902 clones that harbored at least two WGS contigs that were mapped by POPSEQ. Our hypothesis was that in the majority of cases, pairs of contigs from the same BAC clone (i.e. within a physical distance of less than 200 kb) would exhibit the same genetic location. Using ultra-stringent homology (100% identity over 1000 bp), 95% of the contig pairs were placed within a 3 cM window on the ordered assembly (Table S1). Discordant chromosome assignments were found for only 1.7% of the contig pairs, and a further 3.3% had a genetic distance larger than 3 cM. We inspected 17 BACs with at least five anchored WGS contigs and discordant chromosome assignments. Nine of these BACs had two groups of contigs anchored to different locations and either had suspiciously large insert sizes of ≥180 kb, suggestive of chimeric inserts, or showed evidence of independent clones having been sequenced under the same name.</p>
<p>We then compared the POPSEQ anchoring of WGS contigs to a recently released integrated sequence-enriched genetic and physical map of barley (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). More than 77 000 WGS contigs (representing 315 Mb of sequence) were assigned by both methods to specific genetic positions. Chromosome assignments disagreed in 2.2% of the cases, and cM coordinates differed by more than 5 cM in 7.0% of the cases, similar to the 2–8% false-positive rate observed in PCR-based screening of BAC libraries (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). In general terms, incongruence appears to occur largely in the highly repetitive and extensive genetic centromeres. We believe that this is most likely the product of misplaced repetitive sequence-containing or chimeric BAC contigs in the barley physical map. Thus, employing POPSEQ alongside a fully sequenced minimum tiling path highlights errors in a physical map and its associated anchoring information, and may thereby be valuable in establishing a robust clone-by-clone assembly of a target genome.</p>
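<p>The comparison of two independent anchorings reduces to two simple proportions over the shared contigs. The following sketch illustrates the calculation under stated assumptions: the dictionary layout and the function name are hypothetical, and cM coordinates from the two maps are compared directly, as in the text.</p>
<preformat><![CDATA[
```python
def compare_anchorings(map_a, map_b, max_cm=5.0):
    """Summarize agreement between two contig anchorings.

    map_a, map_b: dict mapping contig id -> (chromosome, cM).
    Returns None if no contig is present in both maps.
    """
    shared = map_a.keys() & map_b.keys()
    if not shared:
        return None
    # Contigs assigned to different chromosomes by the two methods.
    diff_chrom = sum(1 for c in shared if map_a[c][0] != map_b[c][0])
    # Contigs on the same chromosome but more than max_cm apart.
    far = sum(1 for c in shared
              if map_a[c][0] == map_b[c][0]
              and abs(map_a[c][1] - map_b[c][1]) > max_cm)
    n = len(shared)
    return {"shared": n,
            "chrom_discordant": diff_chrom / n,
            "distant": far / n}


popseq = {"c1": ("1H", 10.0), "c2": ("2H", 20.0),
          "c3": ("3H", 30.0), "c4": ("4H", 40.0), "c5": ("5H", 1.0)}
physical = {"c1": ("1H", 12.0), "c2": ("5H", 20.0),
            "c3": ("3H", 40.0), "c4": ("4H", 40.0)}
print(compare_anchorings(popseq, physical))
# {'shared': 4, 'chrom_discordant': 0.25, 'distant': 0.25}
```
]]></preformat>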
</sec>
<sec>
<title>Framework map construction by GBS</title>
<p>To further investigate the robustness of POPSEQ, we assessed the effect of using a different genotyping platform to construct the framework map. We genotyped the same 90 individuals using a two-enzyme genotyping-by-sequencing (GBS) approach (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
) (Table
<xref ref-type="table" rid="tbl1">1</xref>
). Prior to sequencing, DNA was digested using a rarely and a frequently cutting restriction enzyme, and only restriction fragments with two different restriction sites were sequenced, thus reducing the targeted interval on the genome to approximately 10 Mb. Compared to array-based genotyping, GBS has lower per-sample costs and does not require any prior knowledge of polymorphisms between the parents of the population. Instead, marker detection and scoring occur simultaneously, making GBS suitable for species without any genomic resources, or for which genomic resources are poorly developed. We constructed a
<italic>de novo</italic>
genetic map comprising 4056 bi-allelic SNP markers, and placed WGS contigs into this map using the same algorithm as described above. Altogether, 934 Mb of sequence represented by 512 293 sequence contigs was ordered (Table
<xref ref-type="table" rid="tbl2">2</xref>
), with 94.3% also linked to the iSelect framework (
<xref rid="b5" ref-type="bibr">Comadran
<italic>et al</italic>
., 2012</xref>
). Importantly, the genetic coordinates of contigs were consistent among the underlying framework maps (Figure
<xref ref-type="fig" rid="fig02">2</xref>
b): chromosome assignments were discordant in 0.1% of the cases, and the map position of only 0.6% of the contigs differed by more than 5 cM. If we only used the SNP markers (approximately 20 000) provided by GBS, we were able to anchor only 49 Mb of sequence, because the number of anchored contigs is limited by the number of available SNPs.</p>
<fig id="fig02" position="float">
<label>Figure 2</label>
<caption>
<p>POPSEQ validation. WGS contigs anchored to three genetic maps. These plots show the colinearity of contigs anchored to the Morex × Barke iSelect framework map and (a) the physical and genetic framework of barley (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
), (b) a Morex × Barke genetic map constructed by genotyping-by-sequencing (GBS), (c) a GBS map (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
) constructed in the OWB. WGS contigs are shown as dots, and are mostly within 5 cM of the diagonal: 90.8% in (a), 99.2% in (b) and 93.2% in (c).</p>
</caption>
<graphic xlink:href="tpj0076-0718-f2"></graphic>
</fig>
</sec>
<sec>
<title>Robustness of the linear assembly</title>
<p>To test the robustness of the M x B POPSEQ anchored assembly, we constructed a
<italic>de novo</italic>
assembly of a second population for comparison. We used the Oregon Wolfe Barley (OWB) population, as a genetic map from GBS on 82 doubled haploid (DH) lines was already available (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
). We survey-sequenced these 82 individuals to approximately 1× whole genome coverage each (Table
<xref ref-type="table" rid="tbl1">1</xref>
), and, by performing the same steps as for M x B, assigned genetic positions to 591 779 WGS contigs corresponding to 1000 Mb of sequence. Of these contigs, 42% (295 Mb) were not anchored to the M x B iSelect framework. In most cases, these contigs either harbored no polymorphism between Morex and Barke, or SNPs were not assayed in a sufficient number of RILs to reach our threshold for inclusion. Contigs anchored to both M x B and OWB maps had highly congruent chromosome assignments (99.6% agreement, Figure
<xref ref-type="fig" rid="fig02">2</xref>
c). Only 6.4% of all contigs were placed more than 5 cM apart in the two anchored assemblies (falling to 2.1% when the threshold is raised to 7 cM). Given that we were comparing populations constructed with different parents and levels of recombination (approximately half in a DH population compared to RILs), this was not completely unexpected. However, the use of independent populations for anchoring has considerable value: the cumulative length of contigs anchored to either the M x B or OWB map is 1.22 Gb, an increase of one-third compared to use of only a single population. Additional polymorphisms in OWB thus enabled placement of contigs that were identical between Morex and Barke. More importantly, the POPSEQ ordered assembly positions an additional 5213 annotated high-confidence genes on the barley genome compared to the International Barley Genome Sequencing Consortium release.</p>
</sec>
<sec>
<title>Framework map construction using light shotgun population sequencing</title>
<p>We then explored whether the POPSEQ data could be used directly to construct a robust
<italic>de novo</italic>
genetic map without reference to other datasets or genotyping methods. Briefly, we identified a set of 65 357 contigs containing at least ten Morex/Barke SNPs per contig, requiring that these contigs be genotyped by shallow WGS sequencing in at least 75 of the 90 individuals within our M × B mapping population to avoid an excess of missing data points. Using stringent controls on log-odds scores, 98.5% of these contigs were readily clustered into seven major linkage groups and ordered by MSTMap (
<xref rid="b25" ref-type="bibr">Wu
<italic>et al</italic>
., 2008</xref>
). The resulting framework map has approximately 99% concordance with existing barley maps (Pearson correlation coefficient), and may be used to place additional contigs with fewer SNPs and/or more limited sampling using a majority rule approach as described above. Thus POPSEQ data may be used directly to generate a linear ordering of contigs, even in the absence of an independent genetic map.</p>
</sec>
<sec>
<title>POPSEQ does not require long mate-pair libraries</title>
<p>The set of whole-genome contigs (the ‘reference assembly’) used in the present study had been assembled from Illumina libraries with fragment sizes of 350 bp and 2.5 kb (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). Although large-insert mate-pair libraries may be used to establish links between contigs, and may be required input for some assemblers (
<xref rid="b7" ref-type="bibr">Gnerre
<italic>et al</italic>
., 2011</xref>
), the construction of such libraries is not straightforward and often yields sub-optimal results, such as a high fraction of PCR duplicates or short-insert read pairs. We therefore explored how POPSEQ performed using an assembly comprising only short-insert paired reads. We sequenced the same 350 bp insert libraries used for construction of the current barley reference assembly (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
) on two HiSeq lanes, yielding approximately 15× haploid genome coverage (Table
<xref ref-type="table" rid="tbl1">1</xref>
), and assembled the reads using the same program as previously (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). As the read coverage was approximately three times lower than used by
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium (2012)</xref>
and did not utilize mate-pair information, we expected the assembly to be of lower quality. The cumulative length of the resulting assembly was shorter (1.6 Gb versus 1.9 Gb), and the contig N50 (a length-weighted median contig size that is commonly used as a measure of assembly contiguity) was smaller (1238 bp versus 1450 bp). However, contigs of this size are sufficient to function as a reference for read mapping and to enable structural gene annotation via RNA sequencing (RNA-seq) as well as SNP detection. Notably, almost half of the contigs (49.8%) anchored to the M × B iSelect framework are shorter than 1000 bp. In species with smaller and less repetitive genomes, WGS assembly is expected to yield fewer and longer contigs that potentially yield a higher number of SNPs per contig (depending upon the level of polymorphism in the POPSEQ population). Alternatively, larger contigs may compensate for lower levels of polymorphism.</p>
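<p>As an illustration of the N50 statistic quoted above, the following minimal Python sketch computes it from a list of contig lengths. The function name and the toy lengths are hypothetical and are not taken from the barley assemblies.</p>

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, not barley data).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400
```

<p>Because the statistic is weighted by length, a few long contigs may raise the N50 even when most contigs are short.</p>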
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>Low-coverage (approximately 0.05–0.1×) NGS survey sequencing of the small genome (0.4 Gb) of the model crop plant rice has previously been used as a tool to generate many thousands of genetic markers for both bi-parental linkage studies and GWAS (
<xref rid="b10" ref-type="bibr">Huang
<italic>et al</italic>
., 2009</xref>
,
<xref rid="b11" ref-type="bibr">2010</xref>
). The effectiveness of this ‘genotyping by re-sequencing’ was afforded by the availability of high-quality reference sequences, a small target genome with comparatively few repeats, and innovative statistical approaches to data analysis. Here, we have explored a fundamentally different application of NGS combined with classical genetic analysis that should find application in many species, particularly those with recalcitrant, large or poorly characterized genomes, among them economically important species such as wheat, sugarcane, pine or
<italic>Miscanthus</italic>
.</p>
<p>We explored POPSEQ as a method for genetically anchoring and ordering
<italic>de novo</italic>
NGS assemblies, and have demonstrated its potential by re-synthesizing and improving a recently released sequence assembly of the large (5.1 Gb) and complex (≥80% repetitive sequence, ancestrally duplicated) barley genome. We used sequence data from two mapping populations, and used the large number of detected SNPs to integrate the sequence assembly with two established framework maps as well as genetic maps computed from GBS or WGS data. At its core, POPSEQ exploits the power of genetic segregation combined with shallow (1–2× per line) survey sequencing of one or more small experimental populations to genetically anchor NGS sequence assemblies. It is independent of physical mapping and all other genomic resources typically developed in large genome sequencing projects, and should be amenable to application in most population types.</p>
<p>We show that POPSEQ is both robust and reproducible. Using various genetic maps and mapping populations, we obtain comparable results with a concordance of approximately 95%. Thus, POPSEQ is neither dependent upon the choice of mapping population nor the genotyping platform used for framework map construction. If more extensive short-range connectivity is established by longer sequence contigs or scaffolds (sets of ordered sequence contigs with gaps between them), a sliding window approach (
<xref rid="b10" ref-type="bibr">Huang
<italic>et al</italic>
., 2009</xref>
) may be used for genotype calling and framework map construction from POPSEQ data alone, avoiding the need for GBS or SNP mapping platforms. In addition, partitioning of polymorphic sites according to their parental origin may be performed prior to
<italic>de novo</italic>
assembly, for example by using the colored de Bruijn graph method (
<xref rid="b13" ref-type="bibr">Iqbal
<italic>et al</italic>
., 2012</xref>
). The raw sequence reads from POPSEQ (the equivalent of 50× for each parent) should then be sufficient to compute the reference sequence assemblies that will ultimately be ordered along the genetic map.</p>
<p>POPSEQ performs effectively with highly fragmented sequence assemblies from short-insert libraries. We were able to construct a
<italic>de novo</italic>
WGS assembly from short Illumina reads that showed assembly statistics comparable to an assembly that incorporated mate-pair information. POPSEQ thus avoids the technical difficulties associated with construction and characterization of large-insert libraries. The simultaneous use of several mapping populations through sequence-based consensus map construction is straightforward, with the same caveats as observed in any genetic map integration. The outcome is not merely an ultra-dense genetic map of anonymous loci: at each genetic position, comprehensive information on the gene space may be obtained through RNA-seq-based structural annotation.</p>
<p>The POPSEQ resource we developed here both reproduces and substantially improves the multi-layered gene space assembly that was the result of a large collaborative effort by the International Barley Genome Sequencing Consortium over many years. By comparison, POPSEQ is inexpensive, rapid and conceptually simple, the most time-consuming step being the construction of a mapping population. In relation to the latter, while we used both DH lines and RILs, other population types including early-generation inbred lines (e.g. F
<sub>4</sub>
individuals) would also be suitable. Subsequent steps including sequence assembly from short-insert libraries, genotyping-by-sequencing (if required) and integrative computational analyses may be performed quickly. We stress that we do not advocate abandonment of on-going genome projects that are pursuing a clone-by-clone strategy. On the contrary, we believe these may profit from POPSEQ. BAC contigs may be validated through genetic mapping of each single clone, and the high number of mapped genetic markers should allow virtually any fully sequenced physical contig to be accurately placed.</p>
<p>Having performed a proof of principle in barley, the notion of advancing the closely related bread wheat genome (
<xref rid="b20" ref-type="bibr">Paux
<italic>et al</italic>
., 2008</xref>
) by adopting POPSEQ is of particular interest. Wheat will be the last of the world’s major crops to be fully sequenced. The cost-efficient construction of high-density genetic maps is routine in hexaploid wheat (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
), and the challenge of distinguishing homoeologous sequences has been largely overcome: sub-genome-specific shotgun assemblies have been released recently (
<xref rid="b2" ref-type="bibr">Brenchley
<italic>et al</italic>
., 2012</xref>
), and chromosome-specific survey sequences have also been generated (
<xref rid="b9" ref-type="bibr">Hernandez
<italic>et al</italic>
., 2012</xref>
). Furthermore, several populations of recombinant inbred lines are already available within the academic and commercial sectors, and are ripe for exploitation (
<xref rid="b19" ref-type="bibr">Nelson
<italic>et al</italic>
., 1995</xref>
;
<xref rid="b17" ref-type="bibr">Manickavelu
<italic>et al</italic>
., 2011</xref>
). While the ultimate goal should be a clone-by-clone sequence of the wheat genome with a quality on par with that of the maize genome, POPSEQ opens the way to obtain, with comparative ease, an effective surrogate that would be valuable for basic research and breeding applications. In addition to wheat, many non-model species, orphan crops and old genetic models such as pea (
<italic>Pisum sativum</italic>
), have not yet benefited much from the genomics era. With moderate effort, POPSEQ could allow the generation of highly useful sequence resources for these and many other species.</p>
<p>For an uncharacterized ≥5 Gb diploid genome, between 14 and 30 HiSeq lanes are required for (i) producing a
<italic>de novo</italic>
sequence assembly for read mapping (two to eight lanes; not required if the POPSEQ data themselves are used to produce the ‘reference’ sequence assembly); (ii) genotyping-by-sequencing for map construction (one lane; not required if POPSEQ is used to construct the reference map); (iii) shallow population sequencing (minimum 12 lanes for each population of approximately 90 lines, although the depth may be varied); and (iv) deep RNA-seq for structural gene annotation (more than two lanes), amounting to $50 000–$100 000 in sequencing costs. Together with a medium-sized computer server (32 CPU cores, 512 GB RAM, 3 TB of disk space), it is possible to generate a
<italic>de novo</italic>
linear gene space assembly.</p>
<p>The accuracy of POPSEQ may be improved if the members of the population are sequenced to higher depth. With the sequencing depth used in this study (1–2×), the sequencing reads of each individual cover only approximately 50% of the assembly. Doubling the amount of sequencing data per individual would result in genome coverage of approximately 80% according to the model of
<xref rid="b100" ref-type="bibr">Lander and Waterman (1988)</xref>
(Figure S2), thus reducing the number of missing genotype calls per individual. An increase in sequencing depth is mandatory for highly heterozygous populations such as F
<sub>2</sub>
populations in selfing organisms or F
<sub>1</sub>
populations in outcrossing species in order to correctly type heterozygous SNPs. Using an improved assembly with longer contigs or contigs organized into physically close scaffolds would benefit the analysis, as more SNPs could be used to place each sequence contig. An increase in the number of sequenced individuals (resulting in a proportional increase in the sequencing load) may improve the genetic resolution of the framework map.</p>
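<p>The effect of sequencing depth on coverage can be sketched with the idealized Poisson form of the Lander and Waterman model, in which a depth c covers a fraction 1 − exp(−c) of the target. The sketch below is only an approximation under that assumption: it treats the observed 50% coverage as an effective depth and asks what doubling that depth would give. The function names are hypothetical, and the exact assumptions behind Figure S2 are not reproduced here.</p>

```python
import math


def covered_fraction(depth):
    """Expected fraction of the target covered by at least one read
    at the given depth, under an idealized Poisson model."""
    return 1.0 - math.exp(-depth)


def effective_depth(fraction):
    """Depth that would give the stated covered fraction under the
    same model (inverse of covered_fraction)."""
    return -math.log(1.0 - fraction)


# Effective depth implied by covering about 50% of the assembly.
c = effective_depth(0.5)
print(round(c, 3))                        # 0.693
# Doubling the sequencing effort per individual.
print(round(covered_fraction(2 * c), 3))  # 0.75
```

<p>Under these simplifying assumptions, doubling the data raises coverage from 50% to about 75%, which is of the same order as the approximately 80% stated above.</p>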
<p>We propose that POPSEQ may contribute substantially to fundamental research in plant genetics as well as in crop improvement (for examples, see Figure S3, Appendix S1 and Methods S1). However, its application is not restricted to plants. The fast and steady advances in sequencing technology will further increase the power of POPSEQ, allowing deeper coverage of larger and outbred populations. As long as the inherent complexity of genomes restricts the assembly of pseudo-molecules by shotgun sequencing, POPSEQ provides a rapid, low-cost and effective method for developing a highly useful ‘interim reference’ genome sequence in most species for which it is possible to construct a genetic map.</p>
</sec>
<sec sec-type="methods">
<title>Experimental procedures</title>
<sec>
<title>Whole-genome shotgun sequencing</title>
<p>Illumina paired-end libraries (fragment size approximately 350 bp) were generated from fragmented genomic DNA of 90 individuals from the Morex × Barke RIL population and 82 individuals of the OWB population. Individual libraries were bar-coded prior to combining in pools of eight and sequencing on Illumina HiSeq 2000 instruments (
<ext-link ext-link-type="uri" xlink:href="http://www.illumina.com">http://www.illumina.com</ext-link>
). The iSelect framework was available from a previous study (
<xref rid="b5" ref-type="bibr">Comadran
<italic>et al</italic>
., 2012</xref>
).</p>
</sec>
<sec>
<title>Read mapping and SNP calling</title>
<p>Sequencing reads were quality trimmed and mapped against the Morex WGS assembly (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
) using BWA version 0.6.2 (
<xref rid="b16" ref-type="bibr">Li and Durbin, 2009</xref>
). The command ‘bwa aln’ was used with the parameter ‘-q 15’ for quality trimming, otherwise default parameters were used. After removing duplicate reads using the SAMtools (
<xref rid="b15" ref-type="bibr">Li, 2011</xref>
) command rmdup, variant positions and genotypes of individuals at variant positions were called using the SAMtools mpileup/bcftools pipeline version 0.1.18 (
<xref rid="b15" ref-type="bibr">Li, 2011</xref>
) with default parameters. Additionally, the parameter ‘-D’ was used for SAMtools mpileup to record per-sample read depth. The resulting VCF file was filtered using a custom AWK script. The script removed SNPs with a SAMtools quality score below 40, and further filtered SAMtools genotype calls: a homozygous genotype call was retained if there was at least one read supporting it and its SAMtools genotype quality was at least 3. In the M × B data, a heterozygous call was retained if there were at least three supporting reads and its genotype quality was at least 5. In the OWB DH population, heterozygous calls were always discarded. Genotype calls not matching the specified criteria were set to a missing value. A variant position was removed if more than 10% of all samples were called heterozygous, there were more than 80% missing data, or the minor allele frequency (in the non-missing data) was below 5%.</p>
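<p>The filtering rules above can be sketched in Python as follows. This is not the custom AWK script used in the study; the function names, the genotype codes ('A', 'B', 'H') and the way allele counts are derived for the minor allele frequency are assumptions made for illustration.</p>

```python
MISSING = None


def filter_call(genotype, supporting_reads, genotype_quality, allow_het=True):
    """Keep or blank one genotype call.
    genotype: 'A' or 'B' (homozygous) or 'H' (heterozygous).
    allow_het=False mimics the rule for the OWB DH population."""
    if genotype in ("A", "B"):
        keep = supporting_reads >= 1 and genotype_quality >= 3
    elif genotype == "H":
        keep = allow_het and supporting_reads >= 3 and genotype_quality >= 5
    else:
        keep = False
    return genotype if keep else MISSING


def keep_site(snp_quality, calls):
    """Decide whether a variant position is retained.
    calls: filtered calls of all samples at this position."""
    n = len(calls)
    n_het = calls.count("H")
    n_missing = calls.count(MISSING)
    if not snp_quality >= 40:
        return False
    if n_het > 0.10 * n or n_missing > 0.80 * n:
        return False
    # Assumption: a homozygote contributes two alleles, a heterozygote one each.
    a = 2 * calls.count("A") + n_het
    b = 2 * calls.count("B") + n_het
    if a + b == 0:
        return False
    return min(a, b) / (a + b) >= 0.05


calls = ["A"] * 50 + ["B"] * 30 + [MISSING] * 10
print(keep_site(40, calls))  # True
```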
</sec>
<sec>
<title>Mapping SNPs and WGS contigs to the framework map</title>
<p>The nearest neighbors of SNPs detected in the WGS shotgun data were searched for using a heuristic algorithm implemented in GNU C. The source code is available from
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ipk-gatersleben.de/barley-popseq/">ftp://ftp.ipk-gatersleben.de/barley-popseq/</ext-link>
. As a metric, we used the minimum Hamming distance. The nearest neighbors were searched for in the set of (i) 1723 non-redundant iSelect SNPs, (ii) 4056 GBS SNPs used for construction of the M × B GBS map, and (iii) 4632 non-redundant OWB GBS SNPs meeting the criteria described below. A SNP was considered redundant if there was another SNP with the same genotype (on the non-missing data) and the same genetic position. [Correction added on 25 October 2013 after original online publication on 10 October 2013:
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.gatersleben.de/barley-popseq/">ftp://ftp.gatersleben.de/barley-popseq/</ext-link>
was changed to
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ipk-gatersleben.de/barley-popseq/">ftp://ftp.ipk-gatersleben.de/barley-popseq/</ext-link>
.]</p>
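<p>The nearest-neighbor search by minimum Hamming distance can be illustrated with a brute-force Python sketch. It is not the heuristic C implementation referred to above; the genotype encoding (one character per individual, 'N' for missing) and the marker names are hypothetical.</p>

```python
def hamming(g1, g2, missing="N"):
    """Number of individuals with different, non-missing genotypes."""
    return sum(
        1
        for a, b in zip(g1, g2)
        if a != missing and b != missing and a != b
    )


def nearest_neighbors(query, framework, missing="N"):
    """Return (minimum distance, names of all framework markers at
    that distance). framework maps marker name to genotype string."""
    distances = {
        name: hamming(query, genotypes, missing)
        for name, genotypes in framework.items()
    }
    best = min(distances.values())
    return best, sorted(n for n, d in distances.items() if d == best)


framework = {"m1": "AABBA", "m2": "AABBB", "m3": "BBAAB"}
print(nearest_neighbors("AANBB", framework))  # (0, ['m2'])
```

<p>A brute-force comparison of every SNP against every framework marker scales poorly, which is presumably why a heuristic was used for the full dataset.</p>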
<p>SNPs were used to anchor WGS contigs if they were scored unequivocally on more than 20% of the individuals in the population, the Hamming distance (number of different, non-missing genotypes) to their nearest SNPs was not larger than 2, at least 80% of all nearest SNPs lay on the same chromosome, and the median absolute deviation of the cM positions (on the chromosome with most markers) was less than 5 for the OWB map and the M × B iSelect framework. As we used the population type DH for the M × B RILs (as required for advanced RILs), the M × B GBS map over-estimated the map length by a factor of approximately 3, and we allowed a maximal median absolute deviation of 15. The cM coordinate of a SNP meeting these criteria was defined as the median cM position of its nearest neighbors. A WGS contig was assigned to a genetic position if at least 80% of all SNPs located on it had been mapped to the same chromosome and the median absolute deviation of the cM coordinates of the SNPs was less than 5 (less than 15 for M × B GBS). The cM position of a contig was set to the median cM position of all SNPs located on the contig.</p>
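<p>The chromosome-majority and median absolute deviation (MAD) criteria can be sketched as a single Python helper that applies to either level (nearest neighbors of a SNP, or SNPs on a contig). The function name is hypothetical, the MAD is computed without a scaling constant, and only positions on the majority chromosome enter the median; the source does not specify these details, so they are assumptions.</p>

```python
from collections import Counter
from statistics import median


def place(positions, min_fraction=0.8, max_mad=5.0):
    """positions: list of (chromosome, cM) tuples.
    Returns (chromosome, median cM) if the criteria are met, else None."""
    if not positions:
        return None
    chrom, count = Counter(c for c, _ in positions).most_common(1)[0]
    if count / len(positions) >= min_fraction:
        cms = [cm for c, cm in positions if c == chrom]
        centre = median(cms)
        mad = median([abs(cm - centre) for cm in cms])
        if max_mad > mad:
            return chrom, centre
    return None


snps_on_contig = [("2H", 50.0), ("2H", 51.0), ("2H", 52.5),
                  ("2H", 50.5), ("5H", 10.0)]
print(place(snps_on_contig))  # ('2H', 50.75)
```

<p>For the M × B GBS map, the same helper would be called with max_mad=15.0.</p>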
</sec>
<sec>
<title>Estimation of the error rate</title>
<p>WGS contigs were compared using MegaBLAST version 2.2.26 (
<xref rid="b101" ref-type="bibr">Zhang
<italic>et al</italic>
., 2000</xref>
) to 6278 fully sequenced BACs. Under stringent criteria, we required 100% identity and a minimum alignment length of 1000 bp for each BLAST High Scoring Pair. Under relaxed criteria, we required 99% identity and 200 bp minimum alignment length. The genetic positions of all pairs of contigs on the same BAC were compared (Table S1). BACs with discordant chromosome assignments and with hits to at least five anchored contigs were further analyzed. For each BAC, the chromosome assignments of its contigs were tabulated. If at least 30% of all contigs on a BAC were anchored to the chromosome with the second highest number of contigs, the BAC was deemed problematic, and we checked whether it had been sequenced twice or its length (the cumulative length of its assembled sequence contigs) was unusually large (≥180 kb).</p>
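<p>The rule for flagging problematic BACs can be written as a short Python sketch. The function name and input format (one chromosome label per anchored contig on a BAC) are hypothetical.</p>

```python
from collections import Counter


def is_problematic(contig_chromosomes, min_contigs=5, min_second_fraction=0.30):
    """Flag a BAC if at least 30% of its anchored contigs sit on the
    chromosome with the second highest number of contigs.
    BACs with fewer than five anchored contigs, or with contigs on a
    single chromosome only, are not flagged."""
    counts = Counter(contig_chromosomes).most_common()
    if min_contigs > len(contig_chromosomes) or 2 > len(counts):
        return False
    second_count = counts[1][1]
    return second_count / len(contig_chromosomes) >= min_second_fraction


print(is_problematic(["3H"] * 6 + ["7H"] * 4))  # True
print(is_problematic(["3H"] * 9 + ["7H"]))      # False
```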
</sec>
<sec>
<title>Genetic map construction from M × B GBS data</title>
<p>GBS library production and sequencing for the M × B population were performed as described previously (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
). Reads were deconvoluted using a custom AWK script. Adapter sequences were removed using cutadapt version 1.1 (
<ext-link ext-link-type="uri" xlink:href="http://code.google.com/p/cutadapt">http://code.google.com/p/cutadapt</ext-link>
). Trimmed reads shorter than 30 bp were discarded. Read mapping, SNP and genotype calling, and filtering were performed essentially as described above for the WGS data. As only single ends were used, the BWA command samse was used for alignment. Additionally, only SNPs meeting the following criteria were considered for genetic map construction: no more than 10% missing data, no more than 10% heterozygous genotypes,
<inline-formula>
<inline-graphic xlink:href="tpj0076-0718-mu1.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, where
<italic>A</italic>
and
<italic>B</italic>
indicate the counts of the parental alleles; in the absence of heterozygous calls, this corresponds to a minimum minor allele frequency of 17.6%. For M × B, 4058 SNPs passed these filters. Genetic map construction was performed using MSTMap (
<xref rid="b25" ref-type="bibr">Wu
<italic>et al</italic>
., 2008</xref>
) with the following parameters: population type DH; distance function kosambi; cut_off_p_value 0.00001; no_map_dist 20; no_map_size 2; missing_threshold 0.8; estimation_before_clustering no; detect_bad_data yes; objective_function COUNT. The resulting map contained seven linkage groups with more than one marker. Two markers formed a linkage group of their own and were discarded. According to the obtained orders, orientations and distances between markers, the linkage groups corresponded to the seven barley chromosomes. The relationship between genetic positions in the new map and the iSelect map was obtained through Loess regression [R (
<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org">http://www.r-project.org</ext-link>
) function loess, smoother span 0.3]. Interpolation into the iSelect map of WGS SNP positions integrated to the GBS framework was performed using the loess model with the R function predict.</p>
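<p>For orientation, the MSTMap settings listed above can be written out as the header of an input file. The sketch below assumes the usual MSTMap layout of one "key value" pair per line followed by the locus and individual counts; the population name and the counts passed in the example are placeholders, not values confirmed by the source.</p>

```python
def mstmap_header(n_loci, n_individuals, population_name="MxB"):
    """Build the parameter header of an MSTMap input file using the
    settings stated in the text. population_name is a placeholder."""
    params = [
        ("population_type", "DH"),
        ("population_name", population_name),
        ("distance_function", "kosambi"),
        ("cut_off_p_value", "0.00001"),
        ("no_map_dist", "20"),
        ("no_map_size", "2"),
        ("missing_threshold", "0.8"),
        ("estimation_before_clustering", "no"),
        ("detect_bad_data", "yes"),
        ("objective_function", "COUNT"),
        ("number_of_loci", str(n_loci)),
        ("number_of_individual", str(n_individuals)),
    ]
    return "\n".join(f"{key} {value}" for key, value in params)


# Placeholder counts for illustration only.
print(mstmap_header(n_loci=100, n_individuals=90))
```

<p>The genotype matrix (one row per locus, one column per individual) would follow this header in the actual input file.</p>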
</sec>
<sec>
<title>
<italic>De novo</italic>
map construction from POPSEQ
</title>
<p>To build an independent genetic map from the POPSEQ data without reference to existing maps or other marker data, we restricted our attention to the 115 258 sequence contigs that span at least ten SNPs that are polymorphic between the two parents Morex and Barke. For the purposes of developing a framework map, we further restricted our attention to contigs with highly concordant SNP genotype calls. We therefore set aside contigs that had two or more SNP genotype calls from both parents, indicating the possibility of mis-genotyping through incorrect SNP calls and/or limited cross-contamination between individuals. The resulting 80 189 contigs were then genotyped as either Morex or Barke based on the consensus of their genotyped SNPs, requiring at least three SNP calls. Finally, for the framework map, we only considered contigs that could be consensus genotyped in at least 75 of the 90 individuals. This left us with 65 357 contigs that could be reliably genotyped with limited missing data. We computed the recombination rate and logarithm of odds (LOD) score between each pair of contigs, and clustered contigs with LOD ≥ 10 to form linkage groups: 64 476/65 357 (98.7%) of contigs formed 14 linkage groups, with approximately 98.87% of contig length placed in seven major linkage groups, corresponding to the seven barley chromosomes.</p>
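<p>Consensus genotyping of a contig and the pairwise LOD score can be sketched in Python as below. The exact formulas used in the study are not given, so this uses the standard two-point LOD for a phase-known cross and treats a discordant contig as missing within one individual (the study set such contigs aside entirely); both choices, and all function names, are assumptions.</p>

```python
import math


def consensus(snp_calls, min_calls=3):
    """Consensus genotype ('M' or 'B') of one contig in one individual,
    or None if there are too few calls or the calls are discordant."""
    m, b = snp_calls.count("M"), snp_calls.count("B")
    if min_calls > m + b:
        return None
    if m >= 2 and b >= 2:
        return None
    return "M" if m > b else "B"


def lod_score(g1, g2):
    """Recombination fraction and LOD between two contigs, given
    per-individual consensus genotypes (None for missing)."""
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.5, 0.0
    r = sum(1 for a, b in pairs if a != b)
    theta = min(r / n, 0.5)
    lod = n * math.log10(2)
    if r > 0:
        lod += r * math.log10(theta)
    if n > r:
        lod += (n - r) * math.log10(1 - theta)
    return theta, lod


theta, lod = lod_score(["M"] * 80, ["M"] * 80)
print(theta, round(lod, 2))  # 0.0 24.08
```

<p>In this sketch, two contigs with identical genotypes in 80 individuals give a LOD of about 24, well above the clustering threshold of 10, whereas independent segregation (half of the individuals recombinant) gives a LOD of 0.</p>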
</sec>
<sec>
<title>Integration of WGS SNPs to the OWB GBS bin map</title>
<p>A bin map (
<xref rid="b21" ref-type="bibr">Poland
<italic>et al</italic>
., 2012</xref>
) had previously been constructed from GBS data of 82 OWB DH lines. GBS marker sequences (64 bp long) were aligned against the Morex WGS assembly using the BWA command ‘aln’ and the command ‘samse’. Only alignments with the best possible mapping score of 37 were considered. SNPs with missing data for the parents or more than 10% missing data on the DH lines were not considered for the nearest-neighbor search. The anchoring of SNPs and contigs has been described above.</p>
</sec>
<sec>
<title>
<italic>De novo</italic>
assembly</title>
<p>Illumina paired-end libraries (insert size 350 bp) for barley cultivar Morex had been constructed previously (
<xref rid="b12" ref-type="bibr">International Barley Genome Sequencing Consortium, 2012</xref>
). Sequencing on the Illumina HiSeq 2000 was performed according to the manufacturer’s procedures (
<ext-link ext-link-type="uri" xlink:href="http://www.illumina.com">http://www.illumina.com</ext-link>
). Sequencing reads were quality trimmed and assembled using CLC assembly cell 3.2.2 (
<ext-link ext-link-type="uri" xlink:href="http://www.clcbio.com">http://www.clcbio.com</ext-link>
).</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>The work performed by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract number DE-AC02-05CH11231. The authors would also like to acknowledge the support given by funds received from the Triticeae Coordinated Agricultural Project, US Department of Agriculture/National Institute for Food and Agriculture grant number 2011-68002-30029 to G.J.M. and T.J.C., the Scottish Government Rural and Environment Science and Analytical Services Division Research Programme to R.W., and the Bundesministerium für Bildung und Forschung (TRITEX 0315954) to N.S. and U.S. We thank Sarah Ayling (The Genome Analysis Centre, Norwich, UK) for helpful discussions about simulating read coverage. Finally, we acknowledge S. Taudien and M. Platzer (Fritz Lipmann Institute, Jena, Germany) for providing a paired-end library of cv. Morex for HiSeq 2000 sequencing, and D. Stengel for sequence data submission.</p>
</ack>
<sec>
<title>Accession Numbers</title>
<p>The accession numbers for the sequences described are ERP002183 (GBS sequence data of the Morex × Barke RILs) and ERP002184 (whole-genome shotgun sequence data for the Morex × Barke RILs and OWB DH lines).</p>
</sec>
<ref-list>
<title>References</title>
<ref id="b1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andolfatto</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Davison</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Erezyilmaz</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>Mast</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sunayama-Morita</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Stern</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>Multiplexed shotgun genotyping for rapid and efficient genetic mapping</article-title>
<source>Genome Res</source>
<year>2011</year>
<volume>21</volume>
<fpage>610</fpage>
<lpage>617</lpage>
<pub-id pub-id-type="pmid">21233398</pub-id>
</element-citation>
</ref>
<ref id="b2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brenchley</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Spannagl</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pfeifer</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Analysis of the bread wheat genome using whole-genome shotgun sequencing</article-title>
<source>Nature</source>
<year>2012</year>
<volume>491</volume>
<fpage>705</fpage>
<lpage>710</lpage>
<pub-id pub-id-type="pmid">23192148</pub-id>
</element-citation>
</ref>
<ref id="b3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chandler</surname>
<given-names>VL</given-names>
</name>
<name>
<surname>Brendel</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>The Maize Genome Sequencing Project</article-title>
<source>Plant Physiol</source>
<year>2002</year>
<volume>130</volume>
<fpage>1594</fpage>
<lpage>1597</lpage>
<pub-id pub-id-type="pmid">12481042</pub-id>
</element-citation>
</ref>
<ref id="b4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Presting</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Barbazuk</surname>
<given-names>WB</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An integrated physical and genetic map of the rice genome</article-title>
<source>Plant Cell</source>
<year>2002</year>
<volume>14</volume>
<fpage>537</fpage>
<lpage>545</lpage>
<pub-id pub-id-type="pmid">11910002</pub-id>
</element-citation>
</ref>
<ref id="b5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Comadran</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kilian</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Natural variation in a homolog of
<italic>Antirrhinum CENTRORADIALIS</italic>
contributed to spring growth habit and environmental adaptation in cultivated barley</article-title>
<source>Nat. Genet</source>
<year>2012</year>
<volume>44</volume>
<fpage>1388</fpage>
<lpage>1392</lpage>
<pub-id pub-id-type="pmid">23160098</pub-id>
</element-citation>
</ref>
<ref id="b6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feuillet</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rossini</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Praud</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mayer</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Schulman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eversole</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Appels</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Integrating cereal genomics to support innovation in the Triticeae</article-title>
<source>Funct. Integr. Genomics</source>
<year>2012</year>
<volume>12</volume>
<fpage>573</fpage>
<lpage>583</lpage>
<pub-id pub-id-type="pmid">23161406</pub-id>
</element-citation>
</ref>
<ref id="b7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gnerre</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Maccallum</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Przybylski</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>High-quality draft assemblies of mammalian genomes from massively parallel sequence data</article-title>
<source>Proc. Natl. Acad. Sci. U. S. A</source>
<year>2011</year>
<volume>108</volume>
<fpage>1513</fpage>
<lpage>1518</lpage>
<pub-id pub-id-type="pmid">21187386</pub-id>
</element-citation>
</ref>
<ref id="b8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The draft genome of watermelon (
<italic>Citrullus lanatus</italic>
) and resequencing of 20 diverse accessions</article-title>
<source>Nat. Genet</source>
<year>2013</year>
<volume>45</volume>
<fpage>51</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">23179023</pub-id>
</element-citation>
</ref>
<ref id="b9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hernandez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Martis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dorado</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content</article-title>
<source>Plant J</source>
<year>2012</year>
<volume>69</volume>
<fpage>377</fpage>
<lpage>386</lpage>
<pub-id pub-id-type="pmid">21974774</pub-id>
</element-citation>
</ref>
<ref id="b10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>Q</given-names>
</name>
<etal></etal>
</person-group>
<article-title>High-throughput genotyping by whole-genome resequencing</article-title>
<source>Genome Res</source>
<year>2009</year>
<volume>19</volume>
<fpage>1068</fpage>
<lpage>1076</lpage>
<pub-id pub-id-type="pmid">19420380</pub-id>
</element-citation>
</ref>
<ref id="b11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Sang</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide association studies of 14 agronomic traits in rice landraces</article-title>
<source>Nat. Genet</source>
<year>2010</year>
<volume>42</volume>
<fpage>961</fpage>
<lpage>967</lpage>
<pub-id pub-id-type="pmid">20972439</pub-id>
</element-citation>
</ref>
<ref id="b12">
<element-citation publication-type="journal">
<collab>International Barley Genome Sequencing Consortium</collab>
<article-title>A physical, genetic and functional sequence assembly of the barley genome</article-title>
<source>Nature</source>
<year>2012</year>
<volume>491</volume>
<fpage>711</fpage>
<lpage>716</lpage>
<pub-id pub-id-type="pmid">23075845</pub-id>
</element-citation>
</ref>
<ref id="b13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Iqbal</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Caccamo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Flicek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>McVean</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>
<italic>De novo</italic>
assembly and genotyping of variants using colored de Bruijn graphs</article-title>
<source>Nat. Genet</source>
<year>2012</year>
<volume>44</volume>
<fpage>226</fpage>
<lpage>232</lpage>
<pub-id pub-id-type="pmid">22231483</pub-id>
</element-citation>
</ref>
<ref id="b14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lam</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Hastie</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly</article-title>
<source>Nat. Biotechnol</source>
<year>2012</year>
<volume>30</volume>
<fpage>771</fpage>
<lpage>776</lpage>
<pub-id pub-id-type="pmid">22797562</pub-id>
</element-citation>
</ref>
<ref id="b100">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>Genomic mapping by fingerprinting random clones: a mathematical analysis</article-title>
<source>Genomics</source>
<year>1988</year>
<volume>2</volume>
<issue>3</issue>
<fpage>231</fpage>
<lpage>239</lpage>
<pub-id pub-id-type="pmid">3294162</pub-id>
</element-citation>
</ref>
<ref id="b15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>2987</fpage>
<lpage>2993</lpage>
<pub-id pub-id-type="pmid">21903627</pub-id>
</element-citation>
</ref>
<ref id="b16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Fast and accurate short read alignment with Burrows–Wheeler transform</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1754</fpage>
<lpage>1760</lpage>
<pub-id pub-id-type="pmid">19451168</pub-id>
</element-citation>
</ref>
<ref id="b17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Manickavelu</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kawaura</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Imamura</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mori</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ogihara</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Molecular mapping of quantitative trait loci for domestication traits and β-glucan content in a wheat recombinant inbred line population</article-title>
<source>Euphytica</source>
<year>2011</year>
<volume>177</volume>
<fpage>179</fpage>
<lpage>190</lpage>
</element-citation>
</ref>
<ref id="b18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martienssen</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rabinowicz</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>O’Shaughnessy</surname>
<given-names>A</given-names>
</name>
<name>
<surname>McCombie</surname>
<given-names>WR</given-names>
</name>
</person-group>
<article-title>Sequencing the maize genome</article-title>
<source>Curr. Opin. Plant Biol</source>
<year>2004</year>
<volume>7</volume>
<fpage>102</fpage>
<lpage>107</lpage>
<pub-id pub-id-type="pmid">15003207</pub-id>
</element-citation>
</ref>
<ref id="b19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nelson</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Van Deynze</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Sorrells</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Autrique</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Negre</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bernard</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Leroy</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Molecular mapping of wheat. Homoeologous group 3</article-title>
<source>Genome</source>
<year>1995</year>
<volume>38</volume>
<fpage>525</fpage>
<lpage>533</lpage>
<pub-id pub-id-type="pmid">18470186</pub-id>
</element-citation>
</ref>
<ref id="b20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paux</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sourdille</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Salse</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A physical map of the 1-gigabase bread wheat chromosome 3B</article-title>
<source>Science</source>
<year>2008</year>
<volume>322</volume>
<fpage>101</fpage>
<lpage>104</lpage>
<pub-id pub-id-type="pmid">18832645</pub-id>
</element-citation>
</ref>
<ref id="b21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Poland</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Sorrells</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Jannink</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach</article-title>
<source>PLoS ONE</source>
<year>2012</year>
<volume>7</volume>
<fpage>e32253</fpage>
<pub-id pub-id-type="pmid">22389690</pub-id>
</element-citation>
</ref>
<ref id="b22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kasarskis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>A window into third-generation sequencing</article-title>
<source>Hum. Mol. Genet</source>
<year>2010</year>
<volume>19</volume>
<fpage>R227</fpage>
<lpage>R240</lpage>
<pub-id pub-id-type="pmid">20858600</pub-id>
</element-citation>
</ref>
<ref id="b23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schnable</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Ware</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>RS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The B73 maize genome: complexity, diversity, and dynamics</article-title>
<source>Science</source>
<year>2009</year>
<volume>326</volume>
<fpage>1112</fpage>
<lpage>1115</lpage>
<pub-id pub-id-type="pmid">19965430</pub-id>
</element-citation>
</ref>
<ref id="b24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Physical and genetic structure of the maize genome reflects its complex evolutionary history</article-title>
<source>PLoS Genet</source>
<year>2007</year>
<volume>3</volume>
<fpage>e123</fpage>
<pub-id pub-id-type="pmid">17658954</pub-id>
</element-citation>
</ref>
<ref id="b25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bhat</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>Close</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Lonardi</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Efficient and accurate construction of genetic linkage maps from the minimum spanning tree of a graph</article-title>
<source>PLoS Genet</source>
<year>2008</year>
<volume>4</volume>
<fpage>e1000212</fpage>
<pub-id pub-id-type="pmid">18846212</pub-id>
</element-citation>
</ref>
<ref id="b26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Xing</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
</person-group>
<article-title>Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing</article-title>
<source>Proc. Natl. Acad. Sci. U. S. A</source>
<year>2010</year>
<volume>107</volume>
<fpage>10578</fpage>
<lpage>10583</lpage>
<pub-id pub-id-type="pmid">20498060</pub-id>
</element-citation>
</ref>
<ref id="b27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>LL</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The draft genome of sweet orange (<italic>Citrus sinensis</italic>)</article-title>
<source>Nat. Genet</source>
<year>2013</year>
<volume>45</volume>
<fpage>59</fpage>
<lpage>66</lpage>
<pub-id pub-id-type="pmid">23179022</pub-id>
</element-citation>
</ref>
<ref id="b101">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>A greedy algorithm for aligning DNA sequences</article-title>
<source>J. Comput. Biol</source>
<year>2000</year>
<volume>7</volume>
<issue>1–2</issue>
<fpage>203</fpage>
<lpage>214</lpage>
<pub-id pub-id-type="pmid">10890397</pub-id>
</element-citation>
</ref>
</ref-list>
<sec sec-type="supplementary-material">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="sd1">
<media mimetype="docx" mime-subtype="docx" xlink:href="tpj0076-0718-sd1.docx" xlink:type="simple" id="d35e2144" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd2">
<caption>
<p>Figure S1. Distribution of the number of successful genotype calls at variant positions detected in the whole data of the Morex × Barke and OWB populations.</p>
</caption>
<media mimetype="png" mime-subtype="png" xlink:href="tpj0076-0718-sd2.png" xlink:type="simple" id="d35e2149" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd3">
<caption>
<p>Figure S2. Observed and expected sequence coverage according to the model of Lander and <xref rid="b100" ref-type="bibr">Waterman (1988)</xref>.</p>
</caption>
<media mimetype="png" mime-subtype="png" xlink:href="tpj0076-0718-sd3.png" xlink:type="simple" id="d35e2157" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd4">
<caption>
<p>Figure S3. Potential uses for an assembly ordered by POPSEQ.</p>
</caption>
<media mimetype="png" mime-subtype="png" xlink:href="tpj0076-0718-sd4.png" xlink:type="simple" id="d35e2162" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd5">
<caption>
<p>Table S1. The percentage of WGS contig pairs assigned to the same BAC that are positioned farther apart than the specified distance.</p>
</caption>
<media mimetype="docx" mime-subtype="docx" xlink:href="tpj0076-0718-sd5.docx" xlink:type="simple" id="d35e2167" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd6">
<caption>
<p>Appendix S1. Applications of a POPSEQ assembly for comparative genomics, reference-based genetic mapping and gene isolation.</p>
</caption>
<media mimetype="docx" mime-subtype="docx" xlink:href="tpj0076-0718-sd6.docx" xlink:type="simple" id="d35e2172" position="anchor"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="sd7">
<caption>
<p>Methods S1. Experimental procedures for Appendix S1.</p>
</caption>
<media mimetype="docx" mime-subtype="docx" xlink:href="tpj0076-0718-sd7.docx" xlink:type="simple" id="d35e2177" position="anchor"></media>
</supplementary-material>
</sec>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Bois/explor/OrangerV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001262 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001262 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Bois
   |area=    OrangerV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4298792
   |texte=   Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23998490" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OrangerV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 17:11:04 2016. Site generation: Wed Mar 6 18:18:32 2024