The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps</title>
<author><name sortKey="Roberts, Michael" sort="Roberts, Michael" uniqKey="Roberts M" first="Michael" last="Roberts">Michael Roberts</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zimin, Aleksey V" sort="Zimin, Aleksey V" uniqKey="Zimin A" first="Aleksey V." last="Zimin">Aleksey V. Zimin</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hayes, Wayne" sort="Hayes, Wayne" uniqKey="Hayes W" first="Wayne" last="Hayes">Wayne Hayes</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hunt, Brian R" sort="Hunt, Brian R" uniqKey="Hunt B" first="Brian R." last="Hunt">Brian R. Hunt</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ustun, Cevat" sort="Ustun, Cevat" uniqKey="Ustun C" first="Cevat" last="Ustun">Cevat Ustun</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="White, James R" sort="White, James R" uniqKey="White J" first="James R." last="White">James R. White</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Havlak, Paul" sort="Havlak, Paul" uniqKey="Havlak P" first="Paul" last="Havlak">Paul Havlak</name>
<affiliation><nlm:aff id="aff2"><addr-line>Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Yorke, James" sort="Yorke, James" uniqKey="Yorke J" first="James" last="Yorke">James Yorke</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">18350171</idno>
<idno type="pmc">2266800</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2266800</idno>
<idno type="RBID">PMC:2266800</idno>
<idno type="doi">10.1371/journal.pone.0001836</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">001058</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001058</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps</title>
<author><name sortKey="Roberts, Michael" sort="Roberts, Michael" uniqKey="Roberts M" first="Michael" last="Roberts">Michael Roberts</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zimin, Aleksey V" sort="Zimin, Aleksey V" uniqKey="Zimin A" first="Aleksey V." last="Zimin">Aleksey V. Zimin</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hayes, Wayne" sort="Hayes, Wayne" uniqKey="Hayes W" first="Wayne" last="Hayes">Wayne Hayes</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hunt, Brian R" sort="Hunt, Brian R" uniqKey="Hunt B" first="Brian R." last="Hunt">Brian R. Hunt</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ustun, Cevat" sort="Ustun, Cevat" uniqKey="Ustun C" first="Cevat" last="Ustun">Cevat Ustun</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="White, James R" sort="White, James R" uniqKey="White J" first="James R." last="White">James R. White</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Havlak, Paul" sort="Havlak, Paul" uniqKey="Havlak P" first="Paul" last="Havlak">Paul Havlak</name>
<affiliation><nlm:aff id="aff2"><addr-line>Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Yorke, James" sort="Yorke, James" uniqKey="Yorke J" first="James" last="Yorke">James Yorke</name>
<affiliation><nlm:aff id="aff1"><addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version <italic>PhrapUMD</italic>
. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the <italic>Rattus norvegicus</italic>
 genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at <ext-link ext-link-type="uri" xlink:href="http://www.genome.umd.edu">http://www.genome.umd.edu</ext-link>
. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Ewing, B" uniqKey="Ewing B">B Ewing</name>
</author>
<author><name sortKey="Hillier, L" uniqKey="Hillier L">L Hillier</name>
</author>
<author><name sortKey="Wendl, Mc" uniqKey="Wendl M">MC Wendl</name>
</author>
<author><name sortKey="Green, P" uniqKey="Green P">P Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ewing, B" uniqKey="Ewing B">B Ewing</name>
</author>
<author><name sortKey="Green, P" uniqKey="Green P">P Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author><name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author><name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author><name sortKey="Kerlavage, Ar" uniqKey="Kerlavage A">AR Kerlavage</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author><name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author><name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author><name sortKey="Dew, Im" uniqKey="Dew I">IM Dew</name>
</author>
<author><name sortKey="Fasulo, Dp" uniqKey="Fasulo D">DP Fasulo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Havlak, P" uniqKey="Havlak P">P Havlak</name>
</author>
<author><name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author><name sortKey="Durbin, Kj" uniqKey="Durbin K">KJ Durbin</name>
</author>
<author><name sortKey="Egan, A" uniqKey="Egan A">A Egan</name>
</author>
<author><name sortKey="Ren, Y" uniqKey="Ren Y">Y Ren</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author><name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author><name sortKey="Stanley, K" uniqKey="Stanley K">K Stanley</name>
</author>
<author><name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author><name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mullikin, Jc" uniqKey="Mullikin J">JC Mullikin</name>
</author>
<author><name sortKey="Ning, Z" uniqKey="Ning Z">Z Ning</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Aparicio, S" uniqKey="Aparicio S">S Aparicio</name>
</author>
<author><name sortKey="Chapman, J" uniqKey="Chapman J">J Chapman</name>
</author>
<author><name sortKey="Stupka, E" uniqKey="Stupka E">E Stupka</name>
</author>
<author><name sortKey="Putnam, N" uniqKey="Putnam N">N Putnam</name>
</author>
<author><name sortKey="Chia, Jm" uniqKey="Chia J">JM Chia</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author><name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author><name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
<author><name sortKey="Yang, Sp" uniqKey="Yang S">SP Yang</name>
</author>
<author><name sortKey="Hillier, L" uniqKey="Hillier L">L Hillier</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Green, P" uniqKey="Green P">P Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Roberts, M" uniqKey="Roberts M">M Roberts</name>
</author>
<author><name sortKey="Hunt, Br" uniqKey="Hunt B">BR Hunt</name>
</author>
<author><name sortKey="Yorke, Ja" uniqKey="Yorke J">JA Yorke</name>
</author>
<author><name sortKey="Bolanos, R" uniqKey="Bolanos R">R Bolanos</name>
</author>
<author><name sortKey="Delcher, A" uniqKey="Delcher A">A Delcher</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author><name sortKey="Yorke, J" uniqKey="Yorke J">J Yorke</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author><name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
<author><name sortKey="Smit, A" uniqKey="Smit A">A Smit</name>
</author>
<author><name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author><name sortKey="Baertsch, R" uniqKey="Baertsch R">R Baertsch</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author><name sortKey="Kasif, S" uniqKey="Kasif S">S Kasif</name>
</author>
<author><name sortKey="Fleischmann, Rd" uniqKey="Fleischmann R">RD Fleischmann</name>
</author>
<author><name sortKey="Peterson, J" uniqKey="Peterson J">J Peterson</name>
</author>
<author><name sortKey="White, O" uniqKey="White O">O White</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group><journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher><publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">18350171</article-id>
<article-id pub-id-type="pmc">2266800</article-id>
<article-id pub-id-type="publisher-id">07-PONE-RA-02495R1</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0001836</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline"><subject>Computational Biology</subject>
<subject>Genetics and Genomics/Bioinformatics</subject>
<subject>Genetics and Genomics/Genome Projects</subject>
</subj-group>
</article-categories>
<title-group><article-title>Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps</article-title>
<alt-title alt-title-type="running-head">Improving Phrap-Based Assembly</alt-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Roberts</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Zimin</surname>
<given-names>Aleksey V.</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1"><sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Hayes</surname>
<given-names>Wayne</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn1"><sup>¤a</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Hunt</surname>
<given-names>Brian R.</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Ustun</surname>
<given-names>Cevat</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn2"><sup>¤b</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>White</surname>
<given-names>James R.</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Havlak</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="aff2"><sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="fn3"><sup>¤c</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Yorke</surname>
<given-names>James</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1"><label>1</label>
<addr-line>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America</addr-line>
</aff>
<aff id="aff2"><label>2</label>
<addr-line>Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America</addr-line>
</aff>
<contrib-group><contrib contrib-type="editor"><name><surname>Hall</surname>
<given-names>Neil</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">University of Liverpool, United Kingdom</aff>
<author-notes><corresp id="cor1">* E-mail: <email>alekseyz@ipst.umd.edu</email>
</corresp>
<fn fn-type="con"><p>Conceived and designed the experiments: MR AZ WH BH JY CU. Performed the experiments: MR AZ WH JW CU. Analyzed the data: MR PH AZ WH JW CU. Contributed reagents/materials/analysis tools: PH. Wrote the paper: AZ WH JY. Other: Headed the project: JY. PI on the grant: JY.</p>
</fn>
<fn id="fn1" fn-type="current-aff"><label>¤a</label>
<p>Current address: Department of Computer Science, University of California Irvine, Irvine, California, United States of America</p>
</fn>
<fn id="fn2" fn-type="current-aff"><label>¤b</label>
<p>Current address: Department of Biology, California Institute of Technology, Pasadena, California, United States of America</p>
</fn>
<fn id="fn3" fn-type="current-aff"><label>¤c</label>
<p>Current address: National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, United States of America</p>
</fn>
</author-notes>
<pub-date pub-type="collection"><year>2008</year>
</pub-date>
<pub-date pub-type="epub"><day>19</day>
<month>3</month>
<year>2008</year>
</pub-date>
<volume>3</volume>
<issue>3</issue>
<elocation-id>e1836</elocation-id>
<history><date date-type="received"><day>15</day>
<month>10</month>
<year>2007</year>
</date>
<date date-type="accepted"><day>9</day>
<month>2</month>
<year>2008</year>
</date>
</history>
<permissions><copyright-statement>Roberts et al.</copyright-statement>
<copyright-year>2008</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.</license-p>
</license>
</permissions>
<abstract><p>The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version <italic>PhrapUMD</italic>
. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the <italic>Rattus norvegicus</italic>
 genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at <ext-link ext-link-type="uri" xlink:href="http://www.genome.umd.edu">http://www.genome.umd.edu</ext-link>
. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.</p>
</abstract>
<counts><page-count count="5"></page-count>
</counts>
</article-meta>
</front>
<body><sec id="s1"><title>Introduction</title>
<p>Most genomes for which draft assemblies are available have been assembled using the whole-genome shotgun (WGS) method or a hybrid-WGS technique. In the WGS method, many copies of the genome are randomly fractured into fragments, with estimated lengths that usually run from several kilobases (Kb) (plasmids and fosmids) to some that are well over 100 Kb (Bacterial Artificial Chromosomes or BACs). The actual length of each fragment is likely to differ from the estimated length by perhaps 10% to 20%. The sequences of the two ends of each fragment are then read imperfectly. The sequence of each end is called a <italic>read</italic>
. Two reads that were created from opposite ends of the same fragment are said to be <italic>mates</italic>
, and they form a <italic>mate-pair</italic>
. Each read has up to 1000 bases. As the sequence is created, each base is assigned a quality score related to the probability that the base is being reported incorrectly <xref rid="pone.0001836-Ewing1" ref-type="bibr">[1]</xref>
, <xref rid="pone.0001836-Ewing2" ref-type="bibr">[2]</xref>
. Enough fragments are created so that a typical base in the genome is represented in several reads, usually about seven to thirteen. Given this data and no more, the WGS assembly problem is to assemble the genome as completely and correctly as possible. Several genome assembly programs have been developed, such as the TIGR assembler <xref rid="pone.0001836-Sutton1" ref-type="bibr">[3]</xref>
, the Celera Assembler <xref rid="pone.0001836-Myers1" ref-type="bibr">[4]</xref>
, Atlas <xref rid="pone.0001836-Havlak1" ref-type="bibr">[5]</xref>
, Arachne <xref rid="pone.0001836-Batzoglou1" ref-type="bibr">[6]</xref>
, Phusion <xref rid="pone.0001836-Mullikin1" ref-type="bibr">[7]</xref>
, JAZZ <xref rid="pone.0001836-Aparicio1" ref-type="bibr">[8]</xref>
, and PCAP <xref rid="pone.0001836-Huang1" ref-type="bibr">[9]</xref>
. Although it appears deceptively simple, genome assembly is remarkably difficult in practice. This is evident from the fact that, given the same input, different assemblers can produce draft assemblies that differ considerably in size and error rates.</p>
<p>Several assembly programs (e.g. Phusion and Atlas) utilize Phrap <xref rid="pone.0001836-Green1" ref-type="bibr">[10]</xref>
 at the early stages of the assembly. Phrap is also widely used as a standalone tool for creating local assemblies of the BAC-sized (up to about 250K bases) regions of genomic sequence. Given a set of reads and optional quality scores, Phrap computes overlaps and assembles the reads into contigs, generating a read multi-alignment, a contig sequence and sequence quality information. We have produced <italic>PhrapUMD</italic>
, a modified version of Phrap that allows the user to control which overlaps Phrap uses in building contigs. We paired PhrapUMD with the UMD Overlapper <xref rid="pone.0001836-Roberts1" ref-type="bibr">[11]</xref>
, which corrects errors in the reads and accurately computes a set of high-quality overlaps that we call “reliable”. This paper shows how a Phrap-based assembler can be improved by simply substituting the UMD Overlapper and PhrapUMD for Phrap in its pipeline. To demonstrate the power of our techniques, we integrated them into Atlas, the Baylor College of Medicine (Baylor) assembly program. We used the modified Atlas to produce assemblies of approximately 20,000 BACs from the rat genome project. We report here on how the modification improves Atlas' ability to assemble the genome of the rat <italic>Rattus norvegicus</italic>
<xref rid="pone.0001836-Rat1" ref-type="bibr">[12]</xref>
.</p>
<p>We note that the methods proposed and evaluated in this manuscript are useful mainly for assembly programs that use Phrap to build contigs. We do not expect the use of “reliable” overlaps to improve assembly programs that do not use Phrap, such as the Celera Assembler, Arachne, or PCAP. Still, many centers use Phrap to assemble genomes or fragments of genomes, such as the National Intramural Sequencing Center at NIH, the Human Genome Sequencing Center at Baylor College of Medicine, the Sanger Centre, and many others. The methods discussed in this paper will be of great benefit to these centers.</p>
<p>We evaluate our assembly methods by comparing the resulting draft with the finished sequence of a part of the genome. By the commonly accepted definition, finished sequence is gapless sequence with fewer than 1 error per 10,000 bases, whose validity has been checked and corrected with additional local sequencing. However, it is important to note that this sequence may not be completely correct <xref rid="pone.0001836-Salzberg1" ref-type="bibr">[13]</xref>
.</p>
</sec>
<sec id="s2"><title>Methods</title>
<p>One of the first steps in creating an assembly from WGS data is to determine which reads overlap each other based on comparison of their sequences. The fact that two reads' sequences agree over some interval does not necessarily imply that these reads came from the same part of the genome. They might have come from different copies of a repetitive region. We call the set of all overlaps between reads <italic>plausible</italic>
. Some portion of the plausible overlaps is spurious due to repetitive regions in the genomes. In this paper, we describe a technique that identifies a subset of the plausible overlaps that we call <italic>reliable</italic>
. The reason for creating reliable overlaps is to avoid creating misassemblies at the early stages of the assembly when the contiguous chunks of sequence (contigs) are built using only overlap information.</p>
<p><xref ref-type="fig" rid="pone-0001836-g001">Figure 1a</xref>
 shows a scenario where a genome contains two copies of a repeat region R. The correct positions of reads A, B, C and D are shown. The repeat region causes a “fork” in the overlaps, as shown in <xref ref-type="fig" rid="pone-0001836-g001">Figure 1b</xref>
. The fork is created because read A has a plausible overlap with reads B, C and D, but D does not overlap B and C. We call the overlaps of A with B and D “fork overlaps”. Our goal is to design a method that eliminates the fork overlaps from the list of plausible overlaps, thereby producing a list of overlaps we call “reliable”. In <xref ref-type="fig" rid="pone-0001836-g001">Figure 1b</xref>
, the only overlap that we would like to call reliable is between reads A and C, because part of the overlap region is outside the repeat region.</p>
<fig id="pone-0001836-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0001836.g001</object-id>
<label>Figure 1</label>
<caption><title>Illustration of the technique that identifies reliable overlaps: (a) a scenario where a genome contains two copies of a repeat region R.</title>
<p>The correct positions of reads A, B, C and D are shown. (b) A “fork” in the overlaps. (c) a scenario where reads A and D have the same sequencing error at the same base.</p>
</caption>
<graphic xlink:href="pone.0001836.g001"></graphic>
</fig>
<p>We accomplish the task of eliminating the fork overlaps by identifying the fork 20-mers. In <xref ref-type="fig" rid="pone-0001836-g001">Figure 1b</xref>
 all 20-mers belonging to the region between the dashed lines are considered to be “fork 20-mers” because they are present in reads B and D, which do not overlap. More generally, we define a 20-mer to be a “fork 20-mer” if there are two reads that have this 20-mer in common but do not plausibly overlap. We define a 20-mer to be “reliable” if it is not a fork 20-mer, that is, if all reads containing the 20-mer plausibly overlap one another.</p>
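<p>As a minimal sketch of the definition above (not the UMD Overlapper's actual implementation), the following Python function splits k-mers into fork and reliable sets. The function names, the dictionary-of-reads input, and the representation of plausible overlaps as unordered pairs of read names are assumptions made for illustration. Reverse-complement matches are ignored, and the toy data use k = 4 on invented sequences rather than 20-mers on real reads.</p>
<preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(reads, plausible, k=20):
    """Split every k-mer found in the reads into (fork, reliable) sets.

    reads     -- dict: read name -> sequence (hypothetical input format)
    plausible -- set of frozenset({a, b}) for each plausibly overlapping pair
    """
    owners = defaultdict(set)  # k-mer -> names of the reads containing it
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    fork, reliable = set(), set()
    for kmer, names in owners.items():
        pairs = combinations(sorted(names), 2)
        # Fork k-mer: some two reads containing it do not plausibly overlap.
        if all(frozenset(pair) in plausible for pair in pairs):
            reliable.add(kmer)
        else:
            fork.add(kmer)
    return fork, reliable


# Toy layout loosely modeled on Figure 1 (invented sequences, k = 4).
reads = {
    "A": "CAGGATCCA",
    "B": "GCTTACAGG",
    "C": "GATCCAT",
    "D": "ATCTACAGG",
}
plausible = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("A", "D")]}
fork, reliable = classify_kmers(reads, plausible, k=4)
print(sorted(fork))
```
]]></preformat>
<p>In the toy data, reads B and D share repeat k-mers but do not plausibly overlap, so those k-mers are classified as fork k-mers, whereas the k-mers shared only by A and C are reliable.</p>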
<p>We define an overlap to be reliable if the reads have at least two non-overlapping reliable 20-mers in common (see <xref ref-type="fig" rid="pone-0001836-g001">Figure 1b</xref>
). We might like to call an overlap reliable if the overlapping sequence contains even one reliable 20-mer, but this might be an illusion caused by sequencing errors. <xref ref-type="fig" rid="pone-0001836-g001">Figure 1c</xref>
 shows a scenario where reads A and D have the same sequencing error at the same base. For example, a C was read as a G in both reads at the same location (marked by a cross). This error will cause each 20-mer spanning the error's location to be declared reliable, assuming that only A and D contain these error-induced 20-mers, because A and D plausibly overlap. We impose the requirement of two non-overlapping reliable 20-mers to make sure that the overlap between A and D is not declared reliable because of a sequencing error. Our method will not declare the overlap between A and D to be reliable unless these reads have two sequencing errors of the same kind at two matching bases. In practice there are rare occasions in which a spurious overlap is labeled reliable, but as assembly results show, such overlaps do not cause major problems.</p>
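<p>The two-k-mer rule can be sketched as follows. This is an illustrative simplification, not the authors' code: the function names are hypothetical, the set of reliable k-mers is assumed to be given, k-mers are matched exactly, and non-overlap is checked only on start positions in the first read. The toy data use k = 4.</p>
<preformat><![CDATA[
```python
def shared_reliable_positions(seq_a, seq_b, reliable, k):
    """Start positions in seq_a of reliable k-mers that also occur in seq_b."""
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return [
        i for i in range(len(seq_a) - k + 1)
        if seq_a[i:i + k] in reliable and seq_a[i:i + k] in kmers_b
    ]


def is_reliable_overlap(seq_a, seq_b, reliable, k=20, required=2):
    """True if the reads share at least `required` non-overlapping reliable k-mers."""
    count, next_free = 0, 0
    # Positions arrive in ascending order, so a greedy left-to-right scan
    # finds the largest possible number of non-overlapping k-mers.
    for pos in shared_reliable_positions(seq_a, seq_b, reliable, k):
        if pos >= next_free:  # does not overlap the k-mer counted last
            count += 1
            next_free = pos + k
            if count >= required:
                return True
    return False


# Toy example with k = 4 (invented sequences).
read_a = "CAGGATCCATTG"
read_c = "GATCCATTGAA"
long_run = {"GATC", "ATCC", "TCCA", "CCAT", "CATT", "ATTG"}
short_run = {"GATC", "ATCC", "TCCA", "CCAT"}
print(is_reliable_overlap(read_a, read_c, long_run, k=4))
print(is_reliable_overlap(read_a, read_c, short_run, k=4))
```
]]></preformat>
<p>The second call mimics the single-error case: every reliable k-mer in the short run overlaps the others, much as all k-mers spanning one shared sequencing error would, so the overlap is not accepted.</p>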
<p>To test the effectiveness of using the UMD reliable overlaps with PhrapUMD, we incorporated our methods into Atlas, the Baylor College of Medicine assembly program. We applied the resulting software to the assembly of the rat genome. Atlas utilized the hybrid WGS – BAC approach to sequence the rat genome. Most of the rat genome was covered by a tiling of about 20,000 BACs, each averaging over 200 Kb of sequence. These BACs were individually sequenced at low coverage (generally 1x to 2x). The Atlas strategy was to consider the set of reads from each BAC (BAC reads), find which WGS reads appeared to overlap the BAC reads, and then add in these WGS reads and their mates. This approach resulted in independent data sets (buckets) such that an assembly could be created for each BAC. With these sets of reads, Atlas ran Phrap on each bucket to build contigs and then arranged the contigs into scaffolds using mate pair information. We assembled each BAC, but did not merge the scaffolds of the different BACs.</p>
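<p>The bucket construction described above can be pictured with a short sketch. The function name and the input structures (a set of BAC read names, a mapping from BAC reads to overlapping WGS reads, and a mate lookup table) are hypothetical and are not taken from the Atlas code.</p>
<preformat><![CDATA[
```python
def build_bucket(bac_reads, wgs_overlaps, mate_of):
    """Collect the reads to be assembled for one BAC.

    bac_reads    -- set of read names sequenced from the BAC
    wgs_overlaps -- dict: BAC read name -> set of WGS reads that appear to overlap it
    mate_of      -- dict: read name -> name of its mate (absent if unpaired)
    """
    recruited = set()
    for read in bac_reads:
        recruited |= wgs_overlaps.get(read, set())
    # Add the mate of every recruited WGS read, even if the mate itself
    # does not overlap any BAC read.
    mates = {mate_of[read] for read in recruited if read in mate_of}
    return set(bac_reads) | recruited | mates


# Toy example with invented read names.
bac_reads = {"bac1", "bac2"}
wgs_overlaps = {"bac1": {"w1", "w2"}, "bac2": {"w2", "w3"}}
mate_of = {"w1": "w1m", "w1m": "w1", "w3": "w9", "w9": "w3"}
bucket = build_bucket(bac_reads, wgs_overlaps, mate_of)
print(sorted(bucket))
```
]]></preformat>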
<p>The UMD+Atlas results reported in the following section were obtained by incorporating the following UMD techniques into Atlas:</p>
<list list-type="order"><list-item><p>We use the UMD Overlapper <xref rid="pone.0001836-Roberts1" ref-type="bibr">[11]</xref>
 to determine plausible overlaps. Since the UMD Overlapper is capable of error-correcting the reads, we trim reads only when the expected error rate reaches 10%, based on the reported quality scores. This process yields reads that are about 12% longer. We chose such trimming because it provided the longest contigs.</p>
</list-item>
<list-item><p>We determine reliable overlaps, and then use PhrapUMD to create a set of high quality contigs that we call <italic>reliable</italic>
 contigs. These are generally shorter than regular Phrap contigs, but they are lengthened in the following step.</p>
</list-item>
<list-item><p>After scaffolding with Atlas, we examine each pair of adjacent contigs to see if their ends would overlap according to the set of plausible overlaps produced in (1), if at most one read were removed from each end. If this is the case, we then create an extended set of overlaps consisting of the reliable overlaps combined with plausible overlaps of the end reads from the adjacent contigs. We find that a second pass of PhrapUMD using this slightly extended set of reliable overlaps results in much bigger contigs without sacrificing the error rate of the resulting sequence or the fraction of finished sequence covered. These contigs are then scaffolded with the Atlas scaffolder to get the final result. In this way, we effectively force PhrapUMD to use mate pair information to build contigs. The ability to limit the overlaps PhrapUMD may consider turns it into a tool that can be used iteratively.</p>
</list-item>
</list>
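<p>As an illustration of the quality-based trimming in step (1), the following Python sketch converts Phred quality scores to error probabilities (using the standard definition Q = -10 log10 p) and cuts a read where the expected error rate reaches 10%. The sliding-window rule, the window size of 20 bases, and the names error_probability and trim_point are assumptions made for this example; the actual rule used by the UMD Overlapper is not described here.</p>

```python
from typing import Sequence

MAX_EXPECTED_ERROR = 0.10  # trimming threshold stated in the text
WINDOW = 20                # hypothetical window size, not taken from the paper


def error_probability(q: int) -> float:
    """Standard Phred definition: Q = -10 * log10(p)."""
    return 10.0 ** (-q / 10.0)


def trim_point(quals: Sequence[int], window: int = WINDOW,
               max_err: float = MAX_EXPECTED_ERROR) -> int:
    """Return how many leading bases of the read to keep.

    Assumed rule: cut at the start of the first window whose mean
    expected error probability reaches max_err. Reads shorter than one
    window, or with no such window, are kept whole.
    """
    probs = [error_probability(q) for q in quals]
    if window > len(probs):
        return len(probs)
    for start in range(len(probs) - window + 1):
        mean_err = sum(probs[start:start + window]) / window
        if mean_err >= max_err:
            return start
    return len(probs)


if __name__ == "__main__":
    # 100 high-quality bases followed by a low-quality tail.
    read_quals = [40] * 100 + [5] * 30
    print("bases kept:", trim_point(read_quals))
```

<p>A quality score of 10 corresponds to an error probability of exactly 10%, so under this assumed rule a read is cut only where the local average quality falls to roughly Q10.</p>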
<p>We note that our method's ability to resolve repeats is still limited by the largest insert size among the available libraries. Any repeat that is longer than the largest available insert size may cause misassemblies. The original Phrap does not use mate pair information in building contigs. It would be very beneficial to implement a direct way for Phrap to use mate pair data, but this would require major changes to the code and might reduce the usability and stability of the software. One reason Phrap is so widely used is that it is stable and easy to install and run, and our goal was to gain the maximum improvement while introducing minimal changes to the Phrap software. Reliable overlaps allow Phrap to build “unitigs” (for more information on unitigs see <xref rid="pone.0001836-Myers1" ref-type="bibr">[4]</xref>
). Unitigs are contigs that can be assembled in a unique way, and thus repeat and unique regions are assembled into separate unitigs. The subsequent step of scaffolding the unitigs and then expanding the set of overlaps allows Phrap to indirectly use mate pair information in building its final contigs.</p>
</sec>
<sec id="s3"><title>Results and Discussion</title>
<p>In this paper, we use the data set Freeze02, a complete collection of read data, and a corresponding Atlas assembly of the rat produced by the Rat Genome Sequencing Consortium. We restrict our report to the subset of reads covering 4.3 million bases of finished sequence in 21 BACs. At the time this work was performed, this was the largest contiguous region of finished sequence available to us. The average read coverage of the 21 BACs is about 7. For all 33 million reads, the average coverage is about 7.3. While 4.3 Mb is only a bit more than 0.1% of the rat genome, it does provide a substantial test bed. Later data sets such as Freeze03 and Freeze04 incorporate finished sequence, so they cannot be used to test the performance of the WGS assembly techniques.</p>
<p>Following the Atlas standard, we consider only those contigs output by UMD+Atlas that are 1 Kb or longer. We then match these contigs against finished sequence using BLASTZ software <xref rid="pone.0001836-Schwartz1" ref-type="bibr">[14]</xref>
. We score each match. Experience has shown that if a contig has more than one BLASTZ match to finished sequence, the longest match is not necessarily the most desirable one. Often another match that is slightly shorter but has many fewer errors is present, and better alignments can be found by defining a score <italic>S</italic>
 that severely penalizes errors. If <italic>K</italic>
 is the factor by which we penalize each base error, we define the score of an alignment to be<disp-formula><graphic xlink:href="pone.0001836.e001.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
for each alignment. We use <italic>K</italic>
 = 125, which means that a successful match can have at most a 0.8% error rate relative to the finished sequence. The parameters we used for the BLASTZ comparisons are <italic>C = 2 W = 16 T = 0 K = 25000</italic>
, where <italic>K</italic>
 relates to the gap penalty (the default value for which is <italic>K = 2500</italic>
), <italic>W</italic>
 is the word length used in initiating a match, and <italic>C = 2</italic>
 ensures that BLASTZ uses a “chain and extend” approach in matching sequences (the default is to not chain). We use all matches that are at least 1 Kb in length. If a contig matches in multiple places, we pick the match that has the highest positive score. The <italic>tails</italic>
 of a contig are the parts at either or both ends that are outside the successful match.</p>
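<p>The selection of the best match can be sketched in Python as follows. The exact score formula appears only as an image in the source, so this example assumes the form S = L - K*E, where L is the aligned length and E is the number of base errors; this form is chosen because it makes a score positive exactly when the error rate is below 1/K, which is 0.8% for K = 125. The names Match, score, and best_match are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence

K_PENALTY = 125       # penalty per base error, as stated in the text
MIN_MATCH_LEN = 1000  # only matches of at least 1 Kb are considered


@dataclass(frozen=True)
class Match:
    """One alignment of a contig to finished sequence (hypothetical record)."""
    length: int  # aligned length in bases
    errors: int  # base errors within the alignment


def score(match: Match, k: int = K_PENALTY) -> int:
    # Assumed form of the score: aligned length minus K times the error count.
    return match.length - k * match.errors


def best_match(matches: Sequence[Match]) -> Optional[Match]:
    """Return the highest-scoring match with a positive score, or None."""
    eligible = [m for m in matches
                if m.length >= MIN_MATCH_LEN and score(m) > 0]
    return max(eligible, key=score, default=None)


if __name__ == "__main__":
    long_noisy = Match(length=50_000, errors=300)  # 0.6% error rate
    short_clean = Match(length=45_000, errors=20)  # about 0.04% error rate
    print(score(long_noisy), score(short_clean))
    print(best_match([long_noisy, short_clean]))
```

<p>Under this assumed score, the shorter alignment with far fewer errors outranks the longer one, which is the behavior the text describes.</p>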
<p>We measure the following quantities for each assembly:</p>
<list list-type="bullet"><list-item><p>% Non-Matching Contig Tails: the percentage of assembly bases that are in non-matching tails of contigs. These reflect assembly errors on the ends of contigs.</p>
</list-item>
<list-item><p>% of Finished Sequence Matched: the percentage of the span of finished sequence that is matched by contigs longer than 1 Kb. Erroneous bases are counted in this number. If a finished base is matched by more than one contig, the base is counted only once.</p>
</list-item>
<list-item><p>Number of Contigs: total number of contigs in the scaffolds of the assembly of the 21 BACs.</p>
</list-item>
<list-item><p>Interior Error rate: We take only the highest scoring alignment of a contig to the finished sequence as the match and define the interior error rate across a set of BACs to be<disp-formula><graphic xlink:href="pone.0001836.e002.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
where the sums are carried out over all matching contigs in all BACs. (Note that the denominator here is the sum of the matching lengths of contigs rather than the number of finished bases covered, and that all errors in the contigs are counted. For example, if two contigs cover a given finished base, and both get the base wrong, then both errors are counted.) The cumulative results for a data set consisting of 21 BACs are presented in <xref ref-type="table" rid="pone-0001836-t001">Table 1:</xref>
</p>
</list-item>
<list-item><p>The top line of the table gives the results of the original Atlas, utilizing Phrap, from the Freeze02 assembly. The original Atlas assembly has an interior error rate of 0.045% and matches 93.4% of the finished sequence.</p>
</list-item>
<list-item><p>The second line, “original Atlas with UMD Plausible”, shows the result of substituting PhrapUMD with UMD <italic>plausible</italic>
 overlaps for Phrap in the original Atlas. The primary impact of the switch is that 2.7% more of the finished sequence is matched.</p>
</list-item>
<list-item><p>The third line, “original Atlas with UMD Reliable”, shows the result of substituting PhrapUMD with UMD <italic>reliable</italic>
 overlaps for Phrap in the original Atlas. This assembly covers slightly more finished sequence, but more importantly, decreases both the interior error rate and the tail errors by roughly a factor of 4. However, the number of contigs increases to 480, i.e. the assembly becomes more fragmented.</p>
</list-item>
<list-item><p>The fourth line, “two-pass Atlas with UMD Reliable,” shows the result of using UMD reliable overlaps and the two-pass approach described in the <xref ref-type="sec" rid="s2">Methods</xref>
 section. This is our best assembly. At 1/4 the original Atlas error rate, this assembly has approximately 3% more bases matching finished sequence than the Atlas assembly.</p>
</list-item>
</list>
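<p>Two of the quantities defined above can be sketched in Python as follows. The interior error rate formula appears only as an image in the source; following the accompanying note, this example assumes it is the total number of errors divided by the total matching length, summed over the best match of every contig. The function names and the half-open interval convention are assumptions made for this example.</p>

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the finished sequence


def interior_error_rate(matches: Iterable[Tuple[int, int]]) -> float:
    """matches holds (matching_length, errors) for each contig's best match.

    Every error in every contig is counted, and the denominator is the
    sum of matching lengths, not the number of finished bases covered.
    """
    total_len = 0
    total_err = 0
    for length, errors in matches:
        total_len += length
        total_err += errors
    return total_err / total_len if total_len else 0.0


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count finished bases matched by at least one contig, each base once."""
    covered = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start >= current_end:
            covered += end - start        # disjoint from everything so far
            current_end = end
        elif end > current_end:
            covered += end - current_end  # count only the new extension
            current_end = end
    return covered


def percent_finished_matched(intervals: Iterable[Interval],
                             finished_span: int) -> float:
    return 100.0 * covered_bases(intervals) / finished_span


if __name__ == "__main__":
    print(interior_error_rate([(10_000, 3), (20_000, 6)]))
    print(percent_finished_matched([(0, 100), (50, 150), (200, 300)], 500))
```

<p>The two functions differ deliberately: overlapping contigs contribute twice to the interior error rate but only once to the percentage of finished sequence matched.</p>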
<table-wrap id="pone-0001836-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0001836.t001</object-id>
<label>Table 1</label>
<caption><title>Comparison of the four assemblies for the subset of 21 BACs from the rat genome.</title>
</caption>
<alternatives><graphic id="pone-0001836-t001-1" xlink:href="pone.0001836.t001"></graphic>
<table frame="hsides" rules="groups"><colgroup span="1"><col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead><tr><td align="left" rowspan="1" colspan="1">Assembly</td>
<td align="left" rowspan="1" colspan="1">% Non-Matching Contig Tails</td>
<td align="left" rowspan="1" colspan="1">% of Finished Sequence Matched</td>
<td align="left" rowspan="1" colspan="1">% Interior Error Rate</td>
<td align="left" rowspan="1" colspan="1">Number of Contigs</td>
</tr>
</thead>
<tbody><tr><td align="left" rowspan="1" colspan="1">original Atlas</td>
<td align="left" rowspan="1" colspan="1">0.331</td>
<td align="left" rowspan="1" colspan="1">93.4</td>
<td align="left" rowspan="1" colspan="1">0.045</td>
<td align="left" rowspan="1" colspan="1">377</td>
</tr>
<tr><td align="left" rowspan="1" colspan="1">original Atlas with UMD Plausible</td>
<td align="left" rowspan="1" colspan="1">0.448</td>
<td align="left" rowspan="1" colspan="1">96.1</td>
<td align="left" rowspan="1" colspan="1">0.041</td>
<td align="left" rowspan="1" colspan="1">375</td>
</tr>
<tr><td align="left" rowspan="1" colspan="1">original Atlas with UMD Reliable</td>
<td align="left" rowspan="1" colspan="1">0.118</td>
<td align="left" rowspan="1" colspan="1">96.3</td>
<td align="left" rowspan="1" colspan="1">0.012</td>
<td align="left" rowspan="1" colspan="1">480</td>
</tr>
<tr><td align="left" rowspan="1" colspan="1">two-pass Atlas with UMD Reliable</td>
<td align="left" rowspan="1" colspan="1">0.075</td>
<td align="left" rowspan="1" colspan="1">96.3</td>
<td align="left" rowspan="1" colspan="1">0.011</td>
<td align="left" rowspan="1" colspan="1">371</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot><fn id="nt101"><p>The “original Atlas with UMD Plausible” and “original Atlas with UMD reliable” assembly results obtained by substituting Phrap for PhrapUMD with UMD plausible and reliable overlaps respectively. The best assembly (the bottom line) uses PhrapUMD and UMD reliable overlaps utilizing the 2-pass approach described in the “<xref ref-type="sec" rid="s2">Methods</xref>
” section. It has almost 3% more sequence matching finished sequence than original Atlas with Phrap at less than 1/4 the original base error rate.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>We note that the reduction in the interior error rate is mostly due to error-correction and trimming routines in the UMD Overlapper. By providing Phrap with trimmed and error-corrected reads, we reduce the possibility of errors in Phrap consensus.</p>
<p><xref ref-type="fig" rid="pone-0001836-g002">Figure 2</xref>
 shows one of our most dramatically improved BACs. We used NUCmer, a variant of the MUMmer program <xref rid="pone.0001836-Delcher1" ref-type="bibr">[15]</xref>
, to align the assemblies of the BAC GQQD to the finished sequence. This particular BAC was initially assembled by Atlas into two scaffolds, and one scaffold contained a 20 Kb section that was reversed and misplaced. Using PhrapUMD with reliable overlaps, UMD+Atlas assembled the entire BAC into one scaffold and fixed the major misassembly. Our assembly of this BAC matched 20.0% more finished sequence than the Atlas assembly and reduced the interior error rate from 4.3 errors per 10 Kb in the Atlas assembly to 1.7 errors per 10 Kb. <xref ref-type="fig" rid="pone-0001836-g003">Figure 3</xref>
 shows the worst UMD+Atlas assembly. This was the only BAC that was assembled into two separate scaffolds; all of the others were assembled into a single scaffold. In this BAC, a 26 Kb section in the middle was assembled into a separate scaffold, Scaffold 1, whereas the rest of the BAC was assembled into Scaffold 2. The gap in the middle of Scaffold 2, matching the size and position of Scaffold 1, was estimated correctly. We do not view this scenario as a misassembly.</p>
<fig id="pone-0001836-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0001836.g002</object-id>
<label>Figure 2</label>
<caption><title>Two alignments of assemblies to the finished sequence of BAC GQQD.</title>
<p>The original Atlas assembly created two scaffolds covering only 73.2% of the finished sequence. Note the misplaced 20 Kb segment in the Atlas assembly. The UMD+Atlas assembly of GQQD correctly places the 20 Kb section that was originally misplaced and creates a single scaffold of the BAC covering 93.3% of the finished sequence. This UMD+Atlas assembly used reliable overlaps. This was the BAC that gave Atlas the most trouble.</p>
</caption>
<graphic xlink:href="pone.0001836.g002"></graphic>
</fig>
<fig id="pone-0001836-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0001836.g003</object-id>
<label>Figure 3</label>
<caption><title>Two alignments of assemblies to the finished sequence of BAC GMEZ.</title>
<p>The original Atlas assembly created a single scaffold. The UMD+Atlas assembly of GMEZ assembled a 26 Kb section from the middle of the bigger scaffold into a separate Scaffold 1. Note that the large scaffold gap in Scaffold 2 is estimated correctly. This UMD+Atlas assembly used reliable overlaps. This was the BAC that gave UMD+Atlas the most trouble and the only case where the UMD+Atlas assembly had two scaffolds.</p>
</caption>
<graphic xlink:href="pone.0001836.g003"></graphic>
</fig>
</sec>
</body>
<back><ack><p>We thank Phil Green for providing a copy of the Phrap software.</p>
</ack>
<fn-group><fn fn-type="COI-statement"><p><bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure"><p><bold>Funding: </bold>
This work was supported under NSF grant DMS0616585 and under NIH grant 1R01HG0294501.</p>
</fn>
</fn-group>
<ref-list><title>References</title>
<ref id="pone.0001836-Ewing1"><label>1</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ewing</surname>
<given-names>B</given-names>
</name>
<name><surname>Hillier</surname>
<given-names>L</given-names>
</name>
<name><surname>Wendl</surname>
<given-names>MC</given-names>
</name>
<name><surname>Green</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Base-calling of automated sequencer traces using Phred. I. Accuracy assessment.</article-title>
<source>Genome Res.</source>
<volume>8</volume>
<fpage>175</fpage>
<lpage>185</lpage>
<pub-id pub-id-type="pmid">9521921</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Ewing2"><label>2</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ewing</surname>
<given-names>B</given-names>
</name>
<name><surname>Green</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Base-calling of automated sequencer traces using Phred. II. Error probabilities.</article-title>
<source>Genome Res.</source>
<volume>8</volume>
<fpage>186</fpage>
<lpage>194</lpage>
<pub-id pub-id-type="pmid">9521922</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Sutton1"><label>3</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sutton</surname>
<given-names>GG</given-names>
</name>
<name><surname>White</surname>
<given-names>O</given-names>
</name>
<name><surname>Adams</surname>
<given-names>MD</given-names>
</name>
<name><surname>Kerlavage</surname>
<given-names>AR</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>TIGR Assembler: A New Tool for Assembling Large Shotgun Sequencing Projects.</article-title>
<source>Genome Sci. and Technology</source>
<volume>1</volume>
<fpage>9</fpage>
<lpage>19</lpage>
</element-citation>
</ref>
<ref id="pone.0001836-Myers1"><label>4</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name><surname>Sutton</surname>
<given-names>GG</given-names>
</name>
<name><surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name><surname>Dew</surname>
<given-names>IM</given-names>
</name>
<name><surname>Fasulo</surname>
<given-names>DP</given-names>
</name>
<etal></etal>
</person-group>
<year>2000</year>
<article-title>A Whole-Genome Assembly of <italic>Drosophila</italic>
.</article-title>
<source>Science</source>
<volume>287</volume>
<fpage>2196</fpage>
<lpage>2204</lpage>
<pub-id pub-id-type="pmid">10731133</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Havlak1"><label>5</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Havlak</surname>
<given-names>P</given-names>
</name>
<name><surname>Chen</surname>
<given-names>R</given-names>
</name>
<name><surname>Durbin</surname>
<given-names>KJ</given-names>
</name>
<name><surname>Egan</surname>
<given-names>A</given-names>
</name>
<name><surname>Ren</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<year>2004</year>
<article-title>The Atlas Genome Assembly System.</article-title>
<source>Genome Res.</source>
<volume>14</volume>
<fpage>721</fpage>
<lpage>732</lpage>
<pub-id pub-id-type="pmid">15060016</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Batzoglou1"><label>6</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name><surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name><surname>Stanley</surname>
<given-names>K</given-names>
</name>
<name><surname>Butler</surname>
<given-names>J</given-names>
</name>
<name><surname>Gnerre</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>ARACHNE: A Whole Genome Shotgun Assembler.</article-title>
<source>Genome Res.</source>
<volume>12</volume>
<fpage>177</fpage>
<lpage>189</lpage>
<pub-id pub-id-type="pmid">11779843</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Mullikin1"><label>7</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mullikin</surname>
<given-names>JC</given-names>
</name>
<name><surname>Ning</surname>
<given-names>Z</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The Phusion Assembler.</article-title>
<source>Genome Res.</source>
<volume>13</volume>
<fpage>81</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="pmid">12529309</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Aparicio1"><label>8</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aparicio</surname>
<given-names>S</given-names>
</name>
<name><surname>Chapman</surname>
<given-names>J</given-names>
</name>
<name><surname>Stupka</surname>
<given-names>E</given-names>
</name>
<name><surname>Putnam</surname>
<given-names>N</given-names>
</name>
<name><surname>Chia</surname>
<given-names>JM</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes.</article-title>
<source>Science</source>
<volume>297</volume>
<fpage>1301</fpage>
<lpage>1310</lpage>
<pub-id pub-id-type="pmid">12142439</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Huang1"><label>9</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname>
<given-names>X</given-names>
</name>
<name><surname>Wang</surname>
<given-names>J</given-names>
</name>
<name><surname>Aluru</surname>
<given-names>S</given-names>
</name>
<name><surname>Yang</surname>
<given-names>SP</given-names>
</name>
<name><surname>Hillier</surname>
<given-names>L</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>PCAP: a whole-genome assembly program.</article-title>
<source>Genome Res.</source>
<volume>13</volume>
<fpage>2164</fpage>
<lpage>2170</lpage>
<pub-id pub-id-type="pmid">12952883</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Green1"><label>10</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Green</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>1996</year>
<article-title>Phrap documentation.</article-title>
<comment><ext-link ext-link-type="uri" xlink:href="http://www.phrap.org/phredphrap/phrap.html">http://www.phrap.org/phredphrap/phrap.html</ext-link>
</comment>
</element-citation>
</ref>
<ref id="pone.0001836-Roberts1"><label>11</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname>
<given-names>M</given-names>
</name>
<name><surname>Hunt</surname>
<given-names>BR</given-names>
</name>
<name><surname>Yorke</surname>
<given-names>JA</given-names>
</name>
<name><surname>Bolanos</surname>
<given-names>R</given-names>
</name>
<name><surname>Delcher</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>A Preprocessor for Shotgun Assembly of Large Genomes.</article-title>
<source>Journal of Computational Biology</source>
<volume>11</volume>
<fpage>734</fpage>
<lpage>752</lpage>
</element-citation>
</ref>
<ref id="pone.0001836-Rat1"><label>12</label>
<element-citation publication-type="journal"><collab>Rat Genome Sequencing Project Consortium</collab>
<year>2004</year>
<article-title>Genome sequence of the Brown Norway rat yields insights into mammalian evolution.</article-title>
<source>Nature</source>
<volume>428</volume>
<fpage>493</fpage>
<lpage>521</lpage>
<pub-id pub-id-type="pmid">15057822</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Salzberg1"><label>13</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name><surname>Yorke</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Beware of mis-assembled genomes.</article-title>
<source>Bioinformatics</source>
<volume>21</volume>
<fpage>133</fpage>
<lpage>154</lpage>
</element-citation>
</ref>
<ref id="pone.0001836-Schwartz1"><label>14</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name><surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<name><surname>Smit</surname>
<given-names>A</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name><surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2003</year>
<article-title>Human-Mouse Alignments with BLASTZ.</article-title>
<source>Genome Res.</source>
<volume>13</volume>
<fpage>103</fpage>
<lpage>107</lpage>
<pub-id pub-id-type="pmid">12529312</pub-id>
</element-citation>
</ref>
<ref id="pone.0001836-Delcher1"><label>15</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name><surname>Kasif</surname>
<given-names>S</given-names>
</name>
<name><surname>Fleischmann</surname>
<given-names>RD</given-names>
</name>
<name><surname>Peterson</surname>
<given-names>J</given-names>
</name>
<name><surname>White</surname>
<given-names>O</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Alignment of whole genomes.</article-title>
<source>Nucleic Acids Res.</source>
<volume>27</volume>
<fpage>2369</fpage>
<lpage>2376</lpage>
<pub-id pub-id-type="pmid">10325427</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001058  | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001058  | SxmlIndent | more

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021