Improving Phrap-based assembly of the rat using "reliable" overlaps.
Internal identifier: 002117 (PubMed/Curation); previous: 002116; next: 002118
Authors: Michael Roberts [United States]; Aleksey V. Zimin; Wayne Hayes; Brian R. Hunt; Cevat Ustun; James R. White; Paul Havlak; James Yorke
Source:
- PLoS ONE [1932-6203]; 2008.
French descriptors
- KwdFr :
- MESH :
- génétique : Rats.
- Animaux, Chromosomes artificiels de bactérie, Génome, Reproductibilité des résultats.
English descriptors
- KwdEn :
- MESH :
- genetics : Rats.
- Animals; Chromosomes, Artificial, Bacterial; Genome; Reproducibility of Results.
Abstract
The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
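The "seven times better" figure can be checked against the numbers quoted in the abstract. The sketch below assumes that "sequence missed" means 100% minus the coverage of finished sequence (6.6% for Atlas, 3.7% for the new assembly); the abstract does not spell this out, and the function and variable names are illustrative rather than taken from the paper.

```python
def quality_score(errors_per_10kb: float, coverage_pct: float) -> float:
    """Inverse of (error rate x sequence missed), as described in the abstract.

    Assumption: 'sequence missed' is 100 - coverage, in percent.
    """
    missed_pct = 100.0 - coverage_pct
    return 1.0 / (errors_per_10kb * missed_pct)


# Figures quoted in the abstract for the 21 finished BACs.
atlas = quality_score(errors_per_10kb=4.5, coverage_pct=93.4)
umd = quality_score(errors_per_10kb=1.1, coverage_pct=96.3)

# Ratio of the two scores; units cancel, so percent vs. fraction does not matter.
ratio = umd / atlas
print(f"quality ratio: {ratio:.1f}")
```

Under this reading the ratio is (4.5 × 6.6) / (1.1 × 3.7) ≈ 7.3, consistent with the abstract's rounded claim of a sevenfold improvement.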
DOI: 10.1371/journal.pone.0001836
PubMed: 18350171
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: to go to this record in the Curation step: 002117
Links to Exploration step
pubmed:18350171
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Phrap-based assembly of the rat using "reliable" overlaps.</title>
<author><name sortKey="Roberts, Michael" sort="Roberts, Michael" uniqKey="Roberts M" first="Michael" last="Roberts">Michael Roberts</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Zimin, Aleksey V" sort="Zimin, Aleksey V" uniqKey="Zimin A" first="Aleksey V" last="Zimin">Aleksey V. Zimin</name>
</author>
<author><name sortKey="Hayes, Wayne" sort="Hayes, Wayne" uniqKey="Hayes W" first="Wayne" last="Hayes">Wayne Hayes</name>
</author>
<author><name sortKey="Hunt, Brian R" sort="Hunt, Brian R" uniqKey="Hunt B" first="Brian R" last="Hunt">Brian R. Hunt</name>
</author>
<author><name sortKey="Ustun, Cevat" sort="Ustun, Cevat" uniqKey="Ustun C" first="Cevat" last="Ustun">Cevat Ustun</name>
</author>
<author><name sortKey="White, James R" sort="White, James R" uniqKey="White J" first="James R" last="White">James R. White</name>
</author>
<author><name sortKey="Havlak, Paul" sort="Havlak, Paul" uniqKey="Havlak P" first="Paul" last="Havlak">Paul Havlak</name>
</author>
<author><name sortKey="Yorke, James" sort="Yorke, James" uniqKey="Yorke J" first="James" last="Yorke">James Yorke</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2008">2008</date>
<idno type="RBID">pubmed:18350171</idno>
<idno type="pmid">18350171</idno>
<idno type="doi">10.1371/journal.pone.0001836</idno>
<idno type="wicri:Area/PubMed/Corpus">002117</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002117</idno>
<idno type="wicri:Area/PubMed/Curation">002117</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002117</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Improving Phrap-based assembly of the rat using "reliable" overlaps.</title>
<author><name sortKey="Roberts, Michael" sort="Roberts, Michael" uniqKey="Roberts M" first="Michael" last="Roberts">Michael Roberts</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Zimin, Aleksey V" sort="Zimin, Aleksey V" uniqKey="Zimin A" first="Aleksey V" last="Zimin">Aleksey V. Zimin</name>
</author>
<author><name sortKey="Hayes, Wayne" sort="Hayes, Wayne" uniqKey="Hayes W" first="Wayne" last="Hayes">Wayne Hayes</name>
</author>
<author><name sortKey="Hunt, Brian R" sort="Hunt, Brian R" uniqKey="Hunt B" first="Brian R" last="Hunt">Brian R. Hunt</name>
</author>
<author><name sortKey="Ustun, Cevat" sort="Ustun, Cevat" uniqKey="Ustun C" first="Cevat" last="Ustun">Cevat Ustun</name>
</author>
<author><name sortKey="White, James R" sort="White, James R" uniqKey="White J" first="James R" last="White">James R. White</name>
</author>
<author><name sortKey="Havlak, Paul" sort="Havlak, Paul" uniqKey="Havlak P" first="Paul" last="Havlak">Paul Havlak</name>
</author>
<author><name sortKey="Yorke, James" sort="Yorke, James" uniqKey="Yorke J" first="James" last="Yorke">James Yorke</name>
</author>
</analytic>
<series><title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2008" type="published">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Animals</term>
<term>Chromosomes, Artificial, Bacterial</term>
<term>Genome</term>
<term>Rats (genetics)</term>
<term>Reproducibility of Results</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Animaux</term>
<term>Chromosomes artificiels de bactérie</term>
<term>Génome</term>
<term>Rats (génétique)</term>
<term>Reproductibilité des résultats</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Rats</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Rats</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Animals</term>
<term>Chromosomes, Artificial, Bacterial</term>
<term>Genome</term>
<term>Reproducibility of Results</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Animaux</term>
<term>Chromosomes artificiels de bactérie</term>
<term>Génome</term>
<term>Reproductibilité des résultats</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">18350171</PMID>
<DateCompleted><Year>2008</Year>
<Month>08</Month>
<Day>08</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic"><Journal><ISSN IssnType="Electronic">1932-6203</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>3</Volume>
<Issue>3</Issue>
<PubDate><Year>2008</Year>
<Month>Mar</Month>
<Day>19</Day>
</PubDate>
</JournalIssue>
<Title>PloS one</Title>
<ISOAbbreviation>PLoS ONE</ISOAbbreviation>
</Journal>
<ArticleTitle>Improving Phrap-based assembly of the rat using "reliable" overlaps.</ArticleTitle>
<Pagination><MedlinePgn>e1836</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0001836</ELocationID>
<Abstract><AbstractText>The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Roberts</LastName>
<ForeName>Michael</ForeName>
<Initials>M</Initials>
<AffiliationInfo><Affiliation>Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Zimin</LastName>
<ForeName>Aleksey V</ForeName>
<Initials>AV</Initials>
</Author>
<Author ValidYN="Y"><LastName>Hayes</LastName>
<ForeName>Wayne</ForeName>
<Initials>W</Initials>
</Author>
<Author ValidYN="Y"><LastName>Hunt</LastName>
<ForeName>Brian R</ForeName>
<Initials>BR</Initials>
</Author>
<Author ValidYN="Y"><LastName>Ustun</LastName>
<ForeName>Cevat</ForeName>
<Initials>C</Initials>
</Author>
<Author ValidYN="Y"><LastName>White</LastName>
<ForeName>James R</ForeName>
<Initials>JR</Initials>
</Author>
<Author ValidYN="Y"><LastName>Havlak</LastName>
<ForeName>Paul</ForeName>
<Initials>P</Initials>
</Author>
<Author ValidYN="Y"><LastName>Yorke</LastName>
<ForeName>James</ForeName>
<Initials>J</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y"><Grant><GrantID>1R01HG0294501</GrantID>
<Acronym>HG</Acronym>
<Agency>NHGRI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2008</Year>
<Month>03</Month>
<Day>19</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>PLoS One</MedlineTA>
<NlmUniqueID>101285081</NlmUniqueID>
<ISSNLinking>1932-6203</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D022202" MajorTopicYN="N">Chromosomes, Artificial, Bacterial</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016678" MajorTopicYN="Y">Genome</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D051381" MajorTopicYN="N">Rats</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D015203" MajorTopicYN="N">Reproducibility of Results</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2007</Year>
<Month>10</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2008</Year>
<Month>02</Month>
<Day>09</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2008</Year>
<Month>3</Month>
<Day>20</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2008</Year>
<Month>8</Month>
<Day>9</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2008</Year>
<Month>3</Month>
<Day>20</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">18350171</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pone.0001836</ArticleId>
<ArticleId IdType="pmc">PMC2266800</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Science. 2000 Mar 24;287(5461):2196-204</Citation>
<ArticleIdList><ArticleId IdType="pubmed">10731133</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2002 Jan;12(1):177-89</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11779843</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Science. 2002 Aug 23;297(5585):1301-10</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12142439</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Jan;13(1):81-90</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12529309</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Jan;13(1):103-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12529312</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Sep;13(9):2164-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12952883</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nucleic Acids Res. 1999 Jun 1;27(11):2369-76</Citation>
<ArticleIdList><ArticleId IdType="pubmed">10325427</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>J Comput Biol. 2004;11(4):734-52</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15579242</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2005 Dec 15;21(24):4320-1</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16332717</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2004 Apr;14(4):721-32</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15060016</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 1998 Mar;8(3):175-85</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9521921</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 1998 Mar;8(3):186-94</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9521922</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nature. 2004 Apr 1;428(6982):493-521</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15057822</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002117 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 002117 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Curation |type= RBID |clé= pubmed:18350171 |texte= Improving Phrap-based assembly of the rat using "reliable" overlaps. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i -Sk "pubmed:18350171" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.