MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

SEQuel: improving the accuracy of genome assemblies.

Internal identifier: 001D75 (PubMed/Corpus); previous: 001D74; next: 001D76

Authors: Roy Ronen; Christina Boucher; Hamidreza Chitsaz; Pavel Pevzner

Source: Bioinformatics (Oxford, England) (eISSN 1367-4811), 2012

RBID : pubmed:22689760

English descriptors

Abstract

Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.
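
The positional de Bruijn graph mentioned in the abstract can be illustrated with a small sketch. This is not SEQuel's implementation: the function names, the `delta` tolerance, and the "glue to the first-seen position" rule are assumptions made for illustration, based only on the abstract's description of k-mers annotated with approximate read positions.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (k-mer, approximate position) pairs for one aligned read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], start + i


def build_positional_dbg(aligned_reads, k, delta):
    """Build a toy positional de Bruijn graph.

    aligned_reads: iterable of (sequence, approximate start position).
    Two occurrences of the same k-mer share a vertex only when their
    positions differ by at most `delta` from the vertex's first-seen
    position (an assumed, simplified gluing rule).
    Returns (multiplicity, edges); vertices are (k-mer, anchor position).
    """
    anchors = defaultdict(list)       # k-mer -> anchor positions seen so far
    multiplicity = defaultdict(int)   # vertex -> supporting occurrences
    edges = defaultdict(int)          # (vertex, vertex) -> supporting reads

    def vertex_for(kmer, pos):
        for anchor in anchors[kmer]:
            if abs(anchor - pos) <= delta:
                return (kmer, anchor)
        anchors[kmer].append(pos)
        return (kmer, pos)

    for seq, start in aligned_reads:
        prev = None
        for kmer, pos in positional_kmers(seq, start, k):
            v = vertex_for(kmer, pos)
            multiplicity[v] += 1
            if prev is not None:
                edges[(prev, v)] += 1
            prev = v
    return dict(multiplicity), dict(edges)


# Hypothetical reads: the third repeats the first read's sequence far away.
reads = [("ACGTAC", 0), ("CGTACG", 1), ("ACGTAC", 100)]
mult, edges = build_positional_dbg(reads, k=4, delta=5)
```

In this toy input the k-mer `ACGT` appears at positions 0 and 100. An ordinary de Bruijn graph would merge both occurrences into one vertex, whereas the positional variant keeps them apart because their positions differ by more than `delta`.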

DOI: 10.1093/bioinformatics/bts219
PubMed: 22689760

Links to Exploration step

pubmed:22689760

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SEQuel: improving the accuracy of genome assemblies.</title>
<author>
<name sortKey="Ronen, Roy" sort="Ronen, Roy" uniqKey="Ronen R" first="Roy" last="Ronen">Roy Ronen</name>
<affiliation>
<nlm:affiliation>Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Christina" sort="Boucher, Christina" uniqKey="Boucher C" first="Christina" last="Boucher">Christina Boucher</name>
</author>
<author>
<name sortKey="Chitsaz, Hamidreza" sort="Chitsaz, Hamidreza" uniqKey="Chitsaz H" first="Hamidreza" last="Chitsaz">Hamidreza Chitsaz</name>
</author>
<author>
<name sortKey="Pevzner, Pavel" sort="Pevzner, Pavel" uniqKey="Pevzner P" first="Pavel" last="Pevzner">Pavel Pevzner</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="RBID">pubmed:22689760</idno>
<idno type="pmid">22689760</idno>
<idno type="doi">10.1093/bioinformatics/bts219</idno>
<idno type="wicri:Area/PubMed/Corpus">001D75</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001D75</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">SEQuel: improving the accuracy of genome assemblies.</title>
<author>
<name sortKey="Ronen, Roy" sort="Ronen, Roy" uniqKey="Ronen R" first="Roy" last="Ronen">Roy Ronen</name>
<affiliation>
<nlm:affiliation>Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Christina" sort="Boucher, Christina" uniqKey="Boucher C" first="Christina" last="Boucher">Christina Boucher</name>
</author>
<author>
<name sortKey="Chitsaz, Hamidreza" sort="Chitsaz, Hamidreza" uniqKey="Chitsaz H" first="Hamidreza" last="Chitsaz">Hamidreza Chitsaz</name>
</author>
<author>
<name sortKey="Pevzner, Pavel" sort="Pevzner, Pavel" uniqKey="Pevzner P" first="Pavel" last="Pevzner">Pavel Pevzner</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Contig Mapping</term>
<term>Escherichia coli (genetics)</term>
<term>Genome, Bacterial</term>
<term>INDEL Mutation</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Escherichia coli</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Contig Mapping</term>
<term>Genome, Bacterial</term>
<term>INDEL Mutation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">22689760</PMID>
<DateCompleted>
<Year>2013</Year>
<Month>01</Month>
<Day>31</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>01</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>28</Volume>
<Issue>12</Issue>
<PubDate>
<Year>2012</Year>
<Month>Jun</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>SEQuel: improving the accuracy of genome assemblies.</ArticleTitle>
<Pagination>
<MedlinePgn>i188-96</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bts219</ELocationID>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels, and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.</AbstractText>
<AbstractText Label="AVAILABILITY" NlmCategory="BACKGROUND">SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Ronen</LastName>
<ForeName>Roy</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Boucher</LastName>
<ForeName>Christina</ForeName>
<Initials>C</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chitsaz</LastName>
<ForeName>Hamidreza</ForeName>
<Initials>H</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Pevzner</LastName>
<ForeName>Pavel</ForeName>
<Initials>P</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>P41 RR024851</GrantID>
<Acronym>RR</Acronym>
<Agency>NCRR NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>3P41RR024851-02S1</GrantID>
<Acronym>RR</Acronym>
<Agency>NCRR NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020451" MajorTopicYN="N">Contig Mapping</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004926" MajorTopicYN="N">Escherichia coli</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016680" MajorTopicYN="Y">Genome, Bacterial</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D054643" MajorTopicYN="N">INDEL Mutation</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2012</Year>
<Month>6</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2012</Year>
<Month>6</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2013</Year>
<Month>2</Month>
<Day>1</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">22689760</ArticleId>
<ArticleId IdType="pii">bts219</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/bts219</ArticleId>
<ArticleId IdType="pmc">PMC3371851</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 1995 Summer;2(2):291-306</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7497130</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2004 Sep;14(9):1786-96</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15342561</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 1998 Mar;8(3):175-85</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9521921</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 1998 Mar;8(3):186-94</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9521922</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 2005 Jun;71(6):3342-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15933038</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):54-64</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17277413</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 Feb;18(2):324-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18083777</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2008 Apr 17;452(7189):872-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18421352</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 May;18(5):810-20</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18340039</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 May;18(5):821-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18349386</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2008 Nov 6;456(7218):53-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18987734</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2009 Jun;19(6):1117-23</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19251739</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2009 Jul 15;25(14):1754-60</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19451168</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2009;4(9):e6864</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19724646</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Hered. 2009 Nov-Dec;100(6):659-74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19892720</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2009 Dec;41(12):1275-81</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19881527</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2010 Jan 21;463(7279):311-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20010809</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2010 Feb;20(2):265-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20019144</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2010 Sep;20(9):1297-303</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20644199</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2011 Jan;8(1):61-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21102452</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2011 Mar 18;331(6023):1386</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21415334</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2011 May;43(5):491-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21478889</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jul 1;27(13):i137-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21685062</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2011;6(8):e23455</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21858125</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2011 Oct;29(10):915-21</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21926975</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2011 Nov;29(11):987-91</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22068540</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2012 May;19(5):455-77</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22506599</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11504945</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Inform. 2001;12:165-74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11791235</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2002 Apr;12(4):656-64</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11932250</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2002 Mar;18(3):379-88</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11934736</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Comput Appl Biosci. 1996 Feb;12(1):19-24</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">8670615</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
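
The record above can also be read programmatically. The following is a minimal sketch using Python's standard library on a trimmed copy of the `<MedlineCitation>` part; the `summarize` function and the `SNIPPET` constant are hypothetical names introduced here. Note that the TEI part of the record uses the `nlm:` and `wicri:` prefixes without declaring them, so a strict XML parser is likely to reject the full record unless those namespaces are declared first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <MedlineCitation> element shown in the record above.
SNIPPET = """
<MedlineCitation Status="MEDLINE" Owner="NLM">
  <PMID Version="1">22689760</PMID>
  <Article PubModel="Print">
    <ArticleTitle>SEQuel: improving the accuracy of genome assemblies.</ArticleTitle>
    <ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bts219</ELocationID>
    <AuthorList CompleteYN="Y">
      <Author ValidYN="Y"><LastName>Ronen</LastName><ForeName>Roy</ForeName></Author>
      <Author ValidYN="Y"><LastName>Pevzner</LastName><ForeName>Pavel</ForeName></Author>
    </AuthorList>
  </Article>
</MedlineCitation>
"""


def summarize(xml_text):
    """Extract PMID, title, DOI and author names from a MedlineCitation."""
    root = ET.fromstring(xml_text.strip())
    doi = root.find(".//ELocationID[@EIdType='doi']")
    return {
        "pmid": root.findtext("PMID"),
        "title": root.findtext("Article/ArticleTitle"),
        "doi": doi.text if doi is not None else None,
        "authors": [
            f"{a.findtext('ForeName')} {a.findtext('LastName')}"
            for a in root.iter("Author")
        ],
    }


summary = summarize(SNIPPET)
print(summary)
```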

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D75 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001D75 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:22689760
   |texte=   SEQuel: improving the accuracy of genome assemblies.
}}
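
When many records need such a link, the template text can be generated rather than typed by hand. The sketch below reproduces the layout of the template above; `explor_link` is a hypothetical helper, the parameter names are copied from the template as shown on this page, and it has not been verified whether the wiki requires this exact column alignment.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Return an {{Explor lien}} template call laid out like the one above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "   |name=" to 13 characters so the values line up in one column.
        lines.append(f"   |{name}=".ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Sante",
    area="MersV1",
    flux="PubMed",
    etape="Corpus",
    type_="RBID",
    cle="pubmed:22689760",
    texte="SEQuel: improving the accuracy of genome assemblies.",
)
print(link)
```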

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:22689760" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021