MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

SEQuel: improving the accuracy of genome assemblies

Internal identifier: 000B07 (Pmc/Curation); previous: 000B06; next: 000B08

Authors: Roy Ronen; Christina Boucher [United States]; Hamidreza Chitsaz [United States]; Pavel Pevzner [United States]

RBID : PMC:3371851

Abstract

Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.
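
The idea of a positional de Bruijn graph can be shown with a small Python sketch. This is not SEQuel's implementation. It rests on several assumptions. Each read is assumed to already have an approximate start position on the contig, for example from an alignment step that is not shown here. For simplicity, k-mers are used directly as nodes. Identical k-mers are merged only when their positions differ by at most a tolerance, called `delta` here. The names `positional_kmers`, `build_positional_dbg` and `delta` are hypothetical.

```python
from collections import defaultdict

# Toy sketch of a positional de Bruijn graph (hypothetical helper names).
# Assumption: every read comes with an approximate start position on the
# contig, e.g. from a prior read-to-contig alignment (not shown here).


def positional_kmers(read, start, k):
    """Yield (k-mer, approximate position) pairs for one placed read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], start + i


def build_positional_dbg(placed_reads, k, delta):
    """Build a toy positional de Bruijn graph from (read, start) pairs.

    Nodes are (k-mer, position) pairs. A new positional k-mer is merged into
    the first existing node with the same sequence whose position is within
    `delta`; otherwise it becomes a new node. Edges join consecutive k-mers
    of the same read and store how many reads support them.
    """
    nodes = defaultdict(list)  # k-mer -> representative positions
    edges = defaultdict(int)   # (node, next_node) -> read support

    def find_or_add(kmer, pos):
        # Greedy merge: reuse the first representative within delta.
        for rep in nodes[kmer]:
            if abs(rep - pos) <= delta:
                return (kmer, rep)
        nodes[kmer].append(pos)
        return (kmer, pos)

    for read, start in placed_reads:
        prev = None
        for kmer, pos in positional_kmers(read, start, k):
            node = find_or_add(kmer, pos)
            if prev is not None:
                edges[(prev, node)] += 1
            prev = node
    return nodes, edges


if __name__ == "__main__":
    # Contig ACGTTTACGT contains the 3-mer ACG twice (positions 0 and 6).
    # The last read truly starts at 0 but is placed at 1 to mimic placement error.
    reads = [("ACGTTT", 0), ("TTTACGT", 3), ("ACGTT", 1)]
    nodes, edges = build_positional_dbg(reads, k=3, delta=1)
    print(nodes["ACG"])                     # [0, 6]
    print(edges[(("ACG", 0), ("CGT", 1))])  # 2
```

In this example, the two occurrences of ACG lie six positions apart, which is more than `delta`. They therefore remain separate nodes, whereas a plain de Bruijn graph would collapse them into one. The third read is placed one position off but is still absorbed into the existing nodes. The greedy merge depends on the order in which reads are processed. That is acceptable for a toy, but a real tool would likely need a more careful rule for grouping positions.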

Results: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show that SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.

Availability: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.

Contact: ppevzner@cs.ucsd.edu


Url:
DOI: 10.1093/bioinformatics/bts219
PubMed: 22689760
PubMed Central: 3371851

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SEQuel: improving the accuracy of genome assemblies</title>
<author>
<name sortKey="Ronen, Roy" sort="Ronen, Roy" uniqKey="Ronen R" first="Roy" last="Ronen">Roy Ronen</name>
<affiliation>
<nlm:aff id="AFF1">Bioinformatics Graduate Program,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Christina" sort="Boucher, Christina" uniqKey="Boucher C" first="Christina" last="Boucher">Christina Boucher</name>
<affiliation wicri:level="2">
<nlm:aff wicri:cut=" and" id="AFF1">Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Department of Computer Science and Engineering, University of California, San Diego, La Jolla</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chitsaz, Hamidreza" sort="Chitsaz, Hamidreza" uniqKey="Chitsaz H" first="Hamidreza" last="Chitsaz">Hamidreza Chitsaz</name>
<affiliation wicri:level="1">
<nlm:aff id="AFF1">Department of Computer Science, Wayne State University, Detroit, MI 48202, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Wayne State University, Detroit, MI 48202</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pevzner, Pavel" sort="Pevzner, Pavel" uniqKey="Pevzner P" first="Pavel" last="Pevzner">Pavel Pevzner</name>
<affiliation wicri:level="2">
<nlm:aff wicri:cut=" and" id="AFF1">Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Department of Computer Science and Engineering, University of California, San Diego, La Jolla</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22689760</idno>
<idno type="pmc">3371851</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371851</idno>
<idno type="RBID">PMC:3371851</idno>
<idno type="doi">10.1093/bioinformatics/bts219</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000B07</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B07</idno>
<idno type="wicri:Area/Pmc/Curation">000B07</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B07</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">SEQuel: improving the accuracy of genome assemblies</title>
<author>
<name sortKey="Ronen, Roy" sort="Ronen, Roy" uniqKey="Ronen R" first="Roy" last="Ronen">Roy Ronen</name>
<affiliation>
<nlm:aff id="AFF1">Bioinformatics Graduate Program,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Christina" sort="Boucher, Christina" uniqKey="Boucher C" first="Christina" last="Boucher">Christina Boucher</name>
<affiliation wicri:level="2">
<nlm:aff wicri:cut=" and" id="AFF1">Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Department of Computer Science and Engineering, University of California, San Diego, La Jolla</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chitsaz, Hamidreza" sort="Chitsaz, Hamidreza" uniqKey="Chitsaz H" first="Hamidreza" last="Chitsaz">Hamidreza Chitsaz</name>
<affiliation wicri:level="1">
<nlm:aff id="AFF1">Department of Computer Science, Wayne State University, Detroit, MI 48202, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Wayne State University, Detroit, MI 48202</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pevzner, Pavel" sort="Pevzner, Pavel" uniqKey="Pevzner P" first="Pavel" last="Pevzner">Pavel Pevzner</name>
<affiliation wicri:level="2">
<nlm:aff wicri:cut=" and" id="AFF1">Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Department of Computer Science and Engineering, University of California, San Diego, La Jolla</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<bold>Motivation:</bold>
Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop
<italic>SEQuel</italic>
, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the
<italic>positional de Bruijn graph</italic>
, a graph structure that models
<italic>k</italic>
-mers within reads while incorporating the approximate positions of reads into the model.</p>
<p>
<bold>Results:</bold>
SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell
<italic>Escherichia coli</italic>
data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels, and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium
<italic>SAR324</italic>
genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.</p>
<p>
<bold>Availability:</bold>
SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at
<ext-link ext-link-type="uri" xlink:href="http://bix.ucsd.edu/SEQuel/">http://bix.ucsd.edu/SEQuel/</ext-link>
.</p>
<p>
<bold>Contact:</bold>
<email>ppevzner@cs.ucsd.edu</email>
</p>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-id journal-id-type="hwp">bioinfo</journal-id>
<journal-title-group>
<journal-title>Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1367-4811</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22689760</article-id>
<article-id pub-id-type="pmc">3371851</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/bts219</article-id>
<article-id pub-id-type="publisher-id">bts219</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA</subject>
</subj-group>
<subj-group>
<subject>Original Papers</subject>
<subj-group>
<subject>Sequence Analysis</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>SEQuel: improving the accuracy of genome assemblies</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ronen</surname>
<given-names>Roy</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="FN1">
<sup></sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Boucher</surname>
<given-names>Christina</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="FN1">
<sup></sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chitsaz</surname>
<given-names>Hamidreza</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pevzner</surname>
<given-names>Pavel</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="COR1">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="AFF1">
<sup>1</sup>
Bioinformatics Graduate Program,
<sup>2</sup>
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 and
<sup>3</sup>
Department of Computer Science, Wayne State University, Detroit, MI 48202, USA</aff>
<author-notes>
<corresp id="COR1">* To whom correspondence should be addressed.</corresp>
<fn id="FN1">
<p>
<sup></sup>
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<day>15</day>
<month>6</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>9</day>
<month>6</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>9</day>
<month>6</month>
<year>2012</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>28</volume>
<issue>12</issue>
<fpage>i188</fpage>
<lpage>i196</lpage>
<permissions>
<copyright-statement>© The Author(s) 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">http://creativecommons.org/licenses/by-nc/3.0</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation:</bold>
Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop
<italic>SEQuel</italic>
, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the
<italic>positional de Bruijn graph</italic>
, a graph structure that models
<italic>k</italic>
-mers within reads while incorporating the approximate positions of reads into the model.</p>
<p>
<bold>Results:</bold>
SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell
<italic>Escherichia coli</italic>
data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels, and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium
<italic>SAR324</italic>
genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.</p>
<p>
<bold>Availability:</bold>
SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at
<ext-link ext-link-type="uri" xlink:href="http://bix.ucsd.edu/SEQuel/">http://bix.ucsd.edu/SEQuel/</ext-link>
.</p>
<p>
<bold>Contact:</bold>
<email>ppevzner@cs.ucsd.edu</email>
</p>
</abstract>
<counts>
<page-count count="9"></page-count>
</counts>
</article-meta>
</front>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B07 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000B07 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:3371851
   |texte=   SEQuel: improving the accuracy of genome assemblies
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:22689760" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021