Accurate self-correction of errors in long reads using de Bruijn graphs
Internal identifier: 001653 (Ncbi/Checkpoint); previous: 001652; next: 001654
Authors: Leena Salmela [Finland]; Riku Walve [Finland]; Eric Rivals [France]; Esko Ukkonen [Finland]
Source:
- Bioinformatics [1367-4803]; 2016.
English descriptors
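- KwdEn : Algorithms; Escherichia coli (genetics); Genome; High-Throughput Nucleotide Sequencing (methods); Saccharomyces cerevisiae (genetics); Sequence Analysis, DNA (methods); Software.
- MESH :
- genetics : Escherichia coli; Saccharomyces cerevisiae.
- methods : High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA.
- Algorithms; Genome; Software.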
Abstract
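New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.
We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/.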
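The first phase named in the abstract (alignment-free correction over de Bruijn graphs built with increasing k) can be illustrated with a toy sketch. The Python below is not LoRMA's implementation. It treats the set of solid (frequent) k-mers as the node set of the graph and, as a stand-in for a search through the graph, uses a simple single-substitution fix. Insertions and deletions, which long reads also contain, are not handled. The function names, the `ks` schedule and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

BASES = "ACGT"

def solid_kmers(reads, k, min_count=2):
    """Return the k-mers seen at least min_count times across all reads.

    These k-mers play the role of de Bruijn graph nodes in this sketch.
    """
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_count}

def window_is_solid(chars, pos, k, solid):
    """True if every k-mer covering position pos is solid."""
    first = max(0, pos - k + 1)
    last = min(pos, len(chars) - k)
    return all("".join(chars[i:i + k]) in solid for i in range(first, last + 1))

def correct_read(read, k, solid):
    """Try single-base substitutions that make all covering k-mers solid."""
    chars = list(read)
    for pos in range(len(chars)):
        if window_is_solid(chars, pos, k, solid):
            continue
        original = chars[pos]
        for base in BASES:
            chars[pos] = base
            if window_is_solid(chars, pos, k, solid):
                break  # keep the substitution that repairs the window
        else:
            chars[pos] = original  # no base helps here; leave the read unchanged
    return "".join(chars)

def iterative_correction(reads, ks=(5, 7, 9), min_count=2):
    """Re-count k-mers and re-correct the reads for each k in increasing order."""
    for k in ks:
        solid = solid_kmers(reads, k, min_count)
        reads = [correct_read(r, k, solid) for r in reads]
    return reads

# Toy data: four error-free copies of a short sequence plus one copy
# carrying a single substituted base at index 9.
genome = "ACGTTGCATGGCCATTAGC"
noisy = genome[:9] + "T" + genome[10:]
corrected = iterative_correction([genome] * 4 + [noisy])
```

On this toy input the read with the substituted base is restored to match the other four. Starting with a small k makes it more likely that k-mers are solid despite errors, and later rounds with larger k then work on partially corrected reads.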
Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5351550
DOI: 10.1093/bioinformatics/btw321
PubMed: 27273673
PubMed Central: 5351550
Affiliations:
- Finland, France
- Languedoc-Roussillon, Occitanie (administrative region), Uusimaa
- Helsinki, Montpellier
- University of Helsinki
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000B15
- to stream Pmc, to step Curation: 000B15
- to stream Pmc, to step Checkpoint: 000C58
- to stream PubMed, to step Corpus: 001091
- to stream PubMed, to step Curation: 001091
- to stream PubMed, to step Checkpoint: 000E00
- to stream Ncbi, to step Merge: 001653
- to stream Ncbi, to step Curation: 001653
Links to Exploration step
PMC:5351550
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Accurate self-correction of errors in long reads using de Bruijn graphs</title>
<author><name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="3"><nlm:aff id="btw321-aff2">LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier, France</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier</wicri:regionArea>
<placeName><region type="region">Occitanie (région administrative)</region>
<region type="old region">Languedoc-Roussillon</region>
<settlement type="city">Montpellier</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27273673</idno>
<idno type="pmc">5351550</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5351550</idno>
<idno type="RBID">PMC:5351550</idno>
<idno type="doi">10.1093/bioinformatics/btw321</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B15</idno>
<idno type="wicri:Area/Pmc/Curation">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B15</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000C58</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000C58</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:27273673</idno>
<idno type="wicri:Area/PubMed/Corpus">001091</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001091</idno>
<idno type="wicri:Area/PubMed/Curation">001091</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001091</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000E00</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000E00</idno>
<idno type="wicri:Area/Ncbi/Merge">001653</idno>
<idno type="wicri:Area/Ncbi/Curation">001653</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001653</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Accurate self-correction of errors in long reads using de Bruijn graphs</title>
<author><name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="3"><nlm:aff id="btw321-aff2">LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier, France</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>LIRMM and Institut de Biologie Computationelle, CNRS and Université Montpellier, Montpellier</wicri:regionArea>
<placeName><region type="region">Occitanie (région administrative)</region>
<region type="old region">Languedoc-Roussillon</region>
<settlement type="city">Montpellier</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<affiliation wicri:level="4"><nlm:aff id="btw321-aff1">Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki</wicri:regionArea>
<orgName type="university">Université d'Helsinki</orgName>
<placeName><settlement type="city">Helsinki</settlement>
<region type="région" nuts="2">Uusimaa</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Escherichia coli (genetics)</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Saccharomyces cerevisiae (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Escherichia coli (génétique)</term>
<term>Génome</term>
<term>Logiciel</term>
<term>Saccharomyces cerevisiae (génétique)</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Escherichia coli</term>
<term>Saccharomyces cerevisiae</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Escherichia coli</term>
<term>Saccharomyces cerevisiae</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Genome</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<sec id="SA1"><title>Motivation</title>
<p>New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. <italic>de novo</italic>
genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads.</p>
</sec>
<sec id="SA2"><title>Results</title>
<p>We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of <italic>k</italic>
-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.</p>
</sec>
<sec id="SA3"><title>Availability and Implementation</title>
<p>LoRMA is freely available at <ext-link ext-link-type="uri" xlink:href="http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/">http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/</ext-link>
.</p>
</sec>
</div>
</front>
</TEI>
<affiliations><list><country><li>Finlande</li>
<li>France</li>
</country>
<region><li>Languedoc-Roussillon</li>
<li>Occitanie (région administrative)</li>
<li>Uusimaa</li>
</region>
<settlement><li>Helsinki</li>
<li>Montpellier</li>
</settlement>
<orgName><li>Université d'Helsinki</li>
</orgName>
</list>
<tree><country name="Finlande"><region name="Uusimaa"><name sortKey="Salmela, Leena" sort="Salmela, Leena" uniqKey="Salmela L" first="Leena" last="Salmela">Leena Salmela</name>
</region>
<name sortKey="Ukkonen, Esko" sort="Ukkonen, Esko" uniqKey="Ukkonen E" first="Esko" last="Ukkonen">Esko Ukkonen</name>
<name sortKey="Walve, Riku" sort="Walve, Riku" uniqKey="Walve R" first="Riku" last="Walve">Riku Walve</name>
</country>
<country name="France"><region name="Occitanie (région administrative)"><name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
</region>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001653 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 001653 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Ncbi |étape= Checkpoint |type= RBID |clé= PMC:5351550 |texte= Accurate self-correction of errors in long reads using de Bruijn graphs }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i -Sk "pubmed:27273673" \
| HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd \
| NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.