MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

QuorUM: An Error Corrector for Illumina Reads

Internal identifier: 001008 (Pmc/Curation); previous: 001007; next: 001009

Authors: Guillaume Marçais; James A. Yorke; Aleksey Zimin

Source: PLoS ONE, 10(6): e0130821 (2015)

RBID : PMC:4471408

Abstract

Motivation

Illumina sequencing data can provide high coverage of a genome with relatively short reads (most often 100 bp to 150 bp) at low cost. Even with a low error rate (advertised at 1%), 100× coverage Illumina data has, on average, an error in some read at every base in the genome. These errors complicate handling of the data because they produce a large number of low-count erroneous k-mers in the reads. However, the reads contain enough information to correct most of the sequencing errors, which makes subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed error correction software called QuorUM. QuorUM is mainly aimed at error-correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving as many true k-mers as possible, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.
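
The coverage arithmetic and the low-count k-mer effect described in this paragraph can be illustrated with a minimal sketch. It assumes independent errors at a uniform per-base rate, which is a simplification. The function names `expected_errors_per_position` and `count_kmers`, the toy reads, and the choice k = 4 are hypothetical and are not taken from QuorUM.

```python
from collections import Counter


def expected_errors_per_position(coverage, error_rate):
    """Expected number of erroneous read bases covering one genome position,
    assuming independent errors at a uniform per-base rate."""
    return coverage * error_rate


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# 100x coverage at the advertised 1% error rate.
print(expected_errors_per_position(100, 0.01))

# Hypothetical toy data: five reads of the same locus; the last read
# carries a single substitution (G -> T at index 3).
reads = ["ACGGTCA"] * 4 + ["ACGTTCA"]
counts = count_kmers(reads, 4)

# k-mers seen only once are candidates for sequencing errors.
low = sorted(kmer for kmer, c in counts.items() if c == 1)
print(low)
```

In this toy case the single substitution introduces four distinct k-mers that each occur once, while every true k-mer occurs four times. In general one substitution can create up to k erroneous k-mers, which is why the count of distinct erroneous k-mers grows quickly with the number of errors.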

Results

We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer on most of the metrics we use. QuorUM is efficiently implemented, makes use of current multi-core computing architectures, and is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads: for the data sets investigated, they improve N50 contig size by a factor of 1.1 to 4 compared with using the original reads.
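
For readers unfamiliar with the N50 statistic used here, the following is a minimal sketch of the usual definition: the largest contig length L such that contigs of length at least L together contain at least half of the assembled bases. The function name `n50` and the two lists of contig lengths are hypothetical illustrations, not data from the paper.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths for two assemblies of the same total size.
before = [50, 40, 30, 20, 10, 10]
after = [90, 40, 20, 10]

# Ratio of the two N50 values, i.e. the improvement factor.
improvement = n50(after) / n50(before)
print(n50(before), n50(after), improvement)
```

An improvement factor computed this way is the kind of ratio the abstract reports (1.1 to 4), although the numbers in this sketch are made up.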

Availability

QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.

Contact

gmarcais@umd.edu.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4471408
DOI: 10.1371/journal.pone.0130821
PubMed: 26083032
PubMed Central: 4471408

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">QuorUM: An Error Corrector for Illumina Reads</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yorke, James A" sort="Yorke, James A" uniqKey="Yorke J" first="James A." last="Yorke">James A. Yorke</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zimin, Aleksey" sort="Zimin, Aleksey" uniqKey="Zimin A" first="Aleksey" last="Zimin">Aleksey Zimin</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26083032</idno>
<idno type="pmc">4471408</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4471408</idno>
<idno type="RBID">PMC:4471408</idno>
<idno type="doi">10.1371/journal.pone.0130821</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">001008</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001008</idno>
<idno type="wicri:Area/Pmc/Curation">001008</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001008</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">QuorUM: An Error Corrector for Illumina Reads</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yorke, James A" sort="Yorke, James A" uniqKey="Yorke J" first="James A." last="Yorke">James A. Yorke</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zimin, Aleksey" sort="Zimin, Aleksey" uniqKey="Zimin A" first="Aleksey" last="Zimin">Aleksey Zimin</name>
<affiliation>
<nlm:aff id="aff001"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec id="sec001">
<title>Motivation</title>
<p>Illumina Sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with low (advertised 1%) error rate, 100 × coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous
<italic>k</italic>
-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed an error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous
<italic>k</italic>
-mers in the output reads and preserving the most true
<italic>k</italic>
-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.</p>
</sec>
<sec id="sec002">
<title>Results</title>
<p>We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented making use of current multi-core computing architectures and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.</p>
</sec>
<sec id="sec003">
<title>Availability</title>
<p>QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at
<ext-link ext-link-type="uri" xlink:href="http://www.genome.umd.edu">http://www.genome.umd.edu</ext-link>
.</p>
</sec>
<sec id="sec004">
<title>Contact</title>
<p>
<email>gmarcais@umd.edu</email>
.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author>
<name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
<author>
<name sortKey="Shi, Z" uniqKey="Shi Z">Z Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I Maccallum</name>
</author>
<author>
<name sortKey="Przybylski, D" uniqKey="Przybylski D">D Przybylski</name>
</author>
<author>
<name sortKey="Ribeiro, Fj" uniqKey="Ribeiro F">FJ Ribeiro</name>
</author>
<author>
<name sortKey="Burton, Jn" uniqKey="Burton J">JN Burton</name>
</author>
<author>
<name sortKey="Walker, Bj" uniqKey="Walker B">BJ Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Phillippy, Am" uniqKey="Phillippy A">AM Phillippy</name>
</author>
<author>
<name sortKey="Zimin, Av" uniqKey="Zimin A">AV Zimin</name>
</author>
<author>
<name sortKey="Puiu, D" uniqKey="Puiu D">D Puiu</name>
</author>
<author>
<name sortKey="Magoc, T" uniqKey="Magoc T">T Magoc</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magoc, T" uniqKey="Magoc T">T Magoc</name>
</author>
<author>
<name sortKey="Pabinger, S" uniqKey="Pabinger S">S Pabinger</name>
</author>
<author>
<name sortKey="Canzar, S" uniqKey="Canzar S">S Canzar</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author>
<name sortKey="Su, Q" uniqKey="Su Q">Q Su</name>
</author>
<author>
<name sortKey="Puiu, D" uniqKey="Puiu D">D Puiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zimin, Av" uniqKey="Zimin A">AV Zimin</name>
</author>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Puiu, D" uniqKey="Puiu D">D Puiu</name>
</author>
<author>
<name sortKey="Roberts, M" uniqKey="Roberts M">M Roberts</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Yorke, Ja" uniqKey="Yorke J">JA Yorke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ilie, L" uniqKey="Ilie L">L Ilie</name>
</author>
<author>
<name sortKey="Fazayeli, F" uniqKey="Fazayeli F">F Fazayeli</name>
</author>
<author>
<name sortKey="Ilie, S" uniqKey="Ilie S">S Ilie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kao, Wc" uniqKey="Kao W">WC Kao</name>
</author>
<author>
<name sortKey="Chan, Ah" uniqKey="Chan A">AH Chan</name>
</author>
<author>
<name sortKey="Song, Ys" uniqKey="Song Y">YS Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ilie, L" uniqKey="Ilie L">L Ilie</name>
</author>
<author>
<name sortKey="Molnar, M" uniqKey="Molnar M">M Molnar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kelley, Dr" uniqKey="Kelley D">DR Kelley</name>
</author>
<author>
<name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, X" uniqKey="Zhao X">X Zhao</name>
</author>
<author>
<name sortKey="Palmer, Le" uniqKey="Palmer L">LE Palmer</name>
</author>
<author>
<name sortKey="Bolanos, R" uniqKey="Bolanos R">R Bolanos</name>
</author>
<author>
<name sortKey="Mircean, C" uniqKey="Mircean C">C Mircean</name>
</author>
<author>
<name sortKey="Fasulo, D" uniqKey="Fasulo D">D Fasulo</name>
</author>
<author>
<name sortKey="Wittenberg, Gm" uniqKey="Wittenberg G">GM Wittenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, H" uniqKey="Shi H">H Shi</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
<author>
<name sortKey="Liu, W" uniqKey="Liu W">W Liu</name>
</author>
<author>
<name sortKey="Muller Wittig, W" uniqKey="Muller Wittig W">W Müller-Wittig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, X" uniqKey="Yang X">X Yang</name>
</author>
<author>
<name sortKey="Chockalingam, Sp" uniqKey="Chockalingam S">SP Chockalingam</name>
</author>
<author>
<name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bentley, Dr" uniqKey="Bentley D">DR Bentley</name>
</author>
<author>
<name sortKey="Balasubramanian, S" uniqKey="Balasubramanian S">S Balasubramanian</name>
</author>
<author>
<name sortKey="Swerdlow, Hp" uniqKey="Swerdlow H">HP Swerdlow</name>
</author>
<author>
<name sortKey="Smith, Gp" uniqKey="Smith G">GP Smith</name>
</author>
<author>
<name sortKey="Milton, J" uniqKey="Milton J">J Milton</name>
</author>
<author>
<name sortKey="Brown, Cg" uniqKey="Brown C">CG Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mackenzie, C" uniqKey="Mackenzie C">C Mackenzie</name>
</author>
<author>
<name sortKey="Choudhary, M" uniqKey="Choudhary M">M Choudhary</name>
</author>
<author>
<name sortKey="Larimer, Fw" uniqKey="Larimer F">FW Larimer</name>
</author>
<author>
<name sortKey="Predki, Pf" uniqKey="Predki P">PF Predki</name>
</author>
<author>
<name sortKey="Stilwagen, S" uniqKey="Stilwagen S">S Stilwagen</name>
</author>
<author>
<name sortKey="Armitage, Jp" uniqKey="Armitage J">JP Armitage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waterston, Rh" uniqKey="Waterston R">RH Waterston</name>
</author>
<author>
<name sortKey="Lindblad Toh, K" uniqKey="Lindblad Toh K">K Lindblad-Toh</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
<author>
<name sortKey="Rogers, J" uniqKey="Rogers J">J Rogers</name>
</author>
<author>
<name sortKey="Abril, Jf" uniqKey="Abril J">JF Abril</name>
</author>
<author>
<name sortKey="Agarwal, P" uniqKey="Agarwal P">P Agarwal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zimin, A" uniqKey="Zimin A">A Zimin</name>
</author>
<author>
<name sortKey="Stevens, Ka" uniqKey="Stevens K">KA Stevens</name>
</author>
<author>
<name sortKey="Crepeau, Mw" uniqKey="Crepeau M">MW Crepeau</name>
</author>
<author>
<name sortKey="Holtz Morris, A" uniqKey="Holtz Morris A">A Holtz-Morris</name>
</author>
<author>
<name sortKey="Koriabine, M" uniqKey="Koriabine M">M Koriabine</name>
</author>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Yuan, J" uniqKey="Yuan J">J Yuan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurevich, A" uniqKey="Gurevich A">A Gurevich</name>
</author>
<author>
<name sortKey="Saveliev, V" uniqKey="Saveliev V">V Saveliev</name>
</author>
<author>
<name sortKey="Vyahhi, N" uniqKey="Vyahhi N">N Vyahhi</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26083032</article-id>
<article-id pub-id-type="pmc">4471408</article-id>
<article-id pub-id-type="publisher-id">PONE-D-14-42333</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0130821</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>QuorUM: An Error Corrector for Illumina Reads</article-title>
<alt-title alt-title-type="running-head">QuorUM</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Marçais</surname>
<given-names>Guillaume</given-names>
</name>
<xref ref-type="corresp" rid="cor001">*</xref>
<xref ref-type="aff" rid="aff001"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yorke</surname>
<given-names>James A.</given-names>
</name>
<xref ref-type="aff" rid="aff001"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zimin</surname>
<given-names>Aleksey</given-names>
</name>
<xref ref-type="corresp" rid="cor001">*</xref>
<xref ref-type="aff" rid="aff001"></xref>
</contrib>
</contrib-group>
<aff id="aff001">
<addr-line>IPST, University of Maryland, College Park, MD, USA</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Gibas</surname>
<given-names>Cynthia</given-names>
</name>
<role>Academic Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>University of North Carolina at Charlotte, UNITED STATES</addr-line>
</aff>
<author-notes>
<fn fn-type="COI-statement" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: GM AZ JY. Performed the experiments: GM AZ. Analyzed the data: GM AZ. Wrote the paper: GM AZ JY.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>alekseyz@ipst.umd.edu</email>
(AZ),
<email>gmarcais@umd.edu</email>
(GM)</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>6</month>
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>6</issue>
<elocation-id>e0130821</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>9</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>5</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© 2015 Marçais et al</copyright-statement>
<copyright-year>2015</copyright-year>
<copyright-holder>Marçais et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0130821.pdf"></self-uri>
<abstract>
<sec id="sec001">
<title>Motivation</title>
<p>Illumina Sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with low (advertised 1%) error rate, 100 × coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous
<italic>k</italic>
-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed an error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous
<italic>k</italic>
-mers in the output reads and preserving the most true
<italic>k</italic>
-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.</p>
</sec>
<sec id="sec002">
<title>Results</title>
<p>We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented making use of current multi-core computing architectures and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.</p>
</sec>
<sec id="sec003">
<title>Availability</title>
<p>QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at
<ext-link ext-link-type="uri" xlink:href="http://www.genome.umd.edu">http://www.genome.umd.edu</ext-link>
.</p>
</sec>
<sec id="sec004">
<title>Contact</title>
<p>
<email>gmarcais@umd.edu</email>
.</p>
</sec>
</abstract>
<funding-group>
<funding-statement>This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2008-04049 and 2010-15739-01 from the USDA National Institute of Food and Agriculture and Grant R01HG002945 from the National Institutes of Health.</funding-statement>
</funding-group>
<counts>
<fig-count count="1"></fig-count>
<table-count count="6"></table-count>
<page-count count="13"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All sequencing data is available from the NIH SRA database
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra">http://www.ncbi.nlm.nih.gov/sra</ext-link>
(SRR081522, SRR022868, SRP001314).</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All sequencing data is available from the NIH SRA database
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra">http://www.ncbi.nlm.nih.gov/sra</ext-link>
(SRR081522, SRR022868, SRP001314).</p>
</notes>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001008 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 001008 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4471408
   |texte=   QuorUM: An Error Corrector for Illumina Reads
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:26083032" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021