Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads

Internal identifier: 000371 (Pmc/Curation); previous: 000370; next: 000372

Authors: Li Song [United States]; Liliana Florea [United States]

Source:

GigaScience [ 2047-217X ] ; 2015.

RBID : PMC:4615873

Abstract

Background

Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing.

Findings

We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read.
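
As a rough illustration of the difference between a global and a local threshold, the following Python sketch counts k-mers in a toy read set and flags each k-mer of a read as trusted or untrusted under both rules. It is not Rcorrector's implementation: the function names, the `alpha` and `window` parameters, the toy reads, and the local rule itself (compare each k-mer count with the highest count among neighbouring k-mers in the same read) are assumptions made for this example, and no De Bruijn graph is built.

```python
from collections import Counter


def kmers(read, k):
    """Return the k-mers of a read in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def trusted_global(read, counts, k, threshold):
    """Global rule: one fixed count cutoff for the whole data set."""
    return [counts[km] >= threshold for km in kmers(read, k)]


def trusted_local(read, counts, k, alpha=0.5, window=2):
    """Local rule (hypothetical): a k-mer is trusted if its count is at
    least alpha times the highest count among nearby k-mers of the read."""
    kms = kmers(read, k)
    flags = []
    for i, km in enumerate(kms):
        lo, hi = max(0, i - window), min(len(kms), i + window + 1)
        local_max = max(counts[x] for x in kms[lo:hi])
        flags.append(counts[km] >= alpha * local_max)
    return flags


# Toy data (assumed): a highly expressed transcript, a weakly expressed
# one, and a few reads from the first transcript carrying one substitution.
HIGH = "ACGTACGGTCA"
LOW = "TTGACCATGCA"
ERR = "ACGTATGGTCA"  # HIGH with C -> T at index 5

K = 4
reads = [HIGH] * 100 + [ERR] * 3 + [LOW] * 2
counts = count_kmers(reads, K)

print("global, low-expression read:", trusted_global(LOW, counts, K, threshold=3))
print("global, erroneous read:     ", trusted_global(ERR, counts, K, threshold=3))
print("local,  low-expression read:", trusted_local(LOW, counts, K))
print("local,  erroneous read:     ", trusted_local(ERR, counts, K))
```

In this toy data set the erroneous k-mers occur 3 times and the genuine k-mers of the weakly expressed transcript occur only twice, so no single global cutoff classifies both correctly. The local rule accepts the weakly expressed read in full and rejects only the four k-mers that overlap the substitution.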

Conclusions

Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.

Electronic supplementary material

The online version of this article (doi:10.1186/s13742-015-0089-y) contains supplementary material, which is available to authorized users.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4615873

DOI: 10.1186/s13742-015-0089-y
PubMed: 26500767
PubMed Central: 4615873

Links toward previous steps (curation, corpus...)

to stream Pmc, to step Corpus: To go to this record in the Curation step: 000371

Links to Exploration step

PMC:4615873

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads</title>
<author><name sortKey="Song, Li" sort="Song, Li" uniqKey="Song L" first="Li" last="Song">Li Song</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Department of Computer Science, Johns Hopkins University, Baltimore, 21218 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Johns Hopkins University, Baltimore</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Florea, Liliana" sort="Florea, Liliana" uniqKey="Florea L" first="Liliana" last="Florea">Liliana Florea</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Department of Computer Science, Johns Hopkins University, Baltimore, 21218 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Johns Hopkins University, Baltimore</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="Aff2">McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, 21205 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26500767</idno>
<idno type="pmc">4615873</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4615873</idno>
<idno type="RBID">PMC:4615873</idno>
<idno type="doi">10.1186/s13742-015-0089-y</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000371</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000371</idno>
<idno type="wicri:Area/Pmc/Curation">000371</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000371</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads</title>
<author><name sortKey="Song, Li" sort="Song, Li" uniqKey="Song L" first="Li" last="Song">Li Song</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Department of Computer Science, Johns Hopkins University, Baltimore, 21218 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Johns Hopkins University, Baltimore</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Florea, Liliana" sort="Florea, Liliana" uniqKey="Florea L" first="Liliana" last="Florea">Liliana Florea</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Department of Computer Science, Johns Hopkins University, Baltimore, 21218 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Johns Hopkins University, Baltimore</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="Aff2">McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, 21205 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">GigaScience</title>
<idno type="eISSN">2047-217X</idno>
<imprint><date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing.</p>
</sec>
<sec><title>Findings</title>
<p>We developed a <italic>k</italic>
-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted <italic>k</italic>
-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted <italic>k</italic>
-mers, Rcorrector computes a local threshold at every position in a read.</p>
</sec>
<sec><title>Conclusions</title>
<p>Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from <ext-link ext-link-type="uri" xlink:href="https://github.com/mourisl/Rcorrector/">https://github.com/mourisl/Rcorrector/</ext-link>
.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s13742-015-0089-y) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Heo, Y" uniqKey="Heo Y">Y Heo</name>
</author>
<author><name sortKey="Wu, Xl" uniqKey="Wu X">XL Wu</name>
</author>
<author><name sortKey="Chen, D" uniqKey="Chen D">D Chen</name>
</author>
<author><name sortKey="Ma, J" uniqKey="Ma J">J Ma</name>
</author>
<author><name sortKey="Hwu, Wm" uniqKey="Hwu W">WM Hwu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Song, L" uniqKey="Song L">L Song</name>
</author>
<author><name sortKey="Florea, L" uniqKey="Florea L">L Florea</name>
</author>
<author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yang, X" uniqKey="Yang X">X Yang</name>
</author>
<author><name sortKey="Chockalingam, Sp" uniqKey="Chockalingam S">SP Chockalingam</name>
</author>
<author><name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kelley, D" uniqKey="Kelley D">D Kelley</name>
</author>
<author><name sortKey="Schatz, M" uniqKey="Schatz M">M Schatz</name>
</author>
<author><name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author><name sortKey="Scott, E" uniqKey="Scott E">E Scott</name>
</author>
<author><name sortKey="Kakaradov, B" uniqKey="Kakaradov B">B Kakaradov</name>
</author>
<author><name sortKey="Pevzner, P" uniqKey="Pevzner P">P Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author><name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author><name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author><name sortKey="Schroder, H" uniqKey="Schroder H">H Schröder</name>
</author>
<author><name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author><name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author><name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ilie, L" uniqKey="Ilie L">L Ilie</name>
</author>
<author><name sortKey="Fazayeli, F" uniqKey="Fazayeli F">F Fazayeli</name>
</author>
<author><name sortKey="Ilie, S" uniqKey="Ilie S">S Ilie</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
<author><name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Le, Hs" uniqKey="Le H">HS Le</name>
</author>
<author><name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author><name sortKey="Mccauley, Bm" uniqKey="Mccauley B">BM McCauley</name>
</author>
<author><name sortKey="Hinman, Vf" uniqKey="Hinman V">VF Hinman</name>
</author>
<author><name sortKey="Bar Joseph, Z" uniqKey="Bar Joseph Z">Z Bar-Joseph</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author><name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Griebel, T" uniqKey="Griebel T">T Griebel</name>
</author>
<author><name sortKey="Zacher, B" uniqKey="Zacher B">B Zacher</name>
</author>
<author><name sortKey="Ribeca, P" uniqKey="Ribeca P">P Ribeca</name>
</author>
<author><name sortKey="Raineri, E" uniqKey="Raineri E">E Raineri</name>
</author>
<author><name sortKey="Lacroix, V" uniqKey="Lacroix V">V Lacroix</name>
</author>
<author><name sortKey="Guig, R" uniqKey="Guig R">R Guigó</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Doring, A" uniqKey="Doring A">A Doring</name>
</author>
<author><name sortKey="Weese, D" uniqKey="Weese D">D Weese</name>
</author>
<author><name sortKey="Rausch, T" uniqKey="Rausch T">T Rausch</name>
</author>
<author><name sortKey="Reinert, K" uniqKey="Reinert K">K Reinert</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author><name sortKey="Pertea, G" uniqKey="Pertea G">G Pertea</name>
</author>
<author><name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author><name sortKey="Pimentel, H" uniqKey="Pimentel H">H Pimentel</name>
</author>
<author><name sortKey="Kelley, R" uniqKey="Kelley R">R Kelley</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Walenz, B" uniqKey="Walenz B">B Walenz</name>
</author>
<author><name sortKey="Florea, L" uniqKey="Florea L">L Florea</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author><name sortKey="Papanicolaou, A" uniqKey="Papanicolaou A">A Papanicolaou</name>
</author>
<author><name sortKey="Yassour, M" uniqKey="Yassour M">M Yassour</name>
</author>
<author><name sortKey="Grabherr, M" uniqKey="Grabherr M">M Grabherr</name>
</author>
<author><name sortKey="Blood, Pd" uniqKey="Blood P">PD Blood</name>
</author>
<author><name sortKey="Bowden, J" uniqKey="Bowden J">J Bowden</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
<author><name sortKey="Nurk, S" uniqKey="Nurk S">S Nurk</name>
</author>
<author><name sortKey="Antipov, D" uniqKey="Antipov D">D Antipov</name>
</author>
<author><name sortKey="Gurevich, Aa" uniqKey="Gurevich A">AA Gurevich</name>
</author>
<author><name sortKey="Dvorkin, M" uniqKey="Dvorkin M">M Dvorkin</name>
</author>
<author><name sortKey="Kulikov, As" uniqKey="Kulikov A">AS Kulikov</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gurevich, A" uniqKey="Gurevich A">A Gurevich</name>
</author>
<author><name sortKey="Saveliev, V" uniqKey="Saveliev V">V Saveliev</name>
</author>
<author><name sortKey="Vyahhi, N" uniqKey="Vyahhi N">N Vyahhi</name>
</author>
<author><name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Gigascience</journal-id>
<journal-id journal-id-type="iso-abbrev">Gigascience</journal-id>
<journal-title-group><journal-title>GigaScience</journal-title>
</journal-title-group>
<issn pub-type="epub">2047-217X</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">26500767</article-id>
<article-id pub-id-type="pmc">4615873</article-id>
<article-id pub-id-type="publisher-id">89</article-id>
<article-id pub-id-type="doi">10.1186/s13742-015-0089-y</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Technical Note</subject>
</subj-group>
</article-categories>
<title-group><article-title>Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Song</surname>
<given-names>Li</given-names>
</name>
<address><email>lsong10@jhu.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Florea</surname>
<given-names>Liliana</given-names>
</name>
<address><email>florea@jhu.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1"><label>1</label>
Department of Computer Science, Johns Hopkins University, Baltimore, 21218 USA</aff>
<aff id="Aff2"><label>2</label>
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, 21205 USA</aff>
</contrib-group>
<pub-date pub-type="epub"><day>19</day>
<month>10</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>19</day>
<month>10</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection"><year>2015</year>
</pub-date>
<volume>4</volume>
<elocation-id>48</elocation-id>
<history><date date-type="received"><day>1</day>
<month>6</month>
<year>2015</year>
</date>
<date date-type="accepted"><day>9</day>
<month>10</month>
<year>2015</year>
</date>
</history>
<permissions><copyright-statement>© Song and Florea. 2015</copyright-statement>
<license license-type="OpenAccess"><license-p><bold>Open Access</bold>
 This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1"><sec><title>Background</title>
<p>Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing.</p>
</sec>
<sec><title>Findings</title>
<p>We developed a <italic>k</italic>
-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted <italic>k</italic>
-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted <italic>k</italic>
-mers, Rcorrector computes a local threshold at every position in a read.</p>
</sec>
<sec><title>Conclusions</title>
<p>Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from <ext-link ext-link-type="uri" xlink:href="https://github.com/mourisl/Rcorrector/">https://github.com/mourisl/Rcorrector/</ext-link>
.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s13742-015-0089-y) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en"><title>Keywords</title>
<kwd>Next-generation sequencing</kwd>
<kwd>RNA-seq</kwd>
<kwd>Error correction</kwd>
<kwd><italic>k</italic>
-mers</kwd>
</kwd-group>
<custom-meta-group><custom-meta><meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000371 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000371 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4615873
   |texte=   Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:26500767" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
