MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Bi-level error correction for PacBio long reads.

Internal identifier: 000E79 (Main/Exploration); previous: 000E78; next: 000E80

Authors: Yuansheng Liu; Chaowang Lan; Michael Blumenstein; Jinyan Li

Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017

RBID: pubmed:29990239

Abstract

The latest sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, can generate long reads thousands of bases in length, much longer than the reads of a few hundred bases generated by Illumina machines. However, these long reads have much higher error rates, for example 15%, which makes downstream analysis and applications very difficult. Error correction is a process that improves the quality of sequencing data. Hybrid correction strategies, proposed recently, use Illumina reads with low error rates to fix sequencing errors in the noisy long reads and perform well. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction that further improves the quality of PacBio long reads. At the first level, our method applies a de Bruijn graph-based error correction idea, iteratively searching for paths between pairs of solid k-mers with an increasing k-mer length. At the second level, we combine the results produced at the first level under different parameters. In particular, a multiple sequence alignment algorithm aligns these similar long reads, and a voting algorithm then determines the final base at each position of the reads. We compare Bicolor with three state-of-the-art methods on three real data sets. The results demonstrate that Bicolor always achieves the highest identity ratio. Bicolor also achieves a higher alignment ratio () and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of the number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.
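The abstract describes the two levels only at a high level. The Python sketch below illustrates the general technique under stated assumptions; it is not Bicolor's C++ implementation. Assumptions: a "solid" k-mer is taken to be one seen at least `min_count` times in the Illumina reads; the gap between two solid k-mers in a long read is filled with the shortest de Bruijn graph path found by breadth-first search; reverse complements are ignored; and the second level assumes the multiple sequence alignment has already been computed, so only the per-column vote is shown. All names (`solid_kmers`, `bridge`, `correct_read`, `correct_iteratively`, `vote`, `min_count`, `slack`) are hypothetical.

```python
from collections import Counter, deque


def solid_kmers(short_reads, k, min_count):
    """ASSUMED definition: k-mers seen at least `min_count` times in the
    accurate short reads (reverse complements are ignored in this sketch)."""
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= min_count}


def bridge(src, dst, solid, max_len):
    """Breadth-first search over the implicit de Bruijn graph whose nodes are
    the solid k-mers. Returns the sequence spelled by the shortest path from
    `src` to `dst` (both included), or None if no path fits in `max_len` bases."""
    queue = deque([(src, src)])
    seen = {src}
    while queue:
        node, seq = queue.popleft()
        if len(seq) >= max_len:
            continue
        for base in "ACGT":
            nxt = node[1:] + base  # successor shares a (k-1)-base overlap
            if nxt not in solid:
                continue
            if nxt == dst:
                return seq + base
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + base))
    return None


def correct_read(read, solid, k, slack=1.5):
    """One first-level pass for a fixed k: keep runs of solid k-mers and
    replace each gap between two solid k-mers by a de Bruijn graph path."""
    pos = [i for i in range(len(read) - k + 1) if read[i:i + k] in solid]
    if not pos:
        return read  # nothing to anchor on; leave the read unchanged
    out = read[:pos[0] + k]  # uncorrected prefix plus the first solid k-mer
    for i, j in zip(pos, pos[1:]):
        if j == i + 1:
            out += read[j + k - 1]  # consecutive solid k-mers overlap by k-1
            continue
        limit = int((j + k - i) * slack) + 1  # hypothetical length bound
        path = bridge(read[i:i + k], read[j:j + k], solid, limit)
        out += path[k:] if path else read[i + k:j + k]  # keep original if no path
    return out + read[pos[-1] + k:]  # uncorrected suffix


def correct_iteratively(read, short_reads, ks, min_count):
    """First level: repeat the correction with an increasing k-mer length."""
    for k in sorted(ks):
        read = correct_read(read, solid_kmers(short_reads, k, min_count), k)
    return read


def vote(aligned):
    """Second level: per-column majority over corrected variants that are
    ASSUMED to be already aligned (equal length, '-' marks a gap)."""
    consensus = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


reference = "AACGTTGCAGTC"
long_read = "AACGTTACAGTC"  # substitution error at position 6
print(correct_iteratively(long_read, [reference] * 2, ks=(4, 5), min_count=2))  # AACGTTGCAGTC
print(vote(["AC-GT", "ACTGT", "AC-GA"]))  # ACGT
```

Breadth-first search is used here only because it returns the shortest bridging path, a simple tie-breaking rule for the sketch; a practical corrector would also have to handle reverse-complement k-mers, branching paths, and reads far longer than this toy example.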

DOI: 10.1109/TCBB.2017.2780832
PubMed: 29990239


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Bi-level error correction for PacBio long reads.</title>
<author>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
</author>
<author>
<name sortKey="Lan, Chaowang" sort="Lan, Chaowang" uniqKey="Lan C" first="Chaowang" last="Lan">Chaowang Lan</name>
</author>
<author>
<name sortKey="Blumenstein, Michael" sort="Blumenstein, Michael" uniqKey="Blumenstein M" first="Michael" last="Blumenstein">Michael Blumenstein</name>
</author>
<author>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:29990239</idno>
<idno type="pmid">29990239</idno>
<idno type="doi">10.1109/TCBB.2017.2780832</idno>
<idno type="wicri:Area/PubMed/Corpus">000842</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000842</idno>
<idno type="wicri:Area/PubMed/Curation">000842</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000842</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000D80</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000D80</idno>
<idno type="wicri:Area/Ncbi/Merge">001E94</idno>
<idno type="wicri:Area/Ncbi/Curation">001E94</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001E94</idno>
<idno type="wicri:Area/Main/Merge">000E82</idno>
<idno type="wicri:Area/Main/Curation">000E79</idno>
<idno type="wicri:Area/Main/Exploration">000E79</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Bi-level error correction for PacBio long reads.</title>
<author>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
</author>
<author>
<name sortKey="Lan, Chaowang" sort="Lan, Chaowang" uniqKey="Lan C" first="Chaowang" last="Lan">Chaowang Lan</name>
</author>
<author>
<name sortKey="Blumenstein, Michael" sort="Blumenstein, Michael" uniqKey="Blumenstein M" first="Michael" last="Blumenstein">Michael Blumenstein</name>
</author>
<author>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
</author>
</analytic>
<series>
<title level="j">IEEE/ACM transactions on computational biology and bioinformatics</title>
<idno type="eISSN">1557-9964</idno>
<imprint>
<date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The latest sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, can generate long reads thousands of bases in length, much longer than the reads of a few hundred bases generated by Illumina machines. However, these long reads have much higher error rates, for example 15%, which makes downstream analysis and applications very difficult. Error correction is a process that improves the quality of sequencing data. Hybrid correction strategies, proposed recently, use Illumina reads with low error rates to fix sequencing errors in the noisy long reads and perform well. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction that further improves the quality of PacBio long reads. At the first level, our method applies a de Bruijn graph-based error correction idea, iteratively searching for paths between pairs of solid k-mers with an increasing k-mer length. At the second level, we combine the results produced at the first level under different parameters. In particular, a multiple sequence alignment algorithm aligns these similar long reads, and a voting algorithm then determines the final base at each position of the reads. We compare Bicolor with three state-of-the-art methods on three real data sets. The results demonstrate that Bicolor always achieves the highest identity ratio. Bicolor also achieves a higher alignment ratio () and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of the number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.</div>
</front>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Blumenstein, Michael" sort="Blumenstein, Michael" uniqKey="Blumenstein M" first="Michael" last="Blumenstein">Michael Blumenstein</name>
<name sortKey="Lan, Chaowang" sort="Lan, Chaowang" uniqKey="Lan C" first="Chaowang" last="Lan">Chaowang Lan</name>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E79 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000E79 | SxmlIndent | more

To link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:29990239
   |texte=   Bi-level error correction for PacBio long reads.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29990239" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021