MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Zseq: An Approach for Preprocessing Next-Generation Sequencing Data

Internal identifier: 000D99 (Pmc/Curation); previous: 000D98; next: 000E00

Authors: Abedalrhman Alkhateeb; Luis Rueda

RBID : PMC:5563921

Abstract

Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts, so preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers and using that count as the sequence's score; it also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in k-mers. Zseq then sweeps through the sequences again and filters out those whose z-score is less than a user-defined threshold.

The Zseq algorithm provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning them and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than those from the other tested methods. A way to estimate the cutoff threshold using labeling rules is also introduced, with promising results.
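
The scoring and filtering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the default values of k and the threshold, and the choice to ignore any k-mer containing an ambiguous base (N) are assumptions made for illustration. The abstract does not say how GC-content enters the score, so that factor is left out here.

```python
from statistics import mean, pstdev


def unique_kmer_score(read: str, k: int) -> int:
    """Score a read by its number of distinct k-mers.

    Assumption: k-mers containing an ambiguous base (N) do not count.
    """
    read = read.upper()
    kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def zscore_filter(reads, k=5, threshold=-1.0):
    """Keep reads whose score z-value is at least `threshold`.

    First pass: score every read. Second pass: convert each score to a
    z-score against the mean and standard deviation of all scores, and
    drop reads that fall below the threshold.
    """
    scores = [unique_kmer_score(r, k) for r in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All reads are equally complex, so there is nothing to separate.
        return list(reads)
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= threshold]


if __name__ == "__main__":
    reads = ["ACGTTGCA", "AAAAAAAA", "ACGGTCAT", "TGCATGAC"]
    # The homopolymer read has a single unique 3-mer and is filtered out.
    print(zscore_filter(reads, k=3, threshold=-1.0))
```

Both passes visit each read once, which matches the abstract's description of Zseq as a linear method that scores the sequences and then sweeps through them again.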


DOI: 10.1089/cmb.2017.0021
PubMed: 28414515
PubMed Central: 5563921

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Zseq: An Approach for Preprocessing Next-Generation Sequencing Data</title>
<author>
<name sortKey="Alkhateeb, Abedalrhman" sort="Alkhateeb, Abedalrhman" uniqKey="Alkhateeb A" first="Abedalrhman" last="Alkhateeb">Abedalrhman Alkhateeb</name>
</author>
<author>
<name sortKey="Rueda, Luis" sort="Rueda, Luis" uniqKey="Rueda L" first="Luis" last="Rueda">Luis Rueda</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28414515</idno>
<idno type="pmc">5563921</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563921</idno>
<idno type="RBID">PMC:5563921</idno>
<idno type="doi">10.1089/cmb.2017.0021</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D99</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D99</idno>
<idno type="wicri:Area/Pmc/Curation">000D99</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D99</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Zseq: An Approach for Preprocessing Next-Generation Sequencing Data</title>
<author>
<name sortKey="Alkhateeb, Abedalrhman" sort="Alkhateeb, Abedalrhman" uniqKey="Alkhateeb A" first="Abedalrhman" last="Alkhateeb">Abedalrhman Alkhateeb</name>
</author>
<author>
<name sortKey="Rueda, Luis" sort="Rueda, Luis" uniqKey="Rueda L" first="Luis" last="Rueda">Luis Rueda</name>
</author>
</analytic>
<series>
<title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>
<bold>Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique</bold>
<italic>k</italic>
<bold>-mers in each sequence as its corresponding score and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in</bold>
<italic>k</italic>
<bold>-mers. Based on a</bold>
<italic>z</italic>
<bold>-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold.</bold>
</p>
<p>
<bold>The Zseq algorithm provides a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as
<italic>de novo</italic>
assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover,
<italic>de novo</italic>
assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.</bold>
</p>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">J Comput Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">J. Comput. Biol</journal-id>
<journal-id journal-id-type="publisher-id">cmb</journal-id>
<journal-title-group>
<journal-title>Journal of Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1066-5277</issn>
<issn pub-type="epub">1557-8666</issn>
<publisher>
<publisher-name>Mary Ann Liebert, Inc.</publisher-name>
<publisher-loc>140 Huguenot Street, 3rd Floor, New Rochelle, NY 10801, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28414515</article-id>
<article-id pub-id-type="pmc">5563921</article-id>
<article-id pub-id-type="publisher-id">10.1089/cmb.2017.0021</article-id>
<article-id pub-id-type="doi">10.1089/cmb.2017.0021</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Zseq: An Approach for Preprocessing Next-Generation Sequencing Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Alkhateeb</surname>
<given-names>Abedalrhman</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Rueda</surname>
<given-names>Luis</given-names>
</name>
</contrib>
<aff id="aff1">School of Computer Science,
<institution>University of Windsor</institution>
, Windsor,
<country>Canada</country>
.</aff>
</contrib-group>
<author-notes>
<corresp>
<addr-line>Address correspondence to:</addr-line>
<addr-line>
<italic>Prof. Luis Rueda</italic>
</addr-line>
<addr-line>
<italic>School of Computer Science</italic>
</addr-line>
<institution>
<italic>University of Windsor</italic>
</institution>
<addr-line>
<italic>401 Sunset Avenue</italic>
</addr-line>
<addr-line>
<italic>Windsor ON N9B 3P4</italic>
</addr-line>
<country>Canada</country>
<break></break>
<italic>E-mail:</italic>
<email xlink:href="mailto:alkhate@uwindsor.ca">alkhate@uwindsor.ca</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>01</day>
<month>8</month>
<year>2017</year>
<pmc-comment>string-date: August 2017</pmc-comment>
</pub-date>
<pub-date pub-type="epub">
<day>01</day>
<month>8</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>01</day>
<month>8</month>
<year>2017</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>24</volume>
<issue>8</issue>
<fpage>746</fpage>
<lpage>755</lpage>
<permissions>
<copyright-statement>© Abedalrhman Alkhateeb and Luis Rueda, 2017. Published by Mary Ann Liebert, Inc.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="open-access">
<license-p>This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>
) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="cmb.2017.0021.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>
<bold>Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique</bold>
<italic>k</italic>
<bold>-mers in each sequence as its corresponding score and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in</bold>
<italic>k</italic>
<bold>-mers. Based on a</bold>
<italic>z</italic>
<bold>-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold.</bold>
</p>
<p>
<bold>The Zseq algorithm provides a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as
<italic>de novo</italic>
assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover,
<italic>de novo</italic>
assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.</bold>
</p>
</abstract>
<kwd-group kwd-group-type="author">
<title>
<bold>Keywords:</bold>
</title>
<kwd>machine learning</kwd>
<kwd>next-generation sequencing</kwd>
<kwd>preprocessing</kwd>
<kwd>RNA-SEQ analysis</kwd>
</kwd-group>
<counts>
<fig-count count="9"></fig-count>
<table-count count="5"></table-count>
<equation-count count="3"></equation-count>
<ref-count count="25"></ref-count>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D99 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000D99 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:5563921
   |texte=   Zseq: An Approach for Preprocessing Next-Generation Sequencing Data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:28414515" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021