MERS exploration server

Warning: this site is under development!
Warning: this site was generated by computational means from raw corpora.
The information has therefore not been validated.

A random forest classifier for detecting rare variants in NGS data from viral populations

Internal identifier: 000B96 (Pmc/Curation); previous: 000B95; next: 000B97

Authors: Raunaq Malhotra [United States]; Manjari Jha [United States]; Mary Poss [United States]; Raj Acharya [United States]

Source:

RBID: PMC:5548337

Abstract

We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as either erroneous or rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame.
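
To make "counts of k-mers of varying lengths" concrete, here is a minimal Python sketch that counts k-mers at several sizes and builds one feature vector per long k-mer from the counts of the shorter k-mers it contains. The feature layout (the k-mer's own count plus the minimum and mean counts of its sub-k-mers) and the names `count_kmers` and `multires_features` are hypothetical illustrations, not the exact MultiRes feature set, and the sketch does not implement the frame-based representation itself.

```python
from collections import Counter
from typing import Dict, List, Sequence


def count_kmers(reads: Sequence[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multires_features(reads: Sequence[str], ks: Sequence[int]) -> Dict[str, List[float]]:
    """Build one feature vector per k-mer of the largest size in `ks`.

    Hypothetical layout (an assumption, not MultiRes's exact design):
    [count of the k-mer] + for each smaller size j: [min, mean count of its j-mers].
    """
    sizes = sorted(ks, reverse=True)
    tables = {k: count_kmers(reads, k) for k in sizes}
    k_max = sizes[0]
    features: Dict[str, List[float]] = {}
    for kmer, count in tables[k_max].items():
        vec: List[float] = [float(count)]
        for j in sizes[1:]:
            # Counts of every j-mer contained in this k_max-mer.
            sub = [tables[j][kmer[i:i + j]] for i in range(k_max - j + 1)]
            vec.extend([float(min(sub)), sum(sub) / len(sub)])
        features[kmer] = vec
    return features


# Toy reads (hypothetical): the second read ends with a possible error (...CGA).
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
feats = multires_features(reads, ks=[6, 3])
print(feats["GTACGA"])  # [1.0, 1.0, 3.25]
```

Vectors like these could then be passed, with labels taken from simulated data where the true k-mers are known, to a random forest such as scikit-learn's `RandomForestClassifier`; that training setup is an assumption of this sketch. The underlying idea, again stated only as an assumption, is that erroneous and genuine k-mers may show different count patterns across resolutions, which a classifier can learn from labeled examples.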

We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it with the error-detection methods used in error-correction tools from the literature. MultiRes makes 4 to 500 times fewer false-positive k-mer predictions than the other methods, which is essential for accurately estimating viral population diversity and for their de novo assembly. Its recall of true k-mers is high and comparable to that of other error-correction methods. MultiRes also achieves greater than 95% recall for single nucleotide polymorphisms (SNPs) and produces fewer false-positive SNPs, while detecting more rare variants than other variant-calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes.
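
The comparison above rests on counting false-positive and correctly recovered k-mers. A minimal Python sketch of that bookkeeping follows, assuming each method outputs the set of k-mers it keeps as true and that the true k-mers are known (as in simulated data). The function name `kmer_metrics` is hypothetical and the k-mer sets are toy values, not data from the study.

```python
from typing import Dict, Set


def kmer_metrics(kept: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Score the k-mers a method kept as 'true' against the known true k-mers."""
    tp = len(kept & truth)   # true k-mers correctly kept
    fp = len(kept - truth)   # erroneous k-mers wrongly kept (false positives)
    fn = len(truth - kept)   # true k-mers wrongly discarded
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


# Toy example: hypothetical k-mer sets, not results from the paper.
truth = {"AAA", "AAC", "ACG", "CGT"}
method_a = {"AAA", "AAC", "ACG", "GGG"}                             # 1 false positive
method_b = {"AAA", "AAC", "ACG", "CGT", "GGG", "TTT", "TTA", "TAT"}  # 4 false positives

a = kmer_metrics(method_a, truth)
b = kmer_metrics(method_b, truth)
fold_fewer_fp = b["fp"] / a["fp"]  # how many times fewer false positives A makes
print(a["recall"], b["recall"], fold_fewer_fp)  # 0.75 1.0 4.0
```

In this toy case method A makes 4 times fewer false positives than method B but has lower recall (0.75 versus 1.0). Reporting both quantities matters because a filter can lower its false positives trivially by also discarding true k-mers.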


Url:
DOI: 10.1016/j.csbj.2017.07.001
PubMed: 28819548
PubMed Central: 5548337

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A random forest classifier for detecting rare variants in NGS data from viral populations</title>
<author>
<name sortKey="Malhotra, Raunaq" sort="Malhotra, Raunaq" uniqKey="Malhotra R" first="Raunaq" last="Malhotra">Raunaq Malhotra</name>
<affiliation wicri:level="1">
<nlm:aff id="af0005">The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jha, Manjari" sort="Jha, Manjari" uniqKey="Jha M" first="Manjari" last="Jha">Manjari Jha</name>
<affiliation wicri:level="1">
<nlm:aff id="af0005">The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Poss, Mary" sort="Poss, Mary" uniqKey="Poss M" first="Mary" last="Poss">Mary Poss</name>
<affiliation wicri:level="1">
<nlm:aff id="af0010">Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biology, The Pennsylvania State University, University Park, PA 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Acharya, Raj" sort="Acharya, Raj" uniqKey="Acharya R" first="Raj" last="Acharya">Raj Acharya</name>
<affiliation wicri:level="1">
<nlm:aff id="af0015">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Informatics and Computing, Indiana University, Bloomington, IN 47405</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28819548</idno>
<idno type="pmc">5548337</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5548337</idno>
<idno type="RBID">PMC:5548337</idno>
<idno type="doi">10.1016/j.csbj.2017.07.001</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000B96</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B96</idno>
<idno type="wicri:Area/Pmc/Curation">000B96</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B96</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A random forest classifier for detecting rare variants in NGS data from viral populations</title>
<author>
<name sortKey="Malhotra, Raunaq" sort="Malhotra, Raunaq" uniqKey="Malhotra R" first="Raunaq" last="Malhotra">Raunaq Malhotra</name>
<affiliation wicri:level="1">
<nlm:aff id="af0005">The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jha, Manjari" sort="Jha, Manjari" uniqKey="Jha M" first="Manjari" last="Jha">Manjari Jha</name>
<affiliation wicri:level="1">
<nlm:aff id="af0005">The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Poss, Mary" sort="Poss, Mary" uniqKey="Poss M" first="Mary" last="Poss">Mary Poss</name>
<affiliation wicri:level="1">
<nlm:aff id="af0010">Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biology, The Pennsylvania State University, University Park, PA 16802</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Acharya, Raj" sort="Acharya, Raj" uniqKey="Acharya R" first="Raj" last="Acharya">Raj Acharya</name>
<affiliation wicri:level="1">
<nlm:aff id="af0015">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Informatics and Computing, Indiana University, Bloomington, IN 47405</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Computational and Structural Biotechnology Journal</title>
<idno type="eISSN">2001-0370</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of
<italic>k</italic>
-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies
<italic>k</italic>
-mers as erroneous or rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of
<italic>k</italic>
-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that
<italic>k</italic>
-mers of a given size constitute a frame.</p>
<p>We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it with the error-detection methods used in error-correction tools from the literature. MultiRes makes 4 to 500 times fewer false-positive
<italic>k</italic>
-mer predictions than the other methods, which is essential for accurately estimating viral population diversity and for their
<italic>de novo</italic>
assembly. Its recall of true
<italic>k</italic>
-mers is high and comparable to that of other error-correction methods. MultiRes also achieves greater than 95% recall for single nucleotide polymorphisms (SNPs) and produces fewer false-positive SNPs, while detecting more rare variants than other variant-calling methods for viral populations. The software is freely available on GitHub at
<ext-link ext-link-type="uri" xlink:href="https://github.com/raunaq-m/MultiRes" id="ir0005">https://github.com/raunaq-m/MultiRes</ext-link>
.</p>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Comput Struct Biotechnol J</journal-id>
<journal-id journal-id-type="iso-abbrev">Comput Struct Biotechnol J</journal-id>
<journal-title-group>
<journal-title>Computational and Structural Biotechnology Journal</journal-title>
</journal-title-group>
<issn pub-type="epub">2001-0370</issn>
<publisher>
<publisher-name>Research Network of Computational and Structural Biotechnology</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28819548</article-id>
<article-id pub-id-type="pmc">5548337</article-id>
<article-id pub-id-type="publisher-id">S2001-0370(17)30039-9</article-id>
<article-id pub-id-type="doi">10.1016/j.csbj.2017.07.001</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A random forest classifier for detecting rare variants in NGS data from viral populations</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Malhotra</surname>
<given-names>Raunaq</given-names>
</name>
<email>raunaq.123@gmail.com</email>
<xref rid="af0005" ref-type="aff">a</xref>
<xref rid="cr0005" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jha</surname>
<given-names>Manjari</given-names>
</name>
<xref rid="af0005" ref-type="aff">a</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Poss</surname>
<given-names>Mary</given-names>
</name>
<xref rid="af0010" ref-type="aff">b</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Acharya</surname>
<given-names>Raj</given-names>
</name>
<xref rid="af0015" ref-type="aff">c</xref>
</contrib>
</contrib-group>
<aff id="af0005">
<label>a</label>
The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA</aff>
<aff id="af0010">
<label>b</label>
Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA</aff>
<aff id="af0015">
<label>c</label>
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</aff>
<author-notes>
<corresp id="cr0005">
<label>*</label>
Corresponding author.
<email>raunaq.123@gmail.com</email>
</corresp>
</author-notes>
<pub-date pub-type="pmc-release">
<day>19</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on .</pmc-comment>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<pub-date pub-type="epub">
<day>19</day>
<month>7</month>
<year>2017</year>
</pub-date>
<volume>15</volume>
<fpage>388</fpage>
<lpage>395</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>3</month>
<year>2017</year>
</date>
<date date-type="rev-recd">
<day>1</day>
<month>7</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>3</day>
<month>7</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© 2017 The Authors</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="CC BY" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).</license-p>
</license>
</permissions>
<abstract id="ab0005">
<p>We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of
<italic>k</italic>
-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies
<italic>k</italic>
-mers as erroneous or rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of
<italic>k</italic>
-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that
<italic>k</italic>
-mers of a given size constitute a frame.</p>
<p>We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it with the error-detection methods used in error-correction tools from the literature. MultiRes makes 4 to 500 times fewer false-positive
<italic>k</italic>
-mer predictions than the other methods, which is essential for accurately estimating viral population diversity and for their
<italic>de novo</italic>
assembly. Its recall of true
<italic>k</italic>
-mers is high and comparable to that of other error-correction methods. MultiRes also achieves greater than 95% recall for single nucleotide polymorphisms (SNPs) and produces fewer false-positive SNPs, while detecting more rare variants than other variant-calling methods for viral populations. The software is freely available on GitHub at
<ext-link ext-link-type="uri" xlink:href="https://github.com/raunaq-m/MultiRes" id="ir0005">https://github.com/raunaq-m/MultiRes</ext-link>
.</p>
</abstract>
<kwd-group id="ks0005">
<title>Keywords</title>
<kwd>Sequencing error detection</kwd>
<kwd>Reference free methods</kwd>
<kwd>Next-generation sequencing</kwd>
<kwd>Viral populations</kwd>
<kwd>Multi-resolution frames</kwd>
<kwd>Random forest classifier</kwd>
</kwd-group>
</article-meta>
</front>
</pmc>
</record>
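
The Dilib commands that follow are this site's own way of extracting the record. As an alternative, a minimal sketch using only Python's standard library can pull the bibliographic identifiers out of the `<pmc>` part. As rendered on this page, the record uses the `nlm:`, `wicri:` and `xlink:` prefixes without declaring their namespaces, so a namespace-aware parser such as `xml.etree.ElementTree` will reject the full record as-is. The sketch therefore parses a trimmed excerpt, and the helper name `article_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the <pmc> part of the record above (no undeclared prefixes).
RECORD_EXCERPT = """
<pmc article-type="research-article">
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">28819548</article-id>
      <article-id pub-id-type="pmc">5548337</article-id>
      <article-id pub-id-type="doi">10.1016/j.csbj.2017.07.001</article-id>
      <title-group>
        <article-title>A random forest classifier for detecting rare variants in NGS data from viral populations</article-title>
      </title-group>
      <volume>15</volume>
      <fpage>388</fpage>
      <lpage>395</lpage>
      <kwd-group id="ks0005">
        <kwd>Sequencing error detection</kwd>
        <kwd>Random forest classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
</pmc>
"""


def article_metadata(xml_text: str) -> dict:
    """Collect identifiers, title, pages and keywords from a <pmc> element."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    # Map each pub-id-type (pmid, pmc, doi) to its value.
    ids = {e.get("pub-id-type"): e.text for e in meta.findall("article-id")}
    return {
        "doi": ids.get("doi"),
        "pmid": ids.get("pmid"),
        "title": meta.findtext("title-group/article-title"),
        "pages": f'{meta.findtext("volume")}:{meta.findtext("fpage")}-{meta.findtext("lpage")}',
        "keywords": [k.text for k in meta.iter("kwd")],
    }


info = article_metadata(RECORD_EXCERPT)
print(info["doi"], info["pages"])  # 10.1016/j.csbj.2017.07.001 15:388-395
```

To process the full record rather than an excerpt, the missing namespace declarations would first have to be added to the `<record>` element, or the prefixes stripped; which namespace URIs Wicri intends for `nlm:` and `wicri:` is not stated on this page.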

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B96 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000B96 | SxmlIndent | more

To link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:5548337
   |texte=   A random forest classifier for detecting rare variants in NGS data from viral populations
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:28819548" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021