Separating metagenomic short reads into genomes via clustering

Internal identifier: 001280 (Pmc/Checkpoint); previous: 001279; next: 001281

Authors: Olga Tanaseichuk [United States]; James Borneman [United States]; Tao Jiang [United States]

Source:

Algorithms for Molecular Biology: AMB [1748-7188]; 2012.

RBID : PMC:3537596

Abstract

Background

The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in highly complex datasets in which, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve sequencing efficiency and cost. However, they produce shorter reads, which makes separating reads from different species harder. Existing computational tools for metagenomic analysis include similarity-based methods, which align reads against reference databases, and composition-based methods, which cluster reads using composition patterns (i.e., frequencies of short words or l-mers). Similarity-based methods cannot classify reads from unknown species that lack close references, and such reads constitute the majority. Since composition patterns are preserved only in sufficiently large fragments, composition-based tools cannot be used for very short reads, a limitation that becomes more significant as NGS develops. A recently proposed algorithm, AbundanceBin, introduced another approach that bins reads based on the predicted abundances of the sequenced genomes. However, it does not separate reads from genomes with similar abundance levels.
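Composition-based binning, as described above, characterizes a sequence by its l-mer frequency profile. The minimal Python sketch below computes such a profile for a single read. The function name `lmer_profile` and the toy read are hypothetical and are not taken from any tool cited here. The sketch shows why very short reads give sparse profiles: a 12-bp read contains only 11 overlapping 2-mers, so at least 5 of the 16 possible 2-mers must go unobserved, and the gap widens quickly as l grows.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def lmer_profile(seq: str, l: int) -> dict[str, float]:
    """Relative frequency of every possible l-mer over A/C/G/T in seq.

    Windows containing other characters (e.g. 'N') are skipped, and every
    one of the 4**l possible l-mers gets an entry, even if it is absent.
    """
    counts = Counter(
        seq[i:i + l]
        for i in range(len(seq) - l + 1)
        if set(seq[i:i + l]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    profile = {}
    for letters in product(ALPHABET, repeat=l):
        kmer = "".join(letters)
        profile[kmer] = counts[kmer] / total if total else 0.0
    return profile


if __name__ == "__main__":
    read = "ACGTACGTTGCA"  # toy 12-bp read (hypothetical)
    for l in (2, 3, 4):
        profile = lmer_profile(read, l)
        observed = sum(1 for v in profile.values() if v > 0)
        print(f"l={l}: {observed} of {len(profile)} possible l-mers observed")
```

For this toy read, the script reports 8 observed l-mers at each of l = 2, 3 and 4, against 16, 64 and 256 possible ones. The profile therefore becomes almost entirely zeros as l grows, and this sparsity is the limitation the Background attributes to composition-based tools on short reads.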

Results

In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that, when l is sufficiently large, most l-mers occur in only one genome. The first phase of the algorithm produces clusters of l-mers, each of which belongs to a single genome. During the second phase, clusters are merged based on l-mer repeat information. The final clusters are then used to assign reads. The algorithm can handle very short reads and sequencing errors. It was initially designed for genomes with similar abundance levels and then extended to handle arbitrary abundance ratios. The software can be downloaded for free at http://www.cs.ucr.edu/~tanaseio/toss.htm.
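The Results rely on the observation that long l-mers are rarely shared between genomes. The minimal Python sketch below first illustrates that observation on two random sequences. It then groups error-free reads by linking any two reads that share an l-mer, taking union-find connected components as the groups.

This grouping is a deliberately crude stand-in and is not the authors' two-phase algorithm. It has no repeat-based merging phase and ignores paired-end information, reverse complements and sequencing errors. It would also wrongly join two genomes that share even a single repeat. All names (`shared_fraction`, `group_reads`, `sample_reads`) and parameters (2,000-bp genomes, 50-bp reads, l = 20) are hypothetical choices made for illustration.

```python
import random


def lmers(seq: str, l: int) -> set[str]:
    """Distinct l-mers occurring in seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def shared_fraction(a: str, b: str, l: int) -> float:
    """Jaccard index of the l-mer sets of a and b (distinct l-mers found in both / in either)."""
    la, lb = lmers(a, l), lmers(b, l)
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0


def group_reads(reads: list[str], l: int) -> list[int]:
    """Toy grouping: reads sharing any l-mer (directly or transitively) are grouped.

    Union-find over read indices; returns one group label per read.
    """
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_read_with: dict[str, int] = {}  # l-mer -> first read containing it
    for idx, read in enumerate(reads):
        for kmer in lmers(read, l):
            if kmer in first_read_with:
                parent[find(idx)] = find(first_read_with[kmer])
            else:
                first_read_with[kmer] = idx
    return [find(i) for i in range(len(reads))]


def sample_reads(genome: str, n: int, length: int, rng: random.Random) -> list[str]:
    """n error-free reads of the given length, with uniformly random start positions."""
    starts = (rng.randrange(len(genome) - length + 1) for _ in range(n))
    return [genome[s:s + length] for s in starts]


if __name__ == "__main__":
    rng = random.Random(0)
    g1 = "".join(rng.choice("ACGT") for _ in range(2000))
    g2 = "".join(rng.choice("ACGT") for _ in range(2000))
    for l in (4, 8, 12, 16):
        print(f"l={l:2d}: shared l-mer fraction = {shared_fraction(g1, g2, l):.3f}")

    reads = sample_reads(g1, 200, 50, rng) + sample_reads(g2, 200, 50, rng)
    labels = group_reads(reads, 20)
    mixed = set(labels[:200]) & set(labels[200:])
    print(f"{len(set(labels))} groups; groups mixing both genomes: {len(mixed)}")
```

Under these assumptions, the shared fraction should drop from nearly 1 at l = 4 to essentially 0 by l = 12, and no group should mix reads from both random genomes. Because coverage is low, each genome is typically split into several groups. Planting a common repeat in both sequences would merge the two genomes' reads, which is the failure mode that a repeat-aware second phase has to address.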

Conclusions

Our tests on a large number of simulated metagenomic datasets involving species at various phylogenetic distances demonstrate that genomes can be separated if the number of common repeats is smaller than the number of genome-specific repeats. For such genomes, our method can separate NGS reads with high precision and sensitivity.
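The separability condition stated in the Conclusions compares two counts. The sketch below is a hedged illustration only. It assumes that a repeat is an l-mer occurring at least twice across the two genomes, and it classifies such a repeat as common if it occurs in both genomes and genome-specific otherwise. These definitions and the function name `classify_repeats` are assumptions made for illustration, not the paper's formal definitions.

```python
from collections import Counter


def lmer_counts(seq: str, l: int) -> Counter:
    """Occurrence count of each l-mer in seq (overlapping windows)."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def classify_repeats(g1: str, g2: str, l: int) -> tuple[int, int]:
    """Return (common, genome_specific) repeat counts for two genomes.

    Assumed definitions (illustrative only): a repeat is an l-mer that occurs
    at least twice in total; it is common if it occurs in both genomes and
    genome-specific if all of its occurrences lie in one genome.
    """
    c1, c2 = lmer_counts(g1, l), lmer_counts(g2, l)
    common = specific = 0
    for kmer in c1.keys() | c2.keys():
        n1, n2 = c1[kmer], c2[kmer]
        if n1 + n2 < 2:
            continue  # occurs once overall: not a repeat
        if n1 > 0 and n2 > 0:
            common += 1
        else:
            specific += 1
    return common, specific


if __name__ == "__main__":
    # Toy genomes: each carries its own run of a single base, and both end in "ACGT".
    print(classify_repeats("AAAAAAAACGT", "CCCCCCCCACGT", 4))  # -> (1, 2)
```

In this toy case, the two genome-specific repeats (AAAA and CCCC) outnumber the single common one (ACGT), which is the regime in which the Conclusions report that separation succeeds.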

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3537596

DOI: 10.1186/1748-7188-7-27
PubMed: 23009059
PubMed Central: 3537596

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Pmc, to step Corpus: 000940
to stream Pmc, to step Curation: 000940

Links to Exploration step

PMC:3537596

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Separating metagenomic short reads into genomes via clustering</title>
<author><name sortKey="Tanaseichuk, Olga" sort="Tanaseichuk, Olga" uniqKey="Tanaseichuk O" first="Olga" last="Tanaseichuk">Olga Tanaseichuk</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Borneman, James" sort="Borneman, James" uniqKey="Borneman J" first="James" last="Borneman">James Borneman</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Department of Plant Pathology and Microbiology, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Plant Pathology and Microbiology, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Jiang, Tao" sort="Jiang, Tao" uniqKey="Jiang T" first="Tao" last="Jiang">Tao Jiang</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">23009059</idno>
<idno type="pmc">3537596</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3537596</idno>
<idno type="RBID">PMC:3537596</idno>
<idno type="doi">10.1186/1748-7188-7-27</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000940</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000940</idno>
<idno type="wicri:Area/Pmc/Curation">000940</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000940</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001280</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001280</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Separating metagenomic short reads into genomes via clustering</title>
<author><name sortKey="Tanaseichuk, Olga" sort="Tanaseichuk, Olga" uniqKey="Tanaseichuk O" first="Olga" last="Tanaseichuk">Olga Tanaseichuk</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Borneman, James" sort="Borneman, James" uniqKey="Borneman J" first="James" last="Borneman">James Borneman</name>
<affiliation wicri:level="2"><nlm:aff id="I2">Department of Plant Pathology and Microbiology, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Plant Pathology and Microbiology, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Jiang, Tao" sort="Jiang, Tao" uniqKey="Jiang T" first="Tao" last="Jiang">Tao Jiang</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of California, Riverside, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of California, Riverside, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Algorithms for Molecular Biology : AMB</title>
<idno type="eISSN">1748-7188</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in highly complex datasets in which, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve sequencing efficiency and cost. However, they produce shorter reads, which makes separating reads from different species harder. Existing computational tools for metagenomic analysis include similarity-based methods, which align reads against reference databases, and composition-based methods, which cluster reads using composition patterns (<italic>i.e.</italic>, frequencies of short words or <italic>l</italic>-mers). Similarity-based methods cannot classify reads from unknown species that lack close references, and such reads constitute the majority. Since composition patterns are preserved only in sufficiently large fragments, composition-based tools cannot be used for very short reads, a limitation that becomes more significant as NGS develops. A recently proposed algorithm, AbundanceBin, introduced another approach that bins reads based on the predicted abundances of the sequenced genomes. However, it does not separate reads from genomes with similar abundance levels.</p>
</sec>
<sec><title>Results</title>
<p>In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that, when <italic>l</italic> is sufficiently large, most <italic>l</italic>-mers occur in only one genome. The first phase of the algorithm produces clusters of <italic>l</italic>-mers, each of which belongs to a single genome. During the second phase, clusters are merged based on <italic>l</italic>-mer repeat information. The final clusters are then used to assign reads. The algorithm can handle very short reads and sequencing errors. It was initially designed for genomes with similar abundance levels and then extended to handle arbitrary abundance ratios. The software can be downloaded for free at
<ext-link ext-link-type="uri" xlink:href="http://www.cs.ucr.edu/~tanaseio/toss.htm">http://www.cs.ucr.edu/~tanaseio/toss.htm</ext-link>.</p>
</sec>
<sec><title>Conclusions</title>
<p>Our tests on a large number of simulated metagenomic datasets involving species at various phylogenetic distances demonstrate that genomes can be separated if the number of common repeats is smaller than the number of genome-specific repeats. For such genomes, our method can separate NGS reads with high precision and sensitivity.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Handelsman, J" uniqKey="Handelsman J">J Handelsman</name>
</author>
<author><name sortKey="Rondon, Mr" uniqKey="Rondon M">MR Rondon</name>
</author>
<author><name sortKey="Brady, Sf" uniqKey="Brady S">SF Brady</name>
</author>
<author><name sortKey="Clardy, J" uniqKey="Clardy J">J Clardy</name>
</author>
<author><name sortKey="Goodman, Rm" uniqKey="Goodman R">RM Goodman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rappe, Ms" uniqKey="Rappe M">MS Rappé</name>
</author>
<author><name sortKey="Giovannoni, Sj" uniqKey="Giovannoni S">SJ Giovannoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beja, O" uniqKey="Beja O">O Béjà</name>
</author>
<author><name sortKey="Suzuki, Mt" uniqKey="Suzuki M">MT Suzuki</name>
</author>
<author><name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
<author><name sortKey="Aravind, L" uniqKey="Aravind L">L Aravind</name>
</author>
<author><name sortKey="Hadd, A" uniqKey="Hadd A">A Hadd</name>
</author>
<author><name sortKey="Nguyen, Lp" uniqKey="Nguyen L">LP Nguyen</name>
</author>
<author><name sortKey="Villacorta, R" uniqKey="Villacorta R">R Villacorta</name>
</author>
<author><name sortKey="Amjadi, M" uniqKey="Amjadi M">M Amjadi</name>
</author>
<author><name sortKey="Garrigues, C" uniqKey="Garrigues C">C Garrigues</name>
</author>
<author><name sortKey="Jovanovich, Sb" uniqKey="Jovanovich S">SB Jovanovich</name>
</author>
<author><name sortKey="Feldman, Ra" uniqKey="Feldman R">RA Feldman</name>
</author>
<author><name sortKey="Delong, Ef" uniqKey="Delong E">EF DeLong</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Venter, Jc" uniqKey="Venter J">JC Venter</name>
</author>
<author><name sortKey="Remington, K" uniqKey="Remington K">K Remington</name>
</author>
<author><name sortKey="Heidelberg, Jf" uniqKey="Heidelberg J">JF Heidelberg</name>
</author>
<author><name sortKey="Halpern, Al" uniqKey="Halpern A">AL Halpern</name>
</author>
<author><name sortKey="Rusch, D" uniqKey="Rusch D">D Rusch</name>
</author>
<author><name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author><name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author><name sortKey="Paulsen, I" uniqKey="Paulsen I">I Paulsen</name>
</author>
<author><name sortKey="Nelson, Ke" uniqKey="Nelson K">KE Nelson</name>
</author>
<author><name sortKey="Nelson, W" uniqKey="Nelson W">W Nelson</name>
</author>
<author><name sortKey="Fouts, De" uniqKey="Fouts D">DE Fouts</name>
</author>
<author><name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author><name sortKey="Knap, Ah" uniqKey="Knap A">AH Knap</name>
</author>
<author><name sortKey="Lomas, Mw" uniqKey="Lomas M">MW Lomas</name>
</author>
<author><name sortKey="Nealson, K" uniqKey="Nealson K">K Nealson</name>
</author>
<author><name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author><name sortKey="Peterson, J" uniqKey="Peterson J">J Peterson</name>
</author>
<author><name sortKey="Hoffman, J" uniqKey="Hoffman J">J Hoffman</name>
</author>
<author><name sortKey="Parsons, R" uniqKey="Parsons R">R Parsons</name>
</author>
<author><name sortKey="Baden Tillson, H" uniqKey="Baden Tillson H">H Baden-Tillson</name>
</author>
<author><name sortKey="Pfannkoch, C" uniqKey="Pfannkoch C">C Pfannkoch</name>
</author>
<author><name sortKey="Rogers, Yh" uniqKey="Rogers Y">YH Rogers</name>
</author>
<author><name sortKey="Smith, Ho" uniqKey="Smith H">HO Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gill, Sr" uniqKey="Gill S">SR Gill</name>
</author>
<author><name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author><name sortKey="Deboy, Rt" uniqKey="Deboy R">RT DeBoy</name>
</author>
<author><name sortKey="Eckburg, Pb" uniqKey="Eckburg P">PB Eckburg</name>
</author>
<author><name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author><name sortKey="Samuel, Bs" uniqKey="Samuel B">BS Samuel</name>
</author>
<author><name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
<author><name sortKey="Relman, Da" uniqKey="Relman D">DA Relman</name>
</author>
<author><name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
<author><name sortKey="Nelson, Ke" uniqKey="Nelson K">KE Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tyson, Gw" uniqKey="Tyson G">GW Tyson</name>
</author>
<author><name sortKey="Chapman, J" uniqKey="Chapman J">J Chapman</name>
</author>
<author><name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author><name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
<author><name sortKey="Ram, Rj" uniqKey="Ram R">RJ Ram</name>
</author>
<author><name sortKey="Richardson, Pm" uniqKey="Richardson P">PM Richardson</name>
</author>
<author><name sortKey="Solovyev, Vv" uniqKey="Solovyev V">VV Solovyev</name>
</author>
<author><name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author><name sortKey="Rokhsar, Ds" uniqKey="Rokhsar D">DS Rokhsar</name>
</author>
<author><name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Warren, Rl" uniqKey="Warren R">RL Warren</name>
</author>
<author><name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author><name sortKey="Jones, Sjm" uniqKey="Jones S">SJM Jones</name>
</author>
<author><name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dohm, Jc" uniqKey="Dohm J">JC Dohm</name>
</author>
<author><name sortKey="Lottaz, C" uniqKey="Lottaz C">C Lottaz</name>
</author>
<author><name sortKey="Borodina, T" uniqKey="Borodina T">T Borodina</name>
</author>
<author><name sortKey="Himmelbauer, H" uniqKey="Himmelbauer H">H Himmelbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author><name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author><name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author><name sortKey="Schein, Je" uniqKey="Schein J">JE Schein</name>
</author>
<author><name sortKey="Jones, Sjm" uniqKey="Jones S">SJM Jones</name>
</author>
<author><name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Charuvaka, A" uniqKey="Charuvaka A">A Charuvaka</name>
</author>
<author><name sortKey="Rangwala, H" uniqKey="Rangwala H">H Rangwala</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chakravorty, S" uniqKey="Chakravorty S">S Chakravorty</name>
</author>
<author><name sortKey="Helb, D" uniqKey="Helb D">D Helb</name>
</author>
<author><name sortKey="Burday, M" uniqKey="Burday M">M Burday</name>
</author>
<author><name sortKey="Connell, N" uniqKey="Connell N">N Connell</name>
</author>
<author><name sortKey="Alland, D" uniqKey="Alland D">D Alland</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author><name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author><name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author><name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author><name sortKey="Diaz, Nn" uniqKey="Diaz N">NN Diaz</name>
</author>
<author><name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author><name sortKey="Kelley, S" uniqKey="Kelley S">S Kelley</name>
</author>
<author><name sortKey="Nattkemper, Tw" uniqKey="Nattkemper T">TW Nattkemper</name>
</author>
<author><name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author><name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author><name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author><name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhou, F" uniqKey="Zhou F">F Zhou</name>
</author>
<author><name sortKey="Olman, V" uniqKey="Olman V">V Olman</name>
</author>
<author><name sortKey="Xu, Y" uniqKey="Xu Y">Y Xu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chatterji, S" uniqKey="Chatterji S">S Chatterji</name>
</author>
<author><name sortKey="Yamazaki, I" uniqKey="Yamazaki I">I Yamazaki</name>
</author>
<author><name sortKey="Bai, Z" uniqKey="Bai Z">Z Bai</name>
</author>
<author><name sortKey="Eisen, J" uniqKey="Eisen J">J Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chan, Ck" uniqKey="Chan C">CK Chan</name>
</author>
<author><name sortKey="Hsu, A" uniqKey="Hsu A">A Hsu</name>
</author>
<author><name sortKey="Halgamuge, S" uniqKey="Halgamuge S">S Halgamuge</name>
</author>
<author><name sortKey="Tang, Sl" uniqKey="Tang S">SL Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Teeling, H" uniqKey="Teeling H">H Teeling</name>
</author>
<author><name sortKey="Waldmann, J" uniqKey="Waldmann J">J Waldmann</name>
</author>
<author><name sortKey="Lombardot, T" uniqKey="Lombardot T">T Lombardot</name>
</author>
<author><name sortKey="Bauer, M" uniqKey="Bauer M">M Bauer</name>
</author>
<author><name sortKey="Glockner, F" uniqKey="Glockner F">F Glockner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Leung, Hcm" uniqKey="Leung H">HCM Leung</name>
</author>
<author><name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author><name sortKey="Yang, B" uniqKey="Yang B">B Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Diaz, N" uniqKey="Diaz N">N Diaz</name>
</author>
<author><name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author><name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author><name sortKey="Niehaus, K" uniqKey="Niehaus K">K Niehaus</name>
</author>
<author><name sortKey="Nattkemper, T" uniqKey="Nattkemper T">T Nattkemper</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bentley, Sd" uniqKey="Bentley S">SD Bentley</name>
</author>
<author><name sortKey="Parkhill, J" uniqKey="Parkhill J">J Parkhill</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, Yw" uniqKey="Wu Y">YW Wu</name>
</author>
<author><name sortKey="Ye, Y" uniqKey="Ye Y">Y Ye</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wheeler, Dl" uniqKey="Wheeler D">DL Wheeler</name>
</author>
<author><name sortKey="Barrett, T" uniqKey="Barrett T">T Barrett</name>
</author>
<author><name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
<author><name sortKey="Bryant, Sh" uniqKey="Bryant S">SH Bryant</name>
</author>
<author><name sortKey="Canese, K" uniqKey="Canese K">K Canese</name>
</author>
<author><name sortKey="Chetvernin, V" uniqKey="Chetvernin V">V Chetvernin</name>
</author>
<author><name sortKey="Church, Dm" uniqKey="Church D">DM Church</name>
</author>
<author><name sortKey="Dicuccio, M" uniqKey="Dicuccio M">M Dicuccio</name>
</author>
<author><name sortKey="Edgar, R" uniqKey="Edgar R">R Edgar</name>
</author>
<author><name sortKey="Federhen, S" uniqKey="Federhen S">S Federhen</name>
</author>
<author><name sortKey="Geer, Ly" uniqKey="Geer L">LY Geer</name>
</author>
<author><name sortKey="Kapustin, Y" uniqKey="Kapustin Y">Y Kapustin</name>
</author>
<author><name sortKey="Khovayko, O" uniqKey="Khovayko O">O Khovayko</name>
</author>
<author><name sortKey="Landsman, D" uniqKey="Landsman D">D Landsman</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
<author><name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author><name sortKey="Maglott, Dr" uniqKey="Maglott D">DR Maglott</name>
</author>
<author><name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
<author><name sortKey="Miller, V" uniqKey="Miller V">V Miller</name>
</author>
<author><name sortKey="Pruitt, Kd" uniqKey="Pruitt K">KD Pruitt</name>
</author>
<author><name sortKey="Schuler, Gd" uniqKey="Schuler G">GD Schuler</name>
</author>
<author><name sortKey="Sequeira, E" uniqKey="Sequeira E">E Sequeira</name>
</author>
<author><name sortKey="Sherry, St" uniqKey="Sherry S">ST Sherry</name>
</author>
<author><name sortKey="Sirotkin, K" uniqKey="Sirotkin K">K Sirotkin</name>
</author>
<author><name sortKey="Souvorov, A" uniqKey="Souvorov A">A Souvorov</name>
</author>
<author><name sortKey="Starchenko, G" uniqKey="Starchenko G">G Starchenko</name>
</author>
<author><name sortKey="Tatusov, Rl" uniqKey="Tatusov R">RL Tatusov</name>
</author>
<author><name sortKey="Tatusova, Ta" uniqKey="Tatusova T">TA Tatusova</name>
</author>
<author><name sortKey="Wagner, L" uniqKey="Wagner L">L Wagner</name>
</author>
<author><name sortKey="Yaschenko, E" uniqKey="Yaschenko E">E Yaschenko</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
<author><name sortKey="Karsch Mizrachi, I" uniqKey="Karsch Mizrachi I">I Karsch-Mizrachi</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
<author><name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
<author><name sortKey="Sayers, Ew" uniqKey="Sayers E">EW Sayers</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author><name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author><name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wendl, M" uniqKey="Wendl M">M Wendl</name>
</author>
<author><name sortKey="Waterston, R" uniqKey="Waterston R">R Waterston</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
<author><name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Van Dongen, S" uniqKey="Van Dongen S">S van Dongen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author><name sortKey="Daugherty, Sc" uniqKey="Daugherty S">SC Daugherty</name>
</author>
<author><name sortKey="Van Aken, Se" uniqKey="Van Aken S">SE Van Aken</name>
</author>
<author><name sortKey="Pai, Gh" uniqKey="Pai G">GH Pai</name>
</author>
<author><name sortKey="Watkins, Kl" uniqKey="Watkins K">KL Watkins</name>
</author>
<author><name sortKey="Khouri, H" uniqKey="Khouri H">H Khouri</name>
</author>
<author><name sortKey="Tallon, Lj" uniqKey="Tallon L">LJ Tallon</name>
</author>
<author><name sortKey="Zaborsky, Jm" uniqKey="Zaborsky J">JM Zaborsky</name>
</author>
<author><name sortKey="Dunbar, He" uniqKey="Dunbar H">HE Dunbar</name>
</author>
<author><name sortKey="Tran, Pl" uniqKey="Tran P">PL Tran</name>
</author>
<author><name sortKey="Moran, Na" uniqKey="Moran N">NA Moran</name>
</author>
<author><name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author><name sortKey="Ott, F" uniqKey="Ott F">F Ott</name>
</author>
<author><name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author><name sortKey="Schmid, R" uniqKey="Schmid R">R Schmid</name>
</author>
<author><name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Algorithms Mol Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">Algorithms Mol Biol</journal-id>
<journal-title-group><journal-title>Algorithms for Molecular Biology : AMB</journal-title>
</journal-title-group>
<issn pub-type="epub">1748-7188</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">23009059</article-id>
<article-id pub-id-type="pmc">3537596</article-id>
<article-id pub-id-type="publisher-id">1748-7188-7-27</article-id>
<article-id pub-id-type="doi">10.1186/1748-7188-7-27</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research</subject>
</subj-group>
</article-categories>
<title-group><article-title>Separating metagenomic short reads into genomes via clustering</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" corresp="yes" id="A1"><name><surname>Tanaseichuk</surname>
<given-names>Olga</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>tanaseio@cs.ucr.edu</email>
</contrib>
<contrib contrib-type="author" id="A2"><name><surname>Borneman</surname>
<given-names>James</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>borneman@ucr.edu</email>
</contrib>
<contrib contrib-type="author" id="A3"><name><surname>Jiang</surname>
<given-names>Tao</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>jiang@cs.ucr.edu</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Department of Computer Science and Engineering, University of California, Riverside, CA, USA</aff>
<aff id="I2"><label>2</label>
Department of Plant Pathology and Microbiology, University of California, Riverside, CA, USA</aff>
<pub-date pub-type="collection"><year>2012</year>
</pub-date>
<pub-date pub-type="epub"><day>26</day>
<month>9</month>
<year>2012</year>
</pub-date>
<volume>7</volume>
<fpage>27</fpage>
<lpage>27</lpage>
<history><date date-type="received"><day>4</day>
<month>1</month>
<year>2012</year>
</date>
<date date-type="accepted"><day>14</day>
<month>9</month>
<year>2012</year>
</date>
</history>
<permissions><copyright-statement>Copyright ©2012 Tanaseichuk et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Tanaseichuk et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.almob.org/content"></self-uri>
<abstract><sec><title>Background</title>
<p>The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in highly complex datasets in which, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve sequencing efficiency and cost. However, they produce shorter reads, which makes separating reads from different species harder. Existing computational tools for metagenomic analysis include similarity-based methods, which align reads against reference databases, and composition-based methods, which cluster reads using composition patterns (<italic>i.e.</italic>, frequencies of short words or <italic>l</italic>-mers). Similarity-based methods cannot classify reads from unknown species that lack close references, and such reads constitute the majority. Since composition patterns are preserved only in sufficiently large fragments, composition-based tools cannot be used for very short reads, a limitation that becomes more significant as NGS develops. A recently proposed algorithm, AbundanceBin, introduced another approach that bins reads based on the predicted abundances of the sequenced genomes. However, it does not separate reads from genomes with similar abundance levels.</p>
</sec>
<sec><title>Results</title>
<p>In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that, when <italic>l</italic> is sufficiently large, most <italic>l</italic>-mers occur in only one genome. The first phase of the algorithm produces clusters of <italic>l</italic>-mers, each of which belongs to a single genome. During the second phase, clusters are merged based on <italic>l</italic>-mer repeat information. The final clusters are then used to assign reads. The algorithm can handle very short reads and sequencing errors. It was initially designed for genomes with similar abundance levels and then extended to handle arbitrary abundance ratios. The software can be downloaded for free at
<ext-link ext-link-type="uri" xlink:href="http://www.cs.ucr.edu/~tanaseio/toss.htm">http://www.cs.ucr.edu/~tanaseio/toss.htm</ext-link>.</p>
</sec>
<sec><title>Conclusions</title>
<p>Our tests on a large number of simulated metagenomic datasets involving species at various phylogenetic distances demonstrate that genomes can be separated if the number of common repeats is smaller than the number of genome-specific repeats. For such genomes, our method can separate NGS reads with high precision and sensitivity.</p>
</sec>
</abstract>
<kwd-group><kwd>Metagenomics</kwd>
<kwd>NGS short reads</kwd>
<kwd>Genome separation</kwd>
<kwd>Clustering</kwd>
</kwd-group>
</article-meta>
</front>
</pmc>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Tanaseichuk, Olga" sort="Tanaseichuk, Olga" uniqKey="Tanaseichuk O" first="Olga" last="Tanaseichuk">Olga Tanaseichuk</name>
</region>
<name sortKey="Borneman, James" sort="Borneman, James" uniqKey="Borneman J" first="James" last="Borneman">James Borneman</name>
<name sortKey="Jiang, Tao" sort="Jiang, Tao" uniqKey="Jiang T" first="Tao" last="Jiang">Tao Jiang</name>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Checkpoint

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001280 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 001280 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:3537596
   |texte=   Separating metagenomic short reads into genomes via clustering
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:23009059" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
