MERS exploration server

Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

Internal identifier: 000960 (Pmc/Curation); previous: 000959; next: 000961

Authors: Joel Berendzen [United States]; William J. Bruno [United States]; Judith D. Cohn [United States]; Nicolas W. Hengartner [United States]; Cheryl R. Kuske [United States]; Benjamin H. McMahon [United States]; Murray A. Wolinsky [United States]; Gary Xie [United States]

Source: BMC Research Notes (2012)

RBID : PMC:3772700

Abstract

Background

Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.
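
The exact-matching step described above can be illustrated with a minimal sketch. It assumes the read has already been translated into a peptide string, and the signature set and read below are hypothetical toy data, not values from the paper. The function names `peptide_kmers` and `exact_matches` are likewise illustrative.

```python
from typing import Iterator, List, Set

K = 10  # peptide word length named in the abstract


def peptide_kmers(peptide: str, k: int = K) -> Iterator[str]:
    """Yield every overlapping k-mer of a peptide string, left to right."""
    for i in range(len(peptide) - k + 1):
        yield peptide[i:i + k]


def exact_matches(peptide: str, signatures: Set[str], k: int = K) -> List[str]:
    """Return the k-mers of `peptide` that occur verbatim in `signatures`."""
    return [kmer for kmer in peptide_kmers(peptide, k) if kmer in signatures]


# Hypothetical precomputed signature set and translated read (toy data).
signatures = {"MKTAYIAKQR", "AYIAKQRQIS"}
read_peptide = "MKTAYIAKQRQISFV"
hits = exact_matches(read_peptide, signatures)
```

Because membership in a hash set is checked in roughly constant time, the cost per read grows with read length rather than with the size of the signature set, which is one plausible reason an exact-match scheme can be fast.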

Results

At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.
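
The node-assignment rule can be sketched as follows. This is one possible reading of "most specific node consistent with the observed signatures", not the authors' implementation. When every observed signature node lies on a single root-to-node lineage, the deepest of them is returned. When signatures point to different branches, the sketch falls back to their lowest common ancestor, which is an assumption. The toy tree `PARENT` is hypothetical and is not the RNA polymerase reference tree.

```python
from typing import Dict, List, Optional

# Hypothetical toy reference tree, stored as child -> parent (root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Bacilli": "Firmicutes",
}


def path_to_root(node: str) -> List[str]:
    """Return the lineage [node, parent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def classify(signature_nodes: List[str]) -> Optional[str]:
    """Assign a read from the tree nodes of the signature peptides it contains."""
    if not signature_nodes:
        return None  # no signature peptide found: read stays unclassified
    # The deepest observed node is the one with the longest lineage.
    deepest = max(signature_nodes, key=lambda n: len(path_to_root(n)))
    lineage = set(path_to_root(deepest))
    if all(node in lineage for node in signature_nodes):
        return deepest
    # Assumed conflict rule: back off to the lowest common ancestor.
    common = set(path_to_root(signature_nodes[0]))
    for node in signature_nodes[1:]:
        common &= set(path_to_root(node))
    return max(common, key=lambda n: len(path_to_root(n)))
```

With this toy tree, signatures at `Proteobacteria` and `Gammaproteobacteria` give `Gammaproteobacteria`, while signatures at `Gammaproteobacteria` and `Bacilli` back off to `Bacteria`.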

Conclusions

Classification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.
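
As a rough check on "orders of magnitude", the two throughput figures quoted in the Results (6.6 Gbp/hr versus 25 kbp/hr, both taken at face value) give the following ratio:

```latex
\[
\frac{6.6\times10^{9}\ \text{bp/hr}}{25\times10^{3}\ \text{bp/hr}}
  = 2.64\times10^{5},
\qquad
\log_{10}\!\left(2.64\times10^{5}\right) \approx 5.4
\]
```

That is a speedup of about five orders of magnitude on the stated single-CPU figures.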


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3772700
DOI: 10.1186/1756-0500-5-460
PubMed: 22925230
PubMed Central: 3772700

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rapid phylogenetic and functional classification of short genomic fragments with signature peptides</title>
<author>
<name sortKey="Berendzen, Joel" sort="Berendzen, Joel" uniqKey="Berendzen J" first="Joel" last="Berendzen">Joel Berendzen</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Bruno, William J" sort="Bruno, William J" uniqKey="Bruno W" first="William J" last="Bruno">William J. Bruno</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Cohn, Judith D" sort="Cohn, Judith D" uniqKey="Cohn J" first="Judith D" last="Cohn">Judith D. Cohn</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hengartner, Nicolas W" sort="Hengartner, Nicolas W" uniqKey="Hengartner N" first="Nicolas W" last="Hengartner">Nicolas W. Hengartner</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kuske, Cheryl R" sort="Kuske, Cheryl R" uniqKey="Kuske C" first="Cheryl R" last="Kuske">Cheryl R. Kuske</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mcmahon, Benjamin H" sort="Mcmahon, Benjamin H" uniqKey="Mcmahon B" first="Benjamin H" last="Mcmahon">Benjamin H. Mcmahon</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Wolinsky, Murray A" sort="Wolinsky, Murray A" uniqKey="Wolinsky M" first="Murray A" last="Wolinsky">Murray A. Wolinsky</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Xie, Gary" sort="Xie, Gary" uniqKey="Xie G" first="Gary" last="Xie">Gary Xie</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22925230</idno>
<idno type="pmc">3772700</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3772700</idno>
<idno type="RBID">PMC:3772700</idno>
<idno type="doi">10.1186/1756-0500-5-460</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000960</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000960</idno>
<idno type="wicri:Area/Pmc/Curation">000960</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000960</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Rapid phylogenetic and functional classification of short genomic fragments with signature peptides</title>
<author>
<name sortKey="Berendzen, Joel" sort="Berendzen, Joel" uniqKey="Berendzen J" first="Joel" last="Berendzen">Joel Berendzen</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Bruno, William J" sort="Bruno, William J" uniqKey="Bruno W" first="William J" last="Bruno">William J. Bruno</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Cohn, Judith D" sort="Cohn, Judith D" uniqKey="Cohn J" first="Judith D" last="Cohn">Judith D. Cohn</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hengartner, Nicolas W" sort="Hengartner, Nicolas W" uniqKey="Hengartner N" first="Nicolas W" last="Hengartner">Nicolas W. Hengartner</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kuske, Cheryl R" sort="Kuske, Cheryl R" uniqKey="Kuske C" first="Cheryl R" last="Kuske">Cheryl R. Kuske</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mcmahon, Benjamin H" sort="Mcmahon, Benjamin H" uniqKey="Mcmahon B" first="Benjamin H" last="Mcmahon">Benjamin H. Mcmahon</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Wolinsky, Murray A" sort="Wolinsky, Murray A" uniqKey="Wolinsky M" first="Murray A" last="Wolinsky">Murray A. Wolinsky</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Xie, Gary" sort="Xie, Gary" uniqKey="Xie G" first="Gary" last="Xie">Gary Xie</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Research Notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.</p>
</sec>
<sec>
<title>Results</title>
<p>At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Classification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Daniel, R" uniqKey="Daniel R">R Daniel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tamames, J" uniqKey="Tamames J">J Tamames</name>
</author>
<author>
<name sortKey="Abellan, Jj" uniqKey="Abellan J">JJ Abellan</name>
</author>
<author>
<name sortKey="Pignatelli, M" uniqKey="Pignatelli M">M Pignatelli</name>
</author>
<author>
<name sortKey="Camacho, A" uniqKey="Camacho A">A Camacho</name>
</author>
<author>
<name sortKey="Moya, A" uniqKey="Moya A">A Moya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blaser, Mj" uniqKey="Blaser M">MJ Blaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Handelsman, J" uniqKey="Handelsman J">J Handelsman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
<author>
<name sortKey="Rasmus, N" uniqKey="Rasmus N">N Rasmus</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Pedersen, Am" uniqKey="Pedersen A">AM Pedersen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Worth, Cl" uniqKey="Worth C">CL Worth</name>
</author>
<author>
<name sortKey="Gong, S" uniqKey="Gong S">S Gong</name>
</author>
<author>
<name sortKey="Blundell, Tl" uniqKey="Blundell T">TL Blundell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haque, M" uniqKey="Haque M">M Haque</name>
</author>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Komanduri, D" uniqKey="Komanduri D">D Komanduri</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Haque, M" uniqKey="Haque M">M Haque</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author>
<name sortKey="Krogh, A" uniqKey="Krogh A">A Krogh</name>
</author>
<author>
<name sortKey="Mitchison, G" uniqKey="Mitchison G">G Mitchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bateman, A" uniqKey="Bateman A">A Bateman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rusch, Db" uniqKey="Rusch D">DB Rusch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weingart, U" uniqKey="Weingart U">U Weingart</name>
</author>
<author>
<name sortKey="Persi, E" uniqKey="Persi E">E Persi</name>
</author>
<author>
<name sortKey="Gophna, U" uniqKey="Gophna U">U Gophna</name>
</author>
<author>
<name sortKey="Horn, D" uniqKey="Horn D">D Horn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Horan, K" uniqKey="Horan K">K Horan</name>
</author>
<author>
<name sortKey="Shelton, Cr" uniqKey="Shelton C">CR Shelton</name>
</author>
<author>
<name sortKey="Girke, T" uniqKey="Girke T">T Girke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Overbeek, R" uniqKey="Overbeek R">R Overbeek</name>
</author>
<author>
<name sortKey="Rodriguez, A" uniqKey="Rodriguez A">A Rodriguez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hulo, N" uniqKey="Hulo N">N Hulo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, M" uniqKey="Wu M">M Wu</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stark, M" uniqKey="Stark M">M Stark</name>
</author>
<author>
<name sortKey="Berger, Sa" uniqKey="Berger S">SA Berger</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Von Mering, Ac" uniqKey="Von Mering A">AC von Mering</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kembel, Sw" uniqKey="Kembel S">SW Kembel</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
<author>
<name sortKey="Green, Jl" uniqKey="Green J">JL Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roth, S" uniqKey="Roth S">S Roth</name>
</author>
<author>
<name sortKey="Jung, K" uniqKey="Jung K">K Jung</name>
</author>
<author>
<name sortKey="Jung, H" uniqKey="Jung H">H Jung</name>
</author>
<author>
<name sortKey="Hommel, Rk" uniqKey="Hommel R">RK Hommel</name>
</author>
<author>
<name sortKey="Kleber, Hp" uniqKey="Kleber H">HP Kleber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fulton, Dl" uniqKey="Fulton D">DL Fulton</name>
</author>
<author>
<name sortKey="Li, Yy" uniqKey="Li Y">YY Li</name>
</author>
<author>
<name sortKey="Laird, Mr" uniqKey="Laird M">MR Laird</name>
</author>
<author>
<name sortKey="Hrosman, Bgs" uniqKey="Hrosman B">BGS Hrosman</name>
</author>
<author>
<name sortKey="Roche, Fm" uniqKey="Roche F">FM Roche</name>
</author>
<author>
<name sortKey="Brinkman, Fsl" uniqKey="Brinkman F">FSL Brinkman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wommack, Ke" uniqKey="Wommack K">KE Wommack</name>
</author>
<author>
<name sortKey="Bhavsar, J" uniqKey="Bhavsar J">J Bhavsar</name>
</author>
<author>
<name sortKey="Ravel, J" uniqKey="Ravel J">J Ravel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vos, M" uniqKey="Vos M">M Vos</name>
</author>
<author>
<name sortKey="Quince, C" uniqKey="Quince C">C Quince</name>
</author>
<author>
<name sortKey="Pijl, As" uniqKey="Pijl A">AS Pijl</name>
</author>
<author>
<name sortKey="De Hollander, M" uniqKey="De Hollander M">M de Hollander</name>
</author>
<author>
<name sortKey="Kowalchuk, Ga" uniqKey="Kowalchuk G">GA Kowalchuk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ohno, S" uniqKey="Ohno S">S Ohno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bennett, Mj" uniqKey="Bennett M">MJ Bennett</name>
</author>
<author>
<name sortKey="Schlunegger, Mp" uniqKey="Schlunegger M">MP Schlunegger</name>
</author>
<author>
<name sortKey="Eisenberg, D" uniqKey="Eisenberg D">D Eisenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doolittle, Fw" uniqKey="Doolittle F">FW Doolittle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcdaniel, Ld" uniqKey="Mcdaniel L">LD McDaniel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Price, Mn" uniqKey="Price M">MN Price</name>
</author>
<author>
<name sortKey="Dehal, Ps" uniqKey="Dehal P">PS Dehal</name>
</author>
<author>
<name sortKey="Arkin, Ap" uniqKey="Arkin A">AP Arkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hong, Sh" uniqKey="Hong S">SH Hong</name>
</author>
<author>
<name sortKey="Bunge, J" uniqKey="Bunge J">J Bunge</name>
</author>
<author>
<name sortKey="Leslin, C" uniqKey="Leslin C">C Leslin</name>
</author>
<author>
<name sortKey="Jeon, S" uniqKey="Jeon S">S Jeon</name>
</author>
<author>
<name sortKey="Epstein, Ss" uniqKey="Epstein S">SS Epstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morgan, Jl" uniqKey="Morgan J">JL Morgan</name>
</author>
<author>
<name sortKey="Darling, Ae" uniqKey="Darling A">AE Darling</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dastager, Sg" uniqKey="Dastager S">SG Dastager</name>
</author>
<author>
<name sortKey="Lee, J C" uniqKey="Lee J">J-C Lee</name>
</author>
<author>
<name sortKey="Ju, Y J" uniqKey="Ju Y">Y-J Ju</name>
</author>
<author>
<name sortKey="Park, D J" uniqKey="Park D">D-J Park</name>
</author>
<author>
<name sortKey="Kim, C J" uniqKey="Kim C">C-J Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martiny, Jbh" uniqKey="Martiny J">JBH Martiny</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author>
<name sortKey="Penn, K" uniqKey="Penn K">K Penn</name>
</author>
<author>
<name sortKey="Allison, Sd" uniqKey="Allison S">SD Allison</name>
</author>
<author>
<name sortKey="Horner Devine, Mc" uniqKey="Horner Devine M">MC Horner-Devine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Overbeek, R" uniqKey="Overbeek R">R Overbeek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcneil, Lk" uniqKey="Mcneil L">LK McNeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Ye, Y" uniqKey="Ye Y">Y Ye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosen, G" uniqKey="Rosen G">G Rosen</name>
</author>
<author>
<name sortKey="Garbarine, E" uniqKey="Garbarine E">E Garbarine</name>
</author>
<author>
<name sortKey="Caseiro, D" uniqKey="Caseiro D">D Caseiro</name>
</author>
<author>
<name sortKey="Polikar, R" uniqKey="Polikar R">R Polikar</name>
</author>
<author>
<name sortKey="Sokhansanj, B" uniqKey="Sokhansanj B">B Sokhansanj</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerlach, W" uniqKey="Gerlach W">W Gerlach</name>
</author>
<author>
<name sortKey="Junemann, S" uniqKey="Junemann S">S Junemann</name>
</author>
<author>
<name sortKey="Tille, F" uniqKey="Tille F">F Tille</name>
</author>
<author>
<name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author>
<name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sims, Ge" uniqKey="Sims G">GE Sims</name>
</author>
<author>
<name sortKey="Kim, S H" uniqKey="Kim S">S-H Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weber, Cf" uniqKey="Weber C">CF Weber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gomez Alvarez, V" uniqKey="Gomez Alvarez V">V Gomez-Alvarez</name>
</author>
<author>
<name sortKey="Teal, Tk" uniqKey="Teal T">TK Teal</name>
</author>
<author>
<name sortKey="Schmidt, Tm" uniqKey="Schmidt T">TM Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niu, B" uniqKey="Niu B">B Niu</name>
</author>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cole, Jr" uniqKey="Cole J">JR Cole</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hall, T" uniqKey="Hall T">T Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bruno, Wj" uniqKey="Bruno W">WJ Bruno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bruno, Wj" uniqKey="Bruno W">WJ Bruno</name>
</author>
<author>
<name sortKey="Socci, Nd" uniqKey="Socci N">ND Socci</name>
</author>
<author>
<name sortKey="Halpern, Al" uniqKey="Halpern A">AL Halpern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skophammer, Rg" uniqKey="Skophammer R">RG Skophammer</name>
</author>
<author>
<name sortKey="Servin, Ja" uniqKey="Servin J">JA Servin</name>
</author>
<author>
<name sortKey="Herbold, Cw" uniqKey="Herbold C">CW Herbold</name>
</author>
<author>
<name sortKey="Lake, Ja" uniqKey="Lake J">JA Lake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Herlemann, Dpr" uniqKey="Herlemann D">DPR Herlemann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cock, Pj" uniqKey="Cock P">PJ Cock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zmasek, Cm" uniqKey="Zmasek C">CM Zmasek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gospodnetic, O" uniqKey="Gospodnetic O">O Gospodnetic</name>
</author>
<author>
<name sortKey="Hatcher, E" uniqKey="Hatcher E">E Hatcher</name>
</author>
<author>
<name sortKey="Mccandless, M" uniqKey="Mccandless M">M McCandless</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rice, P" uniqKey="Rice P">P Rice</name>
</author>
<author>
<name sortKey="Longden, I" uniqKey="Longden I">I Longden</name>
</author>
<author>
<name sortKey="Bleasby, A" uniqKey="Bleasby A">A Bleasby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author>
<name sortKey="Ott, F" uniqKey="Ott F">F Ott</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Schmid, R" uniqKey="Schmid R">R Schmid</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Res Notes</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Res Notes</journal-id>
<journal-title-group>
<journal-title>BMC Research Notes</journal-title>
</journal-title-group>
<issn pub-type="epub">1756-0500</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22925230</article-id>
<article-id pub-id-type="pmc">3772700</article-id>
<article-id pub-id-type="publisher-id">1756-0500-5-460</article-id>
<article-id pub-id-type="doi">10.1186/1756-0500-5-460</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Rapid phylogenetic and functional classification of short genomic fragments with signature peptides</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Berendzen</surname>
<given-names>Joel</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>joelb@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Bruno</surname>
<given-names>William J</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>billb@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Cohn</surname>
<given-names>Judith D</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>jcohn@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A4">
<name>
<surname>Hengartner</surname>
<given-names>Nicolas W</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>nickh@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A5">
<name>
<surname>Kuske</surname>
<given-names>Cheryl R</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>kuske@lanl.gov</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A6">
<name>
<surname>McMahon</surname>
<given-names>Benjamin H</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>mcmahon@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A7">
<name>
<surname>Wolinsky</surname>
<given-names>Murray A</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>murray@lanl.gov</email>
</contrib>
<contrib contrib-type="author" id="A8">
<name>
<surname>Xie</surname>
<given-names>Gary</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>xie@lanl.gov</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</aff>
<aff id="I2">
<label>2</label>
Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</aff>
<aff id="I3">
<label>3</label>
Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</aff>
<aff id="I4">
<label>4</label>
Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>8</month>
<year>2012</year>
</pub-date>
<volume>5</volume>
<fpage>460</fpage>
<lpage>460</lpage>
<history>
<date date-type="received">
<day>21</day>
<month>6</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>8</day>
<month>8</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2012 Berendzen et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Berendzen et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1756-0500/5/460"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.</p>
</sec>
<sec>
<title>Results</title>
<p>At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Classification by exact matching against a precomputed list of signature peptides provides results comparable to those of existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.</p>
</sec>
</abstract>
</article-meta>
</front>
</pmc>
</record>
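
The classification rule summarized in the abstract above (exact 10-mer lookup, then assignment to the most specific consistent node) can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the toy tree `PARENT`, the signature table `SIGNATURES`, and the peptide strings are invented, and the input is assumed to be an already-translated peptide (six-frame translation of DNA reads is omitted). The handling of conflicting hits, falling back to their deepest common ancestor, is one possible reading of "most specific node that is consistent" and is not confirmed by the record.

```python
from typing import Dict, List, Optional

K = 10  # peptide word length named in the abstract

# Hypothetical toy reference tree: node -> parent (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
}

# Hypothetical signature table: peptide 10-mer -> tree node it is a signature of.
SIGNATURES: Dict[str, str] = {
    "MKTAYIAKQR": "Bacteria",
    "GSHMLEDPVA": "Proteobacteria",
    "WQERTYIPAS": "Gammaproteobacteria",
    "CCNNDDEEFF": "Alphaproteobacteria",
}


def path_to_root(node: str) -> List[str]:
    """Return the lineage of a node, starting at the node and ending at the root."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def matched_nodes(peptide: str) -> List[str]:
    """Slide a window of K residues and collect nodes of exactly matching signatures."""
    hits: List[str] = []
    for i in range(len(peptide) - K + 1):
        node = SIGNATURES.get(peptide[i:i + K])
        if node is not None:
            hits.append(node)
    return hits


def classify(peptide: str) -> Optional[str]:
    """Assign a peptide to a tree node, or None if no signature matches."""
    hits = matched_nodes(peptide)
    if not hits:
        return None
    # If every hit lies on the lineage of the deepest hit, that hit is the answer.
    deepest = max(hits, key=lambda n: len(path_to_root(n)))
    lineage = set(path_to_root(deepest))
    if all(h in lineage for h in hits):
        return deepest
    # Assumption: on conflict, back off to the deepest ancestor shared by all hits.
    common = set(path_to_root(hits[0]))
    for h in hits[1:]:
        common &= set(path_to_root(h))
    return max(common, key=lambda n: len(path_to_root(n)))


if __name__ == "__main__":
    print(classify("MKTAYIAKQR" + "GSHMLEDPVA"))
    print(classify("WQERTYIPAS" + "CCNNDDEEFF"))
    print(classify("AAAAAAAAAAAA"))
```

Because each lookup is a single dictionary access per window position, the cost per read grows linearly with read length, which is in the spirit of the speed advantage the abstract reports over alignment-based methods.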

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000960 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000960 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:3772700
   |texte=   Rapid phylogenetic and functional classification of short genomic fragments with signature peptides
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:22925230" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021