MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples

Internal identifier: 000304 (Pmc/Corpus); previous: 000303; next: 000305

Authors: Nathan LaPierre; Serghei Mangul; Mohammed Alser; Igor Mandric; Nicholas C. Wu; David Koslicki; Eleazar Eskin

Source:

RBID : PMC:6551237

Abstract

Background

High-throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes.

Results

Here we present a method, MiCoP (Microbiome Community Profiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously reported results on real data from the Human Microbiome Project.
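
As a rough illustration of what mapping-based abundance profiling involves, the sketch below turns read-to-genome mappings into relative abundances. It is not MiCoP's actual algorithm: the function name `relative_abundance`, the input layout, the rule of discarding reads that map to more than one genome, and the genome-length normalization are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def relative_abundance(hits, genome_lengths):
    """Estimate relative abundances from read mappings (illustrative only).

    hits: iterable of (read_id, genome_id) pairs, one per alignment.
    genome_lengths: dict mapping genome_id to genome length in base pairs.
    """
    # Collect the set of genomes each read maps to.
    genomes_per_read = defaultdict(set)
    for read_id, genome_id in hits:
        genomes_per_read[read_id].add(genome_id)

    # Assumption: keep only reads that map to exactly one genome.
    counts = Counter()
    for genomes in genomes_per_read.values():
        if len(genomes) == 1:
            counts[next(iter(genomes))] += 1

    # Normalize by genome length so longer genomes are not over-counted.
    coverage = {g: n / genome_lengths[g] for g, n in counts.items()}
    total = sum(coverage.values())
    return {g: c / total for g, c in coverage.items()} if total else {}


# Hypothetical toy input: read r4 maps to two genomes and is discarded.
example_hits = [
    ("r1", "virusA"), ("r2", "virusA"), ("r3", "fungusB"),
    ("r4", "virusA"), ("r4", "fungusB"),
]
example_lengths = {"virusA": 1000, "fungusB": 2000}
print(relative_abundance(example_hits, example_lengths))
```

Discarding ambiguous reads keeps the sketch short; a real profiler would more likely redistribute them among candidate genomes.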

Conclusions

MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at: https://github.com/smangul1/MiCoP.


Url:
DOI: 10.1186/s12864-019-5699-9
PubMed: 31167634
PubMed Central: 6551237

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples</title>
<author>
<name sortKey="Lapierre, Nathan" sort="Lapierre, Nathan" uniqKey="Lapierre N" first="Nathan" last="Lapierre">Nathan Lapierre</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mangul, Serghei" sort="Mangul, Serghei" uniqKey="Mangul S" first="Serghei" last="Mangul">Serghei Mangul</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alser, Mohammed" sort="Alser, Mohammed" uniqKey="Alser M" first="Mohammed" last="Alser">Mohammed Alser</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 2780</institution-id>
<institution-id institution-id-type="GRID">grid.5801.c</institution-id>
<institution>Department of Computer Science,</institution>
<institution>ETH Zürich,</institution>
</institution-wrap>
Zürich, 8092 Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mandric, Igor" sort="Mandric, Igor" uniqKey="Mandric I" first="Igor" last="Mandric">Igor Mandric</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wu, Nicholas C" sort="Wu, Nicholas C" uniqKey="Wu N" first="Nicholas C." last="Wu">Nicholas C. Wu</name>
<affiliation>
<nlm:aff id="Aff4">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000122199231</institution-id>
<institution-id institution-id-type="GRID">grid.214007.0</institution-id>
<institution>Department of Integrative Structural and Computational Biology,</institution>
<institution>The Scripps Research Institute,</institution>
</institution-wrap>
La Jolla, CA 92037 USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Koslicki, David" sort="Koslicki, David" uniqKey="Koslicki D" first="David" last="Koslicki">David Koslicki</name>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2112 1969</institution-id>
<institution-id institution-id-type="GRID">grid.4391.f</institution-id>
<institution>Department of Mathematics,</institution>
<institution>Oregon State University,</institution>
</institution-wrap>
Corvallis, 97331 OR USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eskin, Eleazar" sort="Eskin, Eleazar" uniqKey="Eskin E" first="Eleazar" last="Eskin">Eleazar Eskin</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Human Genetics,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">31167634</idno>
<idno type="pmc">6551237</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551237</idno>
<idno type="RBID">PMC:6551237</idno>
<idno type="doi">10.1186/s12864-019-5699-9</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000304</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000304</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples</title>
<author>
<name sortKey="Lapierre, Nathan" sort="Lapierre, Nathan" uniqKey="Lapierre N" first="Nathan" last="Lapierre">Nathan Lapierre</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mangul, Serghei" sort="Mangul, Serghei" uniqKey="Mangul S" first="Serghei" last="Mangul">Serghei Mangul</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alser, Mohammed" sort="Alser, Mohammed" uniqKey="Alser M" first="Mohammed" last="Alser">Mohammed Alser</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 2780</institution-id>
<institution-id institution-id-type="GRID">grid.5801.c</institution-id>
<institution>Department of Computer Science,</institution>
<institution>ETH Zürich,</institution>
</institution-wrap>
Zürich, 8092 Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mandric, Igor" sort="Mandric, Igor" uniqKey="Mandric I" first="Igor" last="Mandric">Igor Mandric</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wu, Nicholas C" sort="Wu, Nicholas C" uniqKey="Wu N" first="Nicholas C." last="Wu">Nicholas C. Wu</name>
<affiliation>
<nlm:aff id="Aff4">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000122199231</institution-id>
<institution-id institution-id-type="GRID">grid.214007.0</institution-id>
<institution>Department of Integrative Structural and Computational Biology,</institution>
<institution>The Scripps Research Institute,</institution>
</institution-wrap>
La Jolla, CA 92037 USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Koslicki, David" sort="Koslicki, David" uniqKey="Koslicki D" first="David" last="Koslicki">David Koslicki</name>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2112 1969</institution-id>
<institution-id institution-id-type="GRID">grid.4391.f</institution-id>
<institution>Department of Mathematics,</institution>
<institution>Oregon State University,</institution>
</institution-wrap>
Corvallis, 97331 OR USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eskin, Eleazar" sort="Eskin, Eleazar" uniqKey="Eskin E" first="Eleazar" last="Eskin">Eleazar Eskin</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Human Genetics,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>High-throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes.</p>
</sec>
<sec>
<title>Results</title>
<p>Here we present a method, MiCoP (
<bold>Mi</bold>
crobiome
<bold>Co</bold>
mmunity
<bold>P</bold>
rofiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously reported results on real data from the Human Microbiome Project.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at:
<ext-link ext-link-type="uri" xlink:href="https://github.com/smangul1/MiCoP">https://github.com/smangul1/MiCoP</ext-link>
.</p>
</sec>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">31167634</article-id>
<article-id pub-id-type="pmc">6551237</article-id>
<article-id pub-id-type="publisher-id">5699</article-id>
<article-id pub-id-type="doi">10.1186/s12864-019-5699-9</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<name>
<surname>LaPierre</surname>
<given-names>Nathan</given-names>
</name>
<address>
<email>nlapier2@cs.ucla.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes" equal-contrib="yes">
<name>
<surname>Mangul</surname>
<given-names>Serghei</given-names>
</name>
<address>
<email>smangul@ucla.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Alser</surname>
<given-names>Mohammed</given-names>
</name>
<address>
<email>mohammed.alser@inf.ethz.ch</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mandric</surname>
<given-names>Igor</given-names>
</name>
<address>
<email>imandric@ucla.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wu</surname>
<given-names>Nicholas C.</given-names>
</name>
<address>
<email>nicwu@scripps.edu</email>
</address>
<xref ref-type="aff" rid="Aff4">4</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name>
<surname>Koslicki</surname>
<given-names>David</given-names>
</name>
<address>
<email>david.koslicki@math.oregonstate.edu</email>
</address>
<xref ref-type="aff" rid="Aff5">5</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name>
<surname>Eskin</surname>
<given-names>Eleazar</given-names>
</name>
<address>
<email>eeskin@cs.ucla.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Computer Science,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9632 6718</institution-id>
<institution-id institution-id-type="GRID">grid.19006.3e</institution-id>
<institution>Department of Human Genetics,</institution>
<institution>University of California,</institution>
</institution-wrap>
Los Angeles, 90095 CA USA</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 2780</institution-id>
<institution-id institution-id-type="GRID">grid.5801.c</institution-id>
<institution>Department of Computer Science,</institution>
<institution>ETH Zürich,</institution>
</institution-wrap>
Zürich, 8092 Switzerland</aff>
<aff id="Aff4">
<label>4</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000122199231</institution-id>
<institution-id institution-id-type="GRID">grid.214007.0</institution-id>
<institution>Department of Integrative Structural and Computational Biology,</institution>
<institution>The Scripps Research Institute,</institution>
</institution-wrap>
La Jolla, CA 92037 USA</aff>
<aff id="Aff5">
<label>5</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2112 1969</institution-id>
<institution-id institution-id-type="GRID">grid.4391.f</institution-id>
<institution>Department of Mathematics,</institution>
<institution>Oregon State University,</institution>
</institution-wrap>
Corvallis, 97331 OR USA</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>6</day>
<month>6</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>6</day>
<month>6</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>20</volume>
<issue>Suppl 5</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.</issue-sponsor>
<elocation-id>423</elocation-id>
<permissions>
<copyright-statement>© The Author(s) 2019</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>High throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes.</p>
</sec>
<sec>
<title>Results</title>
<p>Here we present a method, MiCoP (
<bold>Mi</bold>
crobiome
<bold>Co</bold>
mmunity
<bold>P</bold>
rofiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously-reported results on real data from the Human Microbiome Project.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at:
<ext-link ext-link-type="uri" xlink:href="https://github.com/smangul1/MiCoP">https://github.com/smangul1/MiCoP</ext-link>
.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Metagenomics</kwd>
<kwd>Virome</kwd>
<kwd>Eukaryome</kwd>
<kwd>Abundance estimation</kwd>
<kwd>Community profiling</kwd>
<kwd>Alignment</kwd>
</kwd-group>
<conference xlink:href="https://iccabs.engr.uconn.edu/2017/">
<conf-name>7th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2017)</conf-name>
<conf-loc>Orlando, FL, USA</conf-loc>
<conf-date>19-21 October 2017</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2019</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Microorganisms are ubiquitous in almost every ecosystem on earth, including soil, ocean water, and the human body. Single-celled organisms play a number of vital roles in each of these environments [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
]. Identifying the microbes present in a sample is critical to understanding what functions are carried out by these organisms and characterizing how disturbances in microbial communities lead to various maladies. Traditionally, microorganisms have been studied via culture-based techniques, in which the microbial organisms were isolated and studied individually in laboratory settings. However, it is well-recognized that many microbes are not culturable; hence they cannot be studied in laboratory settings [
<xref ref-type="bibr" rid="CR3">3</xref>
]. In addition, techniques studying microbes in laboratory settings are incapable of capturing the complex relations between hundreds to thousands of different microbial species in their natural habitats [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
]. High-throughput sequencing has revolutionized microbiome research, enabling the study of thousands of microbial genomes directly from their host environments and forming the field of metagenomics. This approach bypasses the traditional culture-dependent bias and allows the study of the composition of microbial communities in their natural habitats across different human tissues and environmental settings [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
]. Metagenomic profiling has proven useful for studying various microbes, including eukaryotic and viral pathogens, which were previously impossible to study in an unbiased manner with 16S ribosomal RNA gene sequencing [
<xref ref-type="bibr" rid="CR4">4</xref>
–
<xref ref-type="bibr" rid="CR6">6</xref>
].</p>
<p>Despite the critical importance of the “virome” and the “eukaryome” in affecting the microbiome and human health, most metagenomic profiling methods have focused primarily on identifying bacteria and archaea [
<xref ref-type="bibr" rid="CR7">7</xref>
,
<xref ref-type="bibr" rid="CR8">8</xref>
]. Several existing methods for metagenomic profiling have proposed using ‘marker genes’ that uniquely identify a read as coming from a certain species. This method has been shown to be efficient and accurate at estimating the presence and relative abundances of bacteria and archaea in a sample [
<xref ref-type="bibr" rid="CR9">9</xref>
–
<xref ref-type="bibr" rid="CR11">11</xref>
]. However, approaches based on marker genes have some limitations with identifying viral and eukaryotic genomes. One approach involves comparing differences in genes that are considered ‘universal’ but differ between species. This approach uses reads that indicate a certain sequence for that marker gene and thus uniquely identify a species [
<xref ref-type="bibr" rid="CR9">9</xref>
]. However, this is problematic for viruses, which are comprised mostly of novel sequences and do not share any single common gene [
<xref ref-type="bibr" rid="CR2">2</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
]. Another approach utilizes sequences that uniquely identify a given clade [
<xref ref-type="bibr" rid="CR10">10</xref>
,
<xref ref-type="bibr" rid="CR11">11</xref>
], but these can only use the relatively small number of reads that map to these specific regions of the genome [
<xref ref-type="bibr" rid="CR14">14</xref>
], leading to poor sensitivity [
<xref ref-type="bibr" rid="CR15">15</xref>
]. This is particularly problematic for eukaryotic genomes, which are usually long and comprised mostly of noncoding regions, leading to poor read utilization [
<xref ref-type="bibr" rid="CR5">5</xref>
,
<xref ref-type="bibr" rid="CR16">16</xref>
,
<xref ref-type="bibr" rid="CR17">17</xref>
]. Recent approaches based on k-mers have overcome these issues and improved run time dramatically [
<xref ref-type="bibr" rid="CR14">14</xref>
,
<xref ref-type="bibr" rid="CR18">18</xref>
]. However, these approaches show decreased sensitivity due to requiring perfect k-mer matches [
<xref ref-type="bibr" rid="CR15">15</xref>
,
<xref ref-type="bibr" rid="CR19">19</xref>
]. In addition, they demand heavy memory usage often in excess of 100GB, which many users do not have available [
<xref ref-type="bibr" rid="CR20">20</xref>
]. Finally, k-mer based methods have been observed to predict a large number of low-abundance species that are not actually present in the sample [
<xref ref-type="bibr" rid="CR15">15</xref>
].</p>
<p>In this paper, we present MiCoP (
<bold>Mi</bold>
crobiome
<bold>Co</bold>
mmunity
<bold>P</bold>
rofiling), a computational method capable of profiling viruses and eukaryotes with high precision and sensitivity. We overcome the issues mentioned above by utilizing a fast mapping-based approach, which is capable of high read usage, avoids bias against viruses and eukaryotes, and is sensitive to low-abundance species. Mapping-based approaches have been observed to have even higher sensitivity than Megablast [
<xref ref-type="bibr" rid="CR21">21</xref>
], a common gold standard method, but there have traditionally been concerns with the speed and memory usage of a mapping-based approach [
<xref ref-type="bibr" rid="CR22">22</xref>
]. We demonstrate that, when using smaller viral and eukaryote reference databases, a mapping-based approach is both feasible and preferable.</p>
<p>We first map reads to reference genomes using BWA-MEM, and we then apply a two-step read assignment process. From the BWA-MEM mapping results, uniquely-mapped reads are immediately assigned, and, subsequently, multi-mapped reads are probabilistically assigned to genomes based on the distribution of uniquely-mapped reads. We perform species abundance estimation by calculating the number of reads mapped to each genome and normalizing by genome length. Since MiCoP maps reads to specific genomes in a reference database, it is capable of detecting microbes at a finer granularity than the species level, for instance if different strains or chromosomes from a species are listed separately in the reference database. We validate MiCoP by comparing its abundance estimation performance with two of the most popular methods, MetaPhlAn2 [
<xref ref-type="bibr" rid="CR11">11</xref>
] and Kraken [
<xref ref-type="bibr" rid="CR14">14</xref>
]. We demonstrate improved results on simulated reads from viral and eukaryotic genomes, and we show that MiCoP can identify more viruses and eukaryotes in Human Microbiome Project data than previously-used methods.</p>
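<p>As a minimal sketch of the length normalization described above, the following Python snippet converts per-genome read counts into relative abundances. The function name, the dictionary layout, and the example genomes are hypothetical and are not taken from the MiCoP code base.</p>
<preformat>
```python
def relative_abundances(read_counts, genome_lengths):
    """Turn per-genome read counts into length-normalized relative abundances."""
    # Reads per base: at equal abundance, a longer genome attracts more reads.
    per_base = {g: n / genome_lengths[g] for g, n in read_counts.items()}
    total = sum(per_base.values())
    if total == 0:
        return {g: 0.0 for g in per_base}
    # Rescale so that the abundances sum to 1.
    return {g: v / total for g, v in per_base.items()}


# Hypothetical example: equal read counts, but virus_B is four times longer.
counts = {"virus_A": 100, "virus_B": 100}
lengths = {"virus_A": 10_000, "virus_B": 40_000}
abundances = relative_abundances(counts, lengths)
```
</preformat>
<p>In this example both genomes receive the same number of reads, yet the shorter genome ends up with the larger relative abundance because its reads cover fewer bases.</p>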
</sec>
<sec id="Sec2" sec-type="results">
<title>Results</title>
<sec id="Sec3">
<title>Methods overview</title>
<p>MiCoP utilizes a mapping-based approach to perform highly sensitive and precise read classification and accurate abundance estimation of viruses and eukaryotes in metagenomic samples. MiCoP starts with mapping reads to whole genomes in a reference database using BWA-MEM [
<xref ref-type="bibr" rid="CR23">23</xref>
], keeping all multi-mapped reads. Our approach then uses a two-stage process to classify the reads. In the first stage, all uniquely-mapped reads are classified, and we compute the abundance of each genome in the sample based on these reads. In the second stage, multi-mapped reads are probabilistically assigned to one of the genomes that they mapped to, with probabilities proportional to the abundance of those genomes among uniquely-mapped reads. We remove species for which there is limited evidence, based on the number of reads assigned to that species. Relative abundances of the remaining genomes are then computed. These steps are discussed in further detail in the “
<xref rid="Sec9" ref-type="sec">Methods</xref>
” section. Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
illustrates the MiCoP workflow.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>MiCoP workflow. Reads are first aligned to viral or eukaryotic genomes in a reference database using BWA-MEM. The results provide coverage and read mapping quality information that can be examined. In the abundance estimation stage, uniquely-mapped reads are assigned to species and species abundances are estimated based on these. Multi-mapped reads are then assigned to genomes with probability proportional to their abundances among uniquely-mapped reads. Species with too few mapped reads are filtered out, and then the final species abundances are computed</p>
</caption>
<graphic xlink:href="12864_2019_5699_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
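<p>The two-stage assignment can be illustrated with a short Python sketch. It is a simplification under stated assumptions: alignments are given as a mapping from read identifiers to lists of genome identifiers, raw unique-read counts stand in for the stage-one abundance estimate, multi-mapped reads whose candidate genomes have no unique support are skipped, and the evidence threshold <monospace>min_reads</monospace> is a hypothetical parameter. None of these names come from the MiCoP implementation.</p>
<preformat>
```python
import random
from collections import Counter


def assign_reads(hits, min_reads=10, seed=0):
    """Two-stage assignment; `hits` maps read id -> list of genomes it aligned to."""
    rng = random.Random(seed)
    # Stage 1: reads with a single hit are assigned directly.
    unique = Counter(g[0] for g in hits.values() if len(g) == 1)
    assigned = Counter(unique)
    # Stage 2: multi-mapped reads are drawn in proportion to unique-read support.
    for genomes in hits.values():
        if len(genomes) > 1:
            weights = [unique[g] for g in genomes]
            if sum(weights) == 0:
                continue  # assumption: no unique evidence, leave the read unassigned
            assigned[rng.choices(genomes, weights=weights, k=1)[0]] += 1
    # Drop genomes with too little evidence (the threshold is hypothetical).
    return {g: n for g, n in assigned.items() if n >= min_reads}


# Hypothetical alignments: r4 maps to A and C, r5 maps to C and D.
hits = {"r1": ["A"], "r2": ["A"], "r3": ["B"], "r4": ["A", "C"], "r5": ["C", "D"]}
result = assign_reads(hits, min_reads=1)
```
</preformat>
<p>Here read r4 can only go to genome A, because C has no uniquely-mapped reads, and r5 stays unassigned for the same reason.</p>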
</sec>
<sec id="Sec4">
<title>Performance metrics</title>
<p>We evaluate the performance of different methods using several different metrics, which are designed to encompass both performance in the binary classification task of predicting species presence or absence and in the estimation of relative abundances. For species presence and absence, a “True Positive” (TP) indicates that a species that is actually present in a sample is correctly predicted as being present by a method, while a “False Positive” (FP) indicates that the method predicted the presence of a species that is not actually in a sample and a “False Negative” (FN) indicates that a species was actually present in a sample but a method did not predict its presence. We use two metrics to assess the performance of a method in species presence/absence, precision and recall, defined below. Precision measures the percentage of predicted species that are actually present, while recall measures the percentage of species actually in a sample that were predicted by a method. Additionally, we report the F1-Score, which is defined as the harmonic mean of precision and recall. All three of these metrics range from 0 to 1, or 0% to 100%.
<disp-formula id="Equa">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$Precision = \frac{TP}{TP+FP} \qquad Recall = \frac{TP}{TP+FN} $$ \end{document}</tex-math>
<mml:math id="M2">
<mml:mrow>
<mml:mtext mathvariant="italic">Precision</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mspace width="2em"></mml:mspace>
<mml:mtext mathvariant="italic">Recall</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_5699_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<disp-formula id="Equb">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $${F1-score} = \frac{2*Precision*Recall}{Precision+Recall} $$ \end{document}</tex-math>
<mml:math id="M4">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mtext mathvariant="italic">score</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>∗</mml:mo>
<mml:mtext mathvariant="italic">Precision</mml:mtext>
<mml:mo>∗</mml:mo>
<mml:mtext mathvariant="italic">Recall</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">Precision</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">Recall</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_5699_Article_Equb.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
We use the L1 error as a measure of how accurately a method computes the relative abundances of species in a sample. The L1 error is the sum of absolute value differences between predicted species abundances and actual species abundances, and ranges from 0 (completely correct) to 2 (completely incorrect). An L1 error of 0 indicates that the exact set of actual species and their actual abundances is predicted perfectly, while a score of 2 indicates that the set of predicted species is completely incorrect. The L1 error can be described mathematically as:
<disp-formula id="Equc">
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$L1\ Error = \sum\limits^{S}_{i=1} |{Predicted}_{i} - {Actual}_{i}| $$ \end{document}</tex-math>
<mml:math id="M6">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mn>1</mml:mn>
<mml:mspace width="1em"></mml:mspace>
<mml:mtext mathvariant="italic">Error</mml:mtext>
<mml:mo>=</mml:mo>
<mml:munderover accent="false" accentunder="false">
<mml:mrow>
<mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="italic">Predicted</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="italic">Actual</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_5699_Article_Equc.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
where S is the number of species that are either predicted or actually present, and i indexes those species.</p>
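<p>The metrics defined above can be computed as in the following Python sketch. Abundance profiles are assumed to be dictionaries mapping species names to relative abundances; the function names and the convention of returning 0 when a denominator is zero are our own illustrative choices.</p>
<preformat>
```python
def presence_metrics(predicted, actual):
    """Precision, recall and F1-score for species presence/absence."""
    pred = {s for s, a in predicted.items() if a > 0}
    true = {s for s, a in actual.items() if a > 0}
    tp = len(pred.intersection(true))  # predicted and actually present
    fp = len(pred - true)              # predicted but absent
    fn = len(true - pred)              # present but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def l1_error(predicted, actual):
    """Sum of absolute abundance differences over all predicted or actual species."""
    species = set(predicted).union(actual)
    return sum(abs(predicted.get(s, 0.0) - actual.get(s, 0.0)) for s in species)


# Hypothetical profiles: species B is missed and species C is a false positive.
actual = {"A": 0.5, "B": 0.5}
predicted = {"A": 0.6, "C": 0.4}
precision, recall, f1 = presence_metrics(predicted, actual)
error = l1_error(predicted, actual)
```
</preformat>
<p>When the predicted and actual species sets are disjoint and each profile sums to 1, the L1 error reaches its maximum of 2.</p>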
</sec>
<sec id="Sec5">
<title>MiCoP shows order of magnitude improvement in abundance estimation</title>
<p>We validated the accuracy of our method by using simulated data, so that our results could be compared to a known ground truth. We sampled 1 million reads from 544 viral genomes drawn from an NCBI RefSeq reference database of 5808 viral genomes. Our simulation was designed using the “high complexity” microbial community parameters specified by the CAMI consortium, as described in the “
<xref rid="Sec9" ref-type="sec">Methods</xref>
” section [
<xref ref-type="bibr" rid="CR15">15</xref>
]. While 1 million reads is a fairly small metagenomic sample, in our case the coverage was reasonable because viral genomes are much shorter than bacterial genomes. We compared the results of our method with two of the most popular metagenome profiling methods, MetaPhlAn2 [
<xref ref-type="bibr" rid="CR11">11</xref>
] and Kraken [
<xref ref-type="bibr" rid="CR14">14</xref>
]. For this initial simulation, we used the default MetaPhlAn database and Kraken’s pre-built Minikraken database, since these databases reflect the most common conditions under which these methods are applied. For MiCoP, we use a database composed of the genomes available from NCBI’s RefSeq Viral and Fungal databases. In our next simulation, we examine the effect of the choice of reference database. Results are shown in Table 
<xref rid="Tab1" ref-type="table">1</xref>
.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Abundance estimation performance results on a simulated viral community with 544 species</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">L1 error</th>
<th align="left">Precision</th>
<th align="left">Recall/Sensitivity</th>
<th align="left">F1-Score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MiCoP</td>
<td align="left">0.09124</td>
<td align="left">1.0</td>
<td align="left">0.98155</td>
<td align="left">0.99069</td>
</tr>
<tr>
<td align="left">Kraken</td>
<td align="left">1.15834*</td>
<td align="left">0.85147</td>
<td align="left">0.90959</td>
<td align="left">0.87957</td>
</tr>
<tr>
<td align="left">MetaPhlAn2</td>
<td align="left">1.24357</td>
<td align="left">0.84388</td>
<td align="left">0.369</td>
<td align="left">0.51348</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Kraken is considered by its authors to be a read classification tool, not an abundance estimation tool, so we put an asterisk next to its results. However, we note that abundance estimation is a common application for Kraken in practice. Overall, MiCoP outperforms the other two methods across all metrics. Kraken and especially MetaPhlAn are limited by the poor representation of viruses in their standard databases. L1 error is the sum of the absolute values of the differences between the computed species abundances and the ground truth species abundances. MiCoP’s L1 error was more than an order of magnitude better than the other tools, and MiCoP had the best precision and recall. * Based on read classification proportions; Kraken does not claim to perform abundance estimation</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>We found that all three methods had high precision, but MiCoP had perfect precision. In other words, every species MiCoP predicted as present in the sample was actually present according to the ground truth. MetaPhlAn2 achieved only 37% sensitivity, identifying just over a third of the species present in the sample, while Kraken identified about 91% and MiCoP identified about 98%. The total L1 error was only about 0.09 for MiCoP, while MetaPhlAn2’s error was over 1.24. Kraken also had a high L1 error, but its authors present Kraken as a read classification method, not a relative abundance estimation method, so this metric may be misleading for Kraken.</p>
<p>Clearly, MetaPhlAn2’s performance was limited by the fact that its standard database did not contain marker genes for many of the NCBI virus genomes present in the sample. When using MetaPhlAn2’s provided database, researchers may fail to identify many of the species present in a sample, simply because they are not in this database. While Kraken’s default Minikraken database seems more comprehensive, it is still known to have lower sensitivity when compared to a more complete reference [
<xref ref-type="bibr" rid="CR14">14</xref>
]. The problem of reference bias can significantly impact the performance of these methods, particularly when applied to real datasets in which the set of expected genomes is not known in advance.</p>
<p>However, for the purposes of comparing MiCoP to MetaPhlAn2 and Kraken without the results being affected by reference bias, we constructed a dataset composed only of genomes that all three of these methods identified in the high complexity dataset. Out of these 173 genomes, we selected 40 according to the “low complexity” microbial community parameters established by the CAMI consortium [
<xref ref-type="bibr" rid="CR15">15</xref>
]. We also simulated errors in these reads, with the error rate linearly increasing from 1% at the start of reads to 5% at the end, with 2/3 of errors being substitutions and the other 1/3 being indels; these numbers were chosen to be roughly equal to the error rates used in the Kraken paper [
<xref ref-type="bibr" rid="CR14">14</xref>
]. Results are shown in Table 
<xref rid="Tab2" ref-type="table">2</xref>
.
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Abundance estimation performance results on a simulated viral community with 40 species</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">L1 error</th>
<th align="left">Precision</th>
<th align="left">Recall/Sensitivity</th>
<th align="left">F1-Score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MiCoP</td>
<td align="left">0.00909</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
</tr>
<tr>
<td align="left">Kraken</td>
<td align="left">1.15466*</td>
<td align="left">0.82222</td>
<td align="left">0.925</td>
<td align="left">0.87059</td>
</tr>
<tr>
<td align="left">MetaPhlAn2</td>
<td align="left">0.09844</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>These species were sampled from the intersection of the species detected by all three tools in the previous simulation. Thus, this simulation consisted of only the species that were present in all three reference databases, eliminating reference bias. MetaPhlAn’s performance dramatically improved, predicting the exact set of species in the sample, but its abundance estimation was an order of magnitude worse than MiCoP’s. Kraken’s results did not markedly improve in this simulation. * Based on read classification proportions; Kraken does not claim to perform abundance estimation</p>
</table-wrap-foot>
</table-wrap>
</p>
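<p>The error model used in this simulation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the simulator we used: the function names are hypothetical, substituted bases are drawn uniformly from the three alternatives, and indels are split evenly between insertions and deletions, a detail that the description above does not specify.</p>
<preformat>
```python
import random

BASES = "ACGT"


def error_rate(pos, read_len, start=0.01, end=0.05):
    """Per-base error rate rising linearly from `start` to `end` along the read."""
    if read_len == 1:
        return start
    return start + (end - start) * pos / (read_len - 1)


def add_errors(read, seed=0):
    """Return a copy of `read` with simulated substitutions and indels."""
    rng = random.Random(seed)
    out = []
    for pos, base in enumerate(read):
        if rng.random() >= error_rate(pos, len(read)):
            out.append(base)  # no error at this position
        elif rng.random() * 3 >= 2:
            # Indel (1/3 of errors); the even insertion/deletion split is an assumption.
            if rng.random() >= 0.5:
                out.extend([base, rng.choice(BASES)])  # insertion after this base
            # otherwise: deletion, the base is dropped
        else:
            # Substitution (2/3 of errors): pick a different base.
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)


noisy = add_errors("ACGT" * 25, seed=1)
```
</preformat>
<p>With these settings the error rate averaged over a read is 3%, and a fixed seed makes the simulated read reproducible.</p>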
<p>We observed that MiCoP and MetaPhlAn2 both identified the exact set of genomes present in the sample, leading to perfect precision and recall scores. However, MiCoP’s L1 error of about 0.0091 was less than a tenth of MetaPhlAn2’s L1 error of about 0.098. We speculate that the poor read utilization of MetaPhlAn2 (about 5% of reads used) leads to less accurate abundance estimation. Kraken’s sensitivity was high, with only three of the 40 genomes left unidentified. Surprisingly, Kraken’s precision was slightly worse than on the high complexity dataset. We observed that Kraken reported many low-abundance false positive predictions, resulting from no more than a few mispredicted reads. This phenomenon was also reported by the CAMI consortium when testing on bacterial data [
<xref ref-type="bibr" rid="CR15">15</xref>
]. By excluding genomes that were reported in less than 0.01% of reads by Kraken, we were able to raise precision from the original 0.39362 to 0.82222 without reducing recall. With higher cutoffs, the recall dropped rapidly, so the choice of 0.01% appeared to be optimal. Even with this improved precision, Kraken still produces several false positives whereas the other methods do not. These results suggest that MiCoP is highly effective at accurately predicting the species present in a sample and estimating their relative abundances, even when a substantial number of errors are present in the reads.</p>
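<p>The cutoff applied to Kraken’s output amounts to a simple filter on the fraction of reads assigned to each genome, sketched below in Python. The function name and example counts are hypothetical, and we assume here that the fraction is taken over the total number of reads in the sample.</p>
<preformat>
```python
def filter_low_abundance(read_counts, total_reads, min_fraction=1e-4):
    """Keep genomes supported by at least `min_fraction` of all reads (0.01% by default)."""
    return {g: n for g, n in read_counts.items() if n / total_reads >= min_fraction}


# Hypothetical counts from a sample of one million reads.
counts = {"genome_A": 5000, "genome_B": 250, "genome_C": 3}
kept = filter_low_abundance(counts, total_reads=1_000_000)
```
</preformat>
<p>In this example genome_C, supported by only three reads, falls below the cutoff and is discarded, which mirrors how a few mispredicted reads can otherwise produce a false positive.</p>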
<p>We generated a low complexity fungi community simulation dataset using the procedure described above: first we simulated a high complexity dataset, and then sampled 40 genomes out of the genomes detected by all methods on the high complexity dataset. Unlike the viral simulation, in which each sampled genome belongs to a different viral species, this simulation’s 40 genomes derived from only 7 different fungal species (some genomes in the reference database were contigs from the same species). Results are shown in Table 
<xref rid="Tab3" ref-type="table">3</xref>
.
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>Abundance estimation performance results on a simulated fungal community consisting of 40 genomes derived from 7 different species</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">L1 error</th>
<th align="left">Precision</th>
<th align="left">Recall/Sensitivity</th>
<th align="left">F1-Score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MiCoP</td>
<td align="left">0.00017</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
<td align="left">1.0</td>
</tr>
<tr>
<td align="left">Kraken</td>
<td align="left">0.01420*</td>
<td align="left">1.0</td>
<td align="left">0.83333</td>
<td align="left">0.90909</td>
</tr>
<tr>
<td align="left">MetaPhlAn2</td>
<td align="left">0.00924</td>
<td align="left">0.85714</td>
<td align="left">1.0</td>
<td align="left">0.92308</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>These species were sampled in the same way as in the previous table: by taking the intersection of species detected by all three tools on a higher-complexity community. MiCoP detected the exact set of species present in the sample, while Kraken had one false negative and MetaPhlAn had one false positive. Additionally, MiCoP’s abundance estimation was more than an order of magnitude better than the other tools. * Based on read classification proportions; Kraken does not claim to perform abundance estimation</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>Due to the relatively small number of species present, with several genomes sampled from each species, all methods were able to predict almost the exact set of species present. MetaPhlAn generated one false positive and Kraken generated one false negative. However, MiCoP achieved an L1 error that was more than 10 times lower than that of Kraken or MetaPhlAn2. This result demonstrates that MiCoP is effective at estimating the relative abundance of eukaryotes in a sample more accurately than existing methods, even when those methods predict almost the exact set of species in the sample correctly.</p>
</sec>
<sec id="Sec6">
<title>MiCoP detects greater diversity of viruses and eukaryotes in real world data</title>
<p>The Human Microbiome Project (HMP) is an ongoing large-scale effort to understand and characterize the human microbiome across a variety of body sites [
<xref ref-type="bibr" rid="CR24">24</xref>
–
<xref ref-type="bibr" rid="CR26">26</xref>
]. One of the main HMP studies took 4788 samples from 300 patients across 18 body sites [
<xref ref-type="bibr" rid="CR26">26</xref>
]. Notably, this study used MetaPhlAn to profile their metagenomic samples [
<xref ref-type="bibr" rid="CR26">26</xref>
]. As our simulations indicate, it is possible that analyses using MetaPhlAn have failed to capture the diversity and prevalence of the human virome and eukaryome due to a functional bias towards bacteria. We showed above that MiCoP outperforms MetaPhlAn2 on simulated datasets. However, real data may not be as clean as simulated data, owing to factors such as library preparation, mutations in organisms in real world environments, and horizontal gene transfer. We also showed that Kraken produces a number of false positives even on simulated data, and thus we do not assess its results on real data where ground truth is not available.</p>
<p>In order to compare the performance of MiCoP against MetaPhlAn2 on real world data, we applied both methods to publicly-available mock community data. Mock communities have the advantage of being examples of real live microbiomes, but with the community composition controlled and known in advance, providing an effective means for evaluating performance on real world data. Additionally, because classic mapping methods such as BWA-MEM were not originally designed for metagenomics [
<xref ref-type="bibr" rid="CR22">22</xref>
], they can occasionally assign reads incorrectly when applied to noisier real world data, especially when organisms in the sample are not in the reference database but closely related organisms are. Mock communities thus provide an opportunity to establish parameter settings for species filtering that perform well on real data. Several large scale and high profile metagenomics studies have applied BLAST or other alignment methods with a variety of different parameter settings with little to no explanation [
<xref ref-type="bibr" rid="CR27">27</xref>
–
<xref ref-type="bibr" rid="CR29">29</xref>
], so establishing a standard for these settings for our method is useful and illustrative. We focus on the precision and recall metrics for the mock community comparisons, as the studies that the communities were taken from report the abundances of the species in ways that could not be reliably translated into normalized relative abundances. As with our simulation studies, we use a database composed of the genomes available from NCBI’s RefSeq Viral and Fungal databases. We focus on fungi in particular because RefSeq’s databases for non-fungal eukaryotes are currently very limited. For MetaPhlAn2, we use their provided marker gene reference database.</p>
<p>We first applied MiCoP and MetaPhlAn2 to a viral mock community consisting of 9 species that was released by Conceição-Neto
<italic>et al</italic>
. in 2015 [
<xref ref-type="bibr" rid="CR30">30</xref>
]. We empirically found that the optimal parameter settings for MiCoP required at least 10 reads with 60% of bases mapped to the reference genome to consider a species present. Results are shown in Table 
<xref rid="Tab4" ref-type="table">4</xref>
. MetaPhlAn2 detected only 1 of the 9 species in the sample, with no false positives, while MiCoP detected 7 of the 9 species with 1 false positive. MiCoP’s one false positive stems from the previously-mentioned reference bias problem. The species that was actually present in the sample, Feline panleukopenia virus, was not in the NCBI viral reference database, while the closely-related [
<xref ref-type="bibr" rid="CR31">31</xref>
] Canine parvovirus was, leading to many of the reads from Feline panleukopenia virus mapping well to the Canine parvovirus genome. While very stringent parameter settings could filter out this false positive, they would also filter out many truly present species from the results. This result highlights both the reference bias problem and the inherent tradeoff between false positive and false negative rates. Regardless, MiCoP represented the viral community much more accurately than MetaPhlAn2, which failed to capture the community diversity. Finally, in terms of speed, MetaPhlAn2 processed the reads about twice as quickly as MiCoP, although MiCoP still processed the 12.4 million reads in 2 h and 22 min.
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>Comparison of the performance of MiCoP and MetaPhlAn2 on a mock viral community consisting of 9 species</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F1-Score</th>
<th align="left">Reads per minute</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MiCoP</td>
<td align="left">0.875</td>
<td align="left">0.77778</td>
<td align="left">0.82353</td>
<td align="left">87629</td>
</tr>
<tr>
<td align="left">MetaPhlAn2</td>
<td align="left">1.0</td>
<td align="left">0.11111</td>
<td align="left">0.19999</td>
<td align="left">162845</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>MetaPhlAn2 only detects 1 of 9 species, with no false positives, while MiCoP detects 7 of 9 species with one false positive, thus profiling the community much more accurately. MetaPhlAn2 processed the reads about twice as fast as MiCoP</p>
</table-wrap-foot>
</table-wrap>
</p>
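<p>The presence threshold used for the viral mock community can be sketched as follows. This is one reading of the criterion, in which a species is kept when at least 10 reads each have at least 60% of their bases mapped to that species; the function name and the input tuple layout are assumptions for illustration and are not taken from the MiCoP code.</p>

```python
from collections import Counter


def species_passing_filter(read_hits, min_reads=10, min_fraction_mapped=0.60):
    """Return the set of species with enough well-mapped supporting reads.

    read_hits: iterable of (species, mapped_bases, read_length) tuples,
    one tuple per classified read (hypothetical input layout).
    """
    supporting = Counter()
    for species, mapped_bases, read_length in read_hits:
        # Count a read only if a sufficient fraction of its bases mapped.
        if read_length > 0 and mapped_bases / read_length >= min_fraction_mapped:
            supporting[species] += 1
    return {s for s, n in supporting.items() if n >= min_reads}


# Hypothetical reads: only "Virus A" has 10 or more well-mapped reads.
example_hits = (
    [("Virus A", 100, 150)] * 12    # about 67% of bases mapped, 12 reads
    + [("Virus B", 60, 150)] * 12   # 40% of bases mapped: reads not counted
    + [("Virus C", 150, 150)] * 5   # fully mapped, but only 5 reads
)
present = species_passing_filter(example_hits)
```

<p>Raising either threshold removes more false positives such as Canine parvovirus, at the cost of also removing truly present species with little read support.</p>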
<p>We also applied MiCoP and MetaPhlAn2 to a fungal mock community consisting of 20 species from 4 genera that was released by Tonge et al. in 2014 [
<xref ref-type="bibr" rid="CR32">32</xref>
]. NCBI’s fungal RefSeq database contains far fewer species than its viral counterpart, and was missing many of the species in the mock community. Thus, species-level results identified several false but closely-related species in the sample, much as in the Canine parvovirus example above, and we determined that classification of real fungal communities can generally only be done accurately at the genus level. We applied somewhat stricter parameter settings than for viruses, owing to the greater amount of genomic sequence shared between different fungi than between different viruses. In particular, we considered a genus present if at least 100 reads matched any of the reference genomes for that genus with at least 99% identity and a maximum of 1 indel or substitution per read. Results are shown in Table 
<xref rid="Tab5" ref-type="table">5</xref>
. MetaPhlAn2 did not detect any fungal genera, while MiCoP detected 3 of the 4 genera in the sample. The genus that MiCoP did not detect was not present in the NCBI RefSeq fungal database, so MiCoP did as well as possible given the state of the NCBI database. In terms of speed, MetaPhlAn2 processed the reads significantly faster than MiCoP, but MiCoP still processed the 4.9 million reads in a reasonable time of 2 h and 55 min.
<table-wrap id="Tab5">
<label>Table 5</label>
<caption>
<p>Comparison of the genus-level performance of MiCoP and MetaPhlAn2 on a mock fungal community consisting of 4 genera</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F1-Score</th>
<th align="left">Reads per minute</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">MiCoP</td>
<td align="left">1.0</td>
<td align="left">0.75</td>
<td align="left">0.85714</td>
<td align="left">6934</td>
</tr>
<tr>
<td align="left">MetaPhlAn2</td>
<td align="left">NaN (0/0)</td>
<td align="left">0.0</td>
<td align="left">NaN</td>
<td align="left">187961</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>MiCoP detects 3 of the 4 genera with no false positives, while MetaPhlAn2 detects nothing. Because MetaPhlAn2 has 0 true and false positives, precision cannot be computed. MetaPhlAn2 was faster, but MiCoP still finished in less than 3 h</p>
</table-wrap-foot>
</table-wrap>
</p>
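<p>The genus-level criterion for fungi can be sketched in the same spirit. Reads are pooled over all reference species of a genus, and a genus is kept when at least 100 reads match with at least 99% identity and at most 1 indel or substitution. The species-to-genus lookup table and the input layout are hypothetical.</p>

```python
from collections import Counter


def genera_passing_filter(read_hits, species_to_genus,
                          min_reads=100, min_identity=0.99, max_edits=1):
    """Return the set of genera supported by enough near-exact reads.

    read_hits: iterable of (species, identity, edit_count) tuples,
    one tuple per classified read (hypothetical input layout).
    species_to_genus: dict mapping each reference species to its genus.
    """
    supporting = Counter()
    for species, identity, edit_count in read_hits:
        genus = species_to_genus.get(species)
        if genus is None:
            continue  # species without a known genus is ignored
        if identity >= min_identity and max_edits >= edit_count:
            supporting[genus] += 1  # pool reads across species of a genus
    return {g for g, n in supporting.items() if n >= min_reads}


# Hypothetical data: two Candida species together exceed 100 reads.
genus_of = {
    "Candida albicans": "Candida",
    "Candida glabrata": "Candida",
    "Malassezia globosa": "Malassezia",
}
fungal_hits = (
    [("Candida albicans", 0.995, 1)] * 60
    + [("Candida glabrata", 0.995, 0)] * 50
    + [("Malassezia globosa", 0.97, 4)] * 150  # too divergent to count
)
genera = genera_passing_filter(fungal_hits, genus_of)
```

<p>Pooling at the genus level means that a species absent from the reference database can still contribute evidence through its close relatives, which is why genus-level calls are more reliable here than species-level calls.</p>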
<p>We then compared the performance of MetaPhlAn2 and MiCoP on the HMP data, using the parameter settings validated on the mock community datasets. Following an example provided by the MetaPhlAn authors, we downloaded 20 samples from the HMP, 10 from buccal mucosa and 10 from tongue dorsum. We analyzed each sample using MiCoP and MetaPhlAn2. Note that the relative abundances of fungi were computed with respect to the reads that could be identified as coming from fungi, and likewise for viruses, not with respect to the entire sample. MetaPhlAn2 estimates relative abundances for the entire sample; here, we recomputed the relative abundances for fungi and viruses relative to themselves. Results are shown in Figs. 
<xref rid="Fig2" ref-type="fig">2</xref>
and
<xref rid="Fig3" ref-type="fig">3</xref>
.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>MiCoP and MetaPhlAn2 HMP Virus Profiling Results. Abundance estimation for viruses when applying MiCoP and MetaPhlAn2 to 20 Human Microbiome Project samples, 10 from buccal mucosa and 10 from tongue dorsum. MiCoP detects a total of 34 species present, with the sample being dominated by bacterial phages, particularly
<italic>Streptococcus</italic>
phages. MetaPhlAn finds a much lower virome diversity than MiCoP, with only 12 species identified. The sample is again dominated by
<italic>Streptococcus</italic>
phages, but MetaPhlAn’s results suggest that there is only a single type of this phage dominating the sample, while MiCoP suggests that a wide variety of
<italic>Streptococcus</italic>
phages are present. MetaPhlAn’s results may stem from the reference bias issue explored in the simulation studies</p>
</caption>
<graphic xlink:href="12864_2019_5699_Fig2_HTML" id="MO2"></graphic>
</fig>
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>MiCoP and MetaPhlAn2 HMP Fungi Profiling Results. Abundance estimation for fungi when applying MiCoP and MetaPhlAn2 to 20 Human Microbiome Project samples, 10 from buccal mucosa and 10 from tongue dorsum. MiCoP detects a total of 6 genera present. MetaPhlAn2 detects only two genera (
<italic>Candida</italic>
and
<italic>Aspergillaceae</italic>
), which are also present in MiCoP’s results. As the human oral eukaryome is known to be diverse [
<xref ref-type="bibr" rid="CR33">33</xref>
–
<xref ref-type="bibr" rid="CR35">35</xref>
], our results indicate that MiCoP captures the fungal community diversity better</p>
</caption>
<graphic xlink:href="12864_2019_5699_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
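<p>The recomputation of relative abundances within a group, as applied to the MetaPhlAn2 output for fungi and viruses, can be sketched as follows. The function name, the taxon labels and the numbers are hypothetical.</p>

```python
def renormalize_within_group(abundances, group_members):
    """Rescale abundances of the selected taxa so that they sum to 100%.

    abundances: dict mapping taxon name to whole-sample relative abundance.
    group_members: set of taxon names belonging to the group of interest.
    """
    subset = {t: a for t, a in abundances.items() if t in group_members}
    total = sum(subset.values())
    if total == 0:
        return {}  # nothing from this group was detected
    return {t: 100.0 * a / total for t, a in subset.items()}


# Hypothetical whole-sample profile dominated by bacteria.
profile = {"Bacterium X": 98.0, "Fungus A": 1.5, "Fungus B": 0.5}
fungi_only = renormalize_within_group(profile, {"Fungus A", "Fungus B"})
```

<p>In this example the two fungi account for 2% of the whole sample but 75% and 25% of the fungal fraction, which puts both tools on the same scale for comparison.</p>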
<p>We found that MiCoP consistently identified a more diverse virome and eukaryome than MetaPhlAn across all samples and both body sites. For fungi, MetaPhlAn2 identified two genera in the HMP samples,
<italic>Candida</italic>
and
<italic>Aspergillaceae</italic>
, while MiCoP identified 6 genera, including the two identified by MetaPhlAn2. Additionally, while the
<italic>Candida</italic>
genus dominated the MetaPhlAn2 results with 96.3% abundance, genera identified by MiCoP were distributed in a more balanced manner. For viruses, MetaPhlAn identified 12 species present while MiCoP identified 34 species. Both results were dominated by
<italic>Streptococcus</italic>
phages, but MetaPhlAn’s results were dominated by
<italic>Streptococcus phage EJ1</italic>
, while MiCoP identified a diverse group of
<italic>Streptococcus</italic>
phages. As the human oral virome and eukaryome are known to be highly diverse [
<xref ref-type="bibr" rid="CR33">33</xref>
–
<xref ref-type="bibr" rid="CR35">35</xref>
], our results indicate that MiCoP is capturing more of the community diversity than MetaPhlAn. MiCoP can be used as an effective alternative to popular general-purpose metagenomic abundance estimation tools when a more comprehensive characterization of the human virome and eukaryome is desired.</p>
</sec>
</sec>
<sec id="Sec7" sec-type="discussion">
<title>Discussion</title>
<p>MiCoP illustrates the benefits of a mapping-based approach for metagenomic analyses, especially of viral and eukaryotic species. Methods that are infeasible for the largest bacterial reference databases can still be applied to smaller reference databases, where their greater sensitivity becomes an advantage. The mapping-based approach is one such example: it is more sensitive to viral and eukaryotic species and provides valuable coverage information that other methods do not. Generally speaking, we observe that different mapping methods tend to be optimized for different types of microbes, and many existing methods are less effective for non-bacterial species. We also note the issue of reference bias, which our simulations showed can significantly impact the performance of profiling methods. Users who run existing methods with their default databases may not accurately detect non-bacterial species. Our real data analysis supports this view, as MiCoP identified more species than previous studies had reported.</p>
<p>In terms of run time, MiCoP trades speed for sensitivity, so Kraken and MetaPhlAn2 run faster. However, this difference is relatively minor (significantly less than an order of magnitude) when using viral reference databases such as the NCBI RefSeq viral genomes, due to their small size. The difference is more pronounced with eukaryote reference databases, due to the large genome size of eukaryotes, and can exceed an order of magnitude. Thus, MiCoP is likely to scale better for viral data than for eukaryotes. Finally, for any metagenome profiling method, there is an inherent tradeoff between false positives and false negatives when deciding how much evidence is necessary to consider a microbe present. In the case of MiCoP, this tradeoff is controlled by settings such as the number of reads mapped to a genome and the percentage of bases correctly mapped. We leave these settings up to the user, who may have application-specific needs, but the defaults are the empirically-tested settings described in our mock community evaluation.</p>
</sec>
<sec id="Sec8" sec-type="conclusion">
<title>Conclusions</title>
<p>MiCoP aims to help researchers more comprehensively and accurately identify viral and eukaryotic species in metagenomic samples. We have demonstrated that a mapping-based approach, thought by many to be infeasible for large bacterial databases, is computationally tractable and more accurate than existing general-purpose methods when profiling viruses and eukaryotes in metagenomic samples. In addition, we illustrated the significant issue of reference bias, and showed that MiCoP avoids some reference bias by using more comprehensive viral and fungal databases than many popular methods.</p>
<p>There are several potential future directions for MiCoP. One possible extension would be to add a precomputation method that reduces the reference database size by removing genomes that have no chance of being in a set of reads, using k-mer or MinHash based methods. This would enable MiCoP to run faster and use less memory, perhaps making it feasible to analyze large bacterial reference databases. Another possible direction involves assembly of sequences that were not mapped to any reference genome. This would allow for the detection of species that are not available in a reference database, but caution would have to be taken to avoid false discoveries.</p>
</sec>
<sec id="Sec9">
<title>Methods</title>
<sec id="Sec10">
<title>Reference database and mapping method</title>
<p>Any metagenomic profiling method that aims to classify sequence reads as belonging to certain reference genomes is dependent to a large extent on the reference database used [
<xref ref-type="bibr" rid="CR15">15</xref>
]. We show empirical evidence of this reference bias in our “
<xref rid="Sec2" ref-type="sec">Results</xref>
” section. Choosing the reference database involves a tradeoff between smaller databases that result in lower sensitivity but can be searched faster, and larger databases that take longer to search but enable more accurate results. In general, increasingly powerful computer hardware and fast mapping algorithms have enabled searching of large reference databases in a reasonable amount of time [
<xref ref-type="bibr" rid="CR19">19</xref>
,
<xref ref-type="bibr" rid="CR36">36</xref>
]. Additionally, viral and eukaryotic reference databases are currently much smaller than bacterial reference databases, making mapping-based approaches feasible. We therefore performed analysis using the full NCBI RefSeq Viral and Fungal databases. In addition to database selection, the selection of the mapping algorithm used in a mapping-based approach heavily affects results. We evaluated several new and established mapping methods, including Megablast [
<xref ref-type="bibr" rid="CR21">21</xref>
], BWA-MEM [
<xref ref-type="bibr" rid="CR23">23</xref>
], Bowtie2 [
<xref ref-type="bibr" rid="CR37">37</xref>
], and Diamond [
<xref ref-type="bibr" rid="CR38">38</xref>
]. We found that BWA-MEM produced the best results overall, comparable in accuracy to Megablast but with a much faster run time that was feasible for large modern metagenomic sequencing datasets.</p>
</sec>
<sec id="Sec11">
<title>Probabilistic assignment of multi-mapped reads</title>
<p>While classifying uniquely-mapped reads is trivial, proper assignment of multi-mapped reads has a major impact on results. BWA-MEM’s default setting randomly chooses which genome to assign multi-mapped reads to; this setting led to a large number of false positives in our simulated datasets. However, simply discarding all multi-mapped reads leads to poor read utilization and negatively affects sensitivity and abundance estimation. Thus, a method for accurately assigning multi-mapped reads is of critical importance in read classification.</p>
<p>In the first stage of our two-stage approach, uniquely-mapped reads are classified according to the genome that they map to, and the relative read counts for each genome are then calculated. All multi-mapped reads, and the list of the genomes that they map to, are set aside during this stage. During the second stage, we assign each multi-mapped read to one of its candidate genomes with probability proportional to the relative uniquely-mapped read counts of those genomes. A consequence of this approach is that genomes whose only mapped reads are multi-mapped receive no reads, and reads that map only to genomes with no uniquely-mapped reads are not assigned at all. We also filter out genomes with fewer than 10 uniquely-mapped reads, as there is insufficient evidence to indicate their presence; this heuristic technique has been successfully employed in previous studies [
<xref ref-type="bibr" rid="CR19">19</xref>
]. In comparison to letting BWA-MEM randomly choose multi-mapped read classification, we observed that these filtering steps have a minor-to-negligible impact on the sensitivity and vastly increase precision on the species level by eliminating many false positives.</p>
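<p>The two-stage assignment described above can be sketched as follows. This is an illustrative reading rather than the MiCoP implementation: the function name and input layout are hypothetical, and the sketch assumes that genomes with fewer than 10 uniquely-mapped reads are removed before multi-mapped reads are distributed, an ordering the text does not specify.</p>

```python
import random
from collections import Counter


def assign_reads(unique_hits, multi_hits, min_unique_reads=10, seed=0):
    """Assign multi-mapped reads in proportion to unique read counts.

    unique_hits: list of genome ids, one per uniquely-mapped read.
    multi_hits: list of candidate-genome lists, one per multi-mapped read.
    Returns a Counter of reads assigned to each retained genome.
    """
    rng = random.Random(seed)
    # Stage 1: count uniquely-mapped reads and drop weakly supported genomes.
    unique_counts = Counter(unique_hits)
    kept = {g: n for g, n in unique_counts.items() if n >= min_unique_reads}
    assigned = Counter(kept)
    # Stage 2: distribute each multi-mapped read among its retained candidates.
    for candidates in multi_hits:
        options = sorted(set(candidates).intersection(kept))
        if not options:
            continue  # no candidate has unique support: the read is dropped
        weights = [kept[g] for g in options]
        chosen = rng.choices(options, weights=weights, k=1)[0]
        assigned[chosen] += 1
    return assigned


# Hypothetical example with four genomes A, B, C and D.
unique_reads = ["A"] * 30 + ["B"] * 10 + ["C"] * 3
multi_reads = [["A", "B"]] * 40 + [["C", "D"]] * 5 + [["D"]] * 2
result = assign_reads(unique_reads, multi_reads)
```

<p>In this example genome C is removed for having only 3 uniquely-mapped reads and genome D never receives reads because it has none, so the 7 reads that map only to C or D are discarded. The 40 reads shared by A and B are split at random with a 3:1 weighting in favour of A.</p>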
</sec>
<sec id="Sec12">
<title>Relative abundance estimation</title>
<p>Following the classification of reads to genomes, we estimate the relative abundance of each organism in the sample. Many read classification methods do not support this step, even though the actual prevalence of different species in a sample is more informative than raw read counts. To estimate abundances, we first normalize the read count of each genome by the length of that genome. We then divide the adjusted count of each genome by the sum of all adjusted counts, so that the species abundances sum to 100%.</p>
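<p>The two normalization steps can be written out as a short sketch. The function name and the example genomes are hypothetical, and the aggregation of genomes into species is left out for brevity.</p>

```python
def relative_abundances(read_counts, genome_lengths):
    """Length-normalize read counts and rescale them to sum to 100%.

    read_counts: dict mapping genome id to number of assigned reads.
    genome_lengths: dict mapping genome id to genome length in bases.
    """
    # Step 1: reads per base, so that long genomes are not over-represented.
    adjusted = {g: read_counts[g] / genome_lengths[g] for g in read_counts}
    total = sum(adjusted.values())
    if total == 0:
        return {g: 0.0 for g in adjusted}
    # Step 2: rescale so that the adjusted counts sum to 100%.
    return {g: 100.0 * a / total for g, a in adjusted.items()}


# Hypothetical example: equal read counts, but genome B is three times longer.
counts = {"A": 100, "B": 100}
lengths = {"A": 10000, "B": 30000}
abundance = relative_abundances(counts, lengths)
```

<p>Although both genomes receive 100 reads, genome A is estimated at 75% and genome B at 25%, because the same number of reads from a genome three times longer implies one third as many genome copies.</p>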
</sec>
<sec id="Sec13">
<title>Simulated datasets</title>
<p>Simulated reads from viral and eukaryotic genomes were generated using Grinder [
<xref ref-type="bibr" rid="CR39">39</xref>
]. We used two different settings of simulated microbial communities, low complexity communities and high complexity communities. The parameters for these communities were set in accordance with the simulations performed by the CAMI consortium benchmark [
<xref ref-type="bibr" rid="CR15">15</xref>
]. Low complexity communities consisted of 40 genomes with abundances selected from a lognormal distribution with mean 1 and standard deviation 2, then normalized such that they total 100%. High complexity communities were similarly produced, except with 544 genomes, mean 1.5, and standard deviation 1. All viral and eukaryotic genome simulations consisted of 1 million reads with lengths picked from a normal distribution with mean 150 and standard deviation 15.</p>
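<p>The community parameters above can be illustrated with a small sketch that draws abundances and read lengths using Python's standard library. This does not reproduce Grinder, which was used for the actual simulations. The stated mean and standard deviation are assumed here to be the parameters of the underlying normal distribution, and the function names and seed are hypothetical.</p>

```python
import random


def simulate_community(n_genomes, mu, sigma, seed=0):
    """Draw lognormal abundances and normalize them to total 100%."""
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_genomes)]
    total = sum(raw)
    return [100.0 * x / total for x in raw]


def simulate_read_lengths(n_reads, mean=150.0, sd=15.0, seed=0):
    """Draw read lengths from a normal distribution, rounded to whole bases."""
    rng = random.Random(seed)
    # Lengths are clamped to at least 1 base to avoid degenerate reads.
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n_reads)]


# Low complexity: 40 genomes; high complexity: 544 genomes.
low_complexity = simulate_community(40, mu=1.0, sigma=2.0)
high_complexity = simulate_community(544, mu=1.5, sigma=1.0)
read_lengths = simulate_read_lengths(1000)
```

<p>The larger standard deviation of the low complexity setting produces a more uneven community, with a few genomes carrying most of the abundance, while the high complexity setting spreads abundance over many more genomes.</p>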
</sec>
</sec>
</body>
<back>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>BWA</term>
<def>
<p>Burrows-Wheeler Aligner</p>
</def>
</def-item>
<def-item>
<term>CAMI</term>
<def>
<p>Critical Assessment of Metagenome Interpretation</p>
</def>
</def-item>
<def-item>
<term>HMP</term>
<def>
<p>Human Microbiome Project</p>
</def>
</def-item>
<def-item>
<term>MetaPhlAn</term>
<def>
<p>Metagenomic phylogenetic analysis</p>
</def>
</def-item>
<def-item>
<term>MiCoP</term>
<def>
<p>Microbial community profiling</p>
</def>
</def-item>
<def-item>
<term>NCBI</term>
<def>
<p>National Center for Biotechnology Information</p>
</def>
</def-item>
<def-item>
<term>RefSeq</term>
<def>
<p>NCBI reference sequence database</p>
</def>
</def-item>
<def-item>
<term>TP/FP/FN</term>
<def>
<p>True positives / False positives / False negatives</p>
</def>
</def-item>
</def-list>
</glossary>
<ack>
<title>Acknowledgements</title>
<p>The authors would like to thank Dr. Lana Martin for her editorial assistance with the manuscript.</p>
<sec id="d29e1380">
<title>Funding</title>
<p>The article processing and publication charges were funded via UCLA Institutional Funds. NL would like to acknowledge the support of NSF grant DGE-1829071 and NIH grant T32 EB016640. SM acknowledges support from a QCB Collaboratory Postdoctoral Fellowship, and the QCB Collaboratory community directed by Matteo Pellegrini. SM and EE are supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448, 1320589 and 1331176, and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-ES021801, R01-MH101782, and R01-ES022282. DK was supported by the National Science Foundation under Grant No. 1664803.</p>
</sec>
<sec id="d29e1385" sec-type="data-availability">
<title>Availability of data and materials</title>
<p>The code, data, and documentation for this study are publicly available on GitHub at:
<ext-link ext-link-type="uri" xlink:href="https://github.com/smangul1/MiCoP">https://github.com/smangul1/MiCoP</ext-link>
</p>
<p>BWA-MEM and the NCBI RefSeq databases are also publicly available online via their respective websites.</p>
</sec>
<sec id="d29e1396">
<title>About this supplement</title>
<p>This article has been published as part of
<italic>BMC Genomics Volume 20 Supplement 5, 2019: Selected articles from the 7th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2017): genomics</italic>
. The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-5">https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-5</ext-link>
.</p>
</sec>
</ack>
<notes notes-type="author-contribution">
<title>Authors’ contributions</title>
<p>NL wrote the manuscript and code and carried out the experiments. SM developed the project and experiments with NL. MA, IM, NW, DK, and EE collaborated on the development of the project and its direction and goals. NW provided interpretation of the viral results. All authors have read and approved this manuscript.</p>
</notes>
<notes>
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</notes>
<notes>
<title>Consent for publication</title>
<p>Not applicable.</p>
</notes>
<notes notes-type="COI-statement">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</notes>
<notes>
<title>Publisher’s Note</title>
<p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
</notes>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Handelsman</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Metagenomics: application of genomics to uncultured microorganisms</article-title>
<source>Microbiol Mol Biol Rev</source>
<year>2004</year>
<volume>68</volume>
<issue>4</issue>
<fpage>669</fpage>
<lpage>85</lpage>
<pub-id pub-id-type="doi">10.1128/MMBR.68.4.669-685.2004</pub-id>
<pub-id pub-id-type="pmid">15590779</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wooley</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Friedberg</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>A primer on metagenomics</article-title>
<source>PLoS Comput Biol</source>
<year>2010</year>
<volume>6</volume>
<issue>2</issue>
<fpage>1000667</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000667</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stewart</surname>
<given-names>EJ</given-names>
</name>
</person-group>
<article-title>Growing unculturable bacteria</article-title>
<source>J Bacteriol</source>
<year>2012</year>
<volume>194</volume>
<issue>16</issue>
<fpage>4151</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1128/JB.00345-12</pub-id>
<pub-id pub-id-type="pmid">22661685</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Remington</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Paulsen</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Environmental genome shotgun sequencing of the Sargasso Sea</article-title>
<source>Science</source>
<year>2004</year>
<volume>304</volume>
<issue>5667</issue>
<fpage>66</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="doi">10.1126/science.1093857</pub-id>
<pub-id pub-id-type="pmid">15001713</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
</person-group>
<article-title>Microbiology: metagenomics</article-title>
<source>Nature</source>
<year>2008</year>
<volume>455</volume>
<issue>7212</issue>
<fpage>481</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="doi">10.1038/455481a</pub-id>
<pub-id pub-id-type="pmid">18818648</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosario</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Exploring the viral world through metagenomics</article-title>
<source>Curr Opin Virol</source>
<year>2011</year>
<volume>1</volume>
<issue>4</issue>
<fpage>289</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="doi">10.1016/j.coviro.2011.06.004</pub-id>
<pub-id pub-id-type="pmid">22440785</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rose</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Constantinides</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Tapinos</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Prosperi</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Challenges in the analysis of viral metagenomes</article-title>
<source>Virus Evol</source>
<year>2016</year>
<volume>2</volume>
<issue>2</issue>
<fpage>022</fpage>
<pub-id pub-id-type="doi">10.1093/ve/vew022</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<mixed-citation publication-type="other">Lukeš J, Stensvold CR, Jirků-Pomajbíková K, Parfrey LW. Are human intestinal eukaryotes beneficial or commensals? PLoS Pathog. 2015; 11(8):1005039.</mixed-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gibbons</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ghodsi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Treangen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences</article-title>
<source>Genome Biol</source>
<year>2011</year>
<volume>12</volume>
<issue>1</issue>
<fpage>11</fpage>
<pub-id pub-id-type="doi">10.1186/1465-6906-12-S1-P11</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segata</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Waldron</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ballarini</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Jousson</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Metagenomic microbial community profiling using unique clade-specific marker genes</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<issue>8</issue>
<fpage>811</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.2066</pub-id>
<pub-id pub-id-type="pmid">22688413</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Truong</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Franzosa</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Tickle</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Scholz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Weingart</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pasolli</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Tett</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Segata</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>MetaPhlAn2 for enhanced metagenomic taxonomic profiling</article-title>
<source>Nat Methods</source>
<year>2015</year>
<volume>12</volume>
<issue>10</issue>
<fpage>902</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.3589</pub-id>
<pub-id pub-id-type="pmid">26418763</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Willner</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>From deep sequencing to viral tagging: recent advances in viral metagenomics</article-title>
<source>Bioessays</source>
<year>2013</year>
<volume>35</volume>
<issue>5</issue>
<fpage>436</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="doi">10.1002/bies.201200174</pub-id>
<pub-id pub-id-type="pmid">23450659</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Viral metagenomics</article-title>
<source>Nat Rev Microbiol</source>
<year>2005</year>
<volume>3</volume>
<issue>6</issue>
<fpage>504</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro1163</pub-id>
<pub-id pub-id-type="pmid">15886693</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wood</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Kraken: ultrafast metagenomic sequence classification using exact alignments</article-title>
<source>Genome Biol</source>
<year>2014</year>
<volume>15</volume>
<issue>3</issue>
<fpage>46</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2014-15-3-r46</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sczyrba</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hofmann</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Belmann</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Koslicki</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Janssen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dröge</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gregor</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Majda</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fiedler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dahms</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Critical assessment of metagenome interpretation—a benchmark of metagenomics software</article-title>
<source>Nat Methods</source>
<year>2017</year>
<volume>14</volume>
<issue>11</issue>
<fpage>1063</fpage>
<lpage>71</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.4458</pub-id>
<pub-id pub-id-type="pmid">28967888</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cowan</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Stafford</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Muyanga</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cameron</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wittwer</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Metagenomic gene discovery: past, present and future</article-title>
<source>Trends Biotechnol</source>
<year>2005</year>
<volume>23</volume>
<issue>6</issue>
<fpage>321</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1016/j.tibtech.2005.04.001</pub-id>
<pub-id pub-id-type="pmid">15922085</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17</label>
<mixed-citation publication-type="other">Gilbert JA, Dupont CL. Microbial metagenomics: beyond the genome. Annual Rev Mar Sci. 2010:347–71.</mixed-citation>
</ref>
<ref id="CR18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ounit</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wanamaker</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Close</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Lonardi</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers</article-title>
<source>BMC Genomics</source>
<year>2015</year>
<volume>16</volume>
<issue>1</issue>
<fpage>236</fpage>
<pub-id pub-id-type="doi">10.1186/s12864-015-1419-2</pub-id>
<pub-id pub-id-type="pmid">25879410</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Petersen</surname>
<given-names>TN</given-names>
</name>
<name>
<surname>Lukjancenko</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Thomsen</surname>
<given-names>MCF</given-names>
</name>
<name>
<surname>Sperotto</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Lund</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Aarestrup</surname>
<given-names>FM</given-names>
</name>
<name>
<surname>Sicheritz-Pontén</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>MGmapper: reference based mapping and taxonomy annotation of metagenomics sequence reads</article-title>
<source>PLoS ONE</source>
<year>2017</year>
<volume>12</volume>
<issue>5</issue>
<fpage>0176469</fpage>
</element-citation>
</ref>
<ref id="CR20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corvelo</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>WE</given-names>
</name>
<name>
<surname>Robine</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Zody</surname>
<given-names>MC</given-names>
</name>
</person-group>
<article-title>taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time</article-title>
<source>Genome Res</source>
<year>2018</year>
<volume>28</volume>
<issue>5</issue>
<fpage>751</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="doi">10.1101/gr.225276.117</pub-id>
<pub-id pub-id-type="pmid">29588360</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>A greedy algorithm for aligning DNA sequences</article-title>
<source>J Comput Biol</source>
<year>2000</year>
<volume>7</volume>
<issue>1-2</issue>
<fpage>203</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1089/10665270050081478</pub-id>
<pub-id pub-id-type="pmid">10890397</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Breitwieser</surname>
<given-names>FP</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Centrifuge: rapid and sensitive classification of metagenomic sequences</article-title>
<source>Genome Res</source>
<year>2016</year>
<volume>26</volume>
<issue>12</issue>
<fpage>1721</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1101/gr.210641.116</pub-id>
<pub-id pub-id-type="pmid">27852649</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23</label>
<mixed-citation publication-type="other">Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. arXiv:1303.3997.</mixed-citation>
</ref>
<ref id="CR24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<article-title>The human microbiome project</article-title>
<source>Nature</source>
<year>2007</year>
<volume>449</volume>
<issue>7164</issue>
<fpage>804</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="doi">10.1038/nature06244</pub-id>
<pub-id pub-id-type="pmid">17943116</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Methé</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Creasy</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Giglio</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gevers</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Petrosino</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Abubucker</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Badger</surname>
<given-names>JH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A framework for human microbiome research</article-title>
<source>Nature</source>
<year>2012</year>
<volume>486</volume>
<issue>7402</issue>
<fpage>215</fpage>
<lpage>21</lpage>
<pub-id pub-id-type="doi">10.1038/nature11209</pub-id>
<pub-id pub-id-type="pmid">22699610</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gevers</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Abubucker</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Badger</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Chinwalla</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Creasy</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Earl</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>FitzGerald</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>RS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Structure, function and diversity of the healthy human microbiome</article-title>
<source>Nature</source>
<year>2012</year>
<volume>486</volume>
<issue>7402</issue>
<fpage>207</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1038/nature11234</pub-id>
<pub-id pub-id-type="pmid">22699609</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paez-Espino</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Eloe-Fadrosh</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Pavlopoulos</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Huntemann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mikhailova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
</person-group>
<article-title>Uncovering Earth’s virome</article-title>
<source>Nature</source>
<year>2016</year>
<volume>536</volume>
<issue>7617</issue>
<fpage>425</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1038/nature19094</pub-id>
<pub-id pub-id-type="pmid">27533034</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dutilh</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Cassman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>McNair</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sanchez</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Boling</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Barr</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Speth</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Seguritan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Aziz</surname>
<given-names>RK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes</article-title>
<source>Nat Commun</source>
<year>2014</year>
<volume>5</volume>
<fpage>4498</fpage>
<pub-id pub-id-type="doi">10.1038/ncomms5498</pub-id>
<pub-id pub-id-type="pmid">25058116</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aziz</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Dwivedi</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Akhter</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes</article-title>
<source>Front Microbiol</source>
<year>2015</year>
<volume>6</volume>
<fpage>381</fpage>
<pub-id pub-id-type="pmid">26005436</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conceição-Neto</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Zeller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lefrère</surname>
<given-names>H</given-names>
</name>
<name>
<surname>De Bruyn</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Beller</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Deboutte</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Yinda</surname>
<given-names>CK</given-names>
</name>
<name>
<surname>Lavigne</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Maes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Van Ranst</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Modular approach to customise sample preparation procedures for viral metagenomics: a reproducible protocol for virome analysis</article-title>
<source>Sci Rep</source>
<year>2015</year>
<volume>5</volume>
<fpage>16532</fpage>
<pub-id pub-id-type="doi">10.1038/srep16532</pub-id>
<pub-id pub-id-type="pmid">26559140</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parrish</surname>
<given-names>CR</given-names>
</name>
</person-group>
<article-title>Mapping specific functions in the capsid structure of canine parvovirus and feline panleukopenia virus using infectious plasmid clones</article-title>
<source>Virology</source>
<year>1991</year>
<volume>183</volume>
<issue>1</issue>
<fpage>195</fpage>
<lpage>205</lpage>
<pub-id pub-id-type="doi">10.1016/0042-6822(91)90132-U</pub-id>
<pub-id pub-id-type="pmid">1647068</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tonge</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Pashley</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Gant</surname>
<given-names>TW</given-names>
</name>
</person-group>
<article-title>Amplicon-based metagenomic analysis of mixed fungal samples using proton release amplicon sequencing</article-title>
<source>PLoS ONE</source>
<year>2014</year>
<volume>9</volume>
<issue>4</issue>
<fpage>93849</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0093849</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abeles</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Robles-Sikisaka</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ly</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lum</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Salzman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Boehm</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Pride</surname>
<given-names>DT</given-names>
</name>
</person-group>
<article-title>Human oral viruses are personal, persistent and gender-consistent</article-title>
<source>ISME J</source>
<year>2014</year>
<volume>8</volume>
<issue>9</issue>
<fpage>1753</fpage>
<lpage>67</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2014.31</pub-id>
<pub-id pub-id-type="pmid">24646696</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34</label>
<mixed-citation publication-type="other">Zawadzki PJ, Perkowski K, Padzik M, Mierzwińska-Nastalska E, Szaflik JP, Conn DB, Chomicz L. Examination of oral microbiota diversity in adults and older adults as an approach to prevent spread of risk factors for human infections. BioMed Res Int. 2017.</mixed-citation>
</ref>
<ref id="CR35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wade</surname>
<given-names>WG</given-names>
</name>
</person-group>
<article-title>The oral microbiome in health and disease</article-title>
<source>Pharmacol Res</source>
<year>2013</year>
<volume>69</volume>
<issue>1</issue>
<fpage>137</fpage>
<lpage>43</lpage>
<pub-id pub-id-type="doi">10.1016/j.phrs.2012.11.006</pub-id>
<pub-id pub-id-type="pmid">23201354</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davenport</surname>
<given-names>CF</given-names>
</name>
<name>
<surname>Neugebauer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Beckmann</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Friedrich</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kameri</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kokott</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Paetow</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Siekmann</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wieding-Drewes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wienhöfer</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genometa - a fast and accurate classifier for short metagenomic shotgun reads</article-title>
<source>PLoS ONE</source>
<year>2012</year>
<volume>7</volume>
<issue>8</issue>
<fpage>41224</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0041224</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<issue>4</issue>
<fpage>357</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
<pub-id pub-id-type="pmid">22388286</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchfink</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>Fast and sensitive protein alignment using DIAMOND</article-title>
<source>Nat Methods</source>
<year>2015</year>
<volume>12</volume>
<issue>1</issue>
<fpage>59</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.3176</pub-id>
<pub-id pub-id-type="pmid">25402007</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Angly</surname>
<given-names>FE</given-names>
</name>
<name>
<surname>Willner</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
</person-group>
<article-title>Grinder: a versatile amplicon and shotgun sequence simulator</article-title>
<source>Nucleic Acids Res</source>
<year>2012</year>
<volume>40</volume>
<issue>12</issue>
<fpage>94</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gks251</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000304 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000304 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:6551237
   |texte=   MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:31167634" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021