Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 0001599 (Pmc/Corpus); previous: 0001598; next: 0001600


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Evaluation of shotgun metagenomics sequence classification methods using
<italic>in silico</italic>
and
<italic>in vitro</italic>
simulated communities</title>
<author>
<name sortKey="Peabody, Michael A" sort="Peabody, Michael A" uniqKey="Peabody M" first="Michael A." last="Peabody">Michael A. Peabody</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Rossum, Thea" sort="Van Rossum, Thea" uniqKey="Van Rossum T" first="Thea" last="Van Rossum">Thea Van Rossum</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lo, Raymond" sort="Lo, Raymond" uniqKey="Lo R" first="Raymond" last="Lo">Raymond Lo</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brinkman, Fiona S L" sort="Brinkman, Fiona S L" uniqKey="Brinkman F" first="Fiona S. L." last="Brinkman">Fiona S. L. Brinkman</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26537885</idno>
<idno type="pmc">4634789</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634789</idno>
<idno type="RBID">PMC:4634789</idno>
<idno type="doi">10.1186/s12859-015-0788-5</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000159</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Evaluation of shotgun metagenomics sequence classification methods using
<italic>in silico</italic>
and
<italic>in vitro</italic>
simulated communities</title>
<author>
<name sortKey="Peabody, Michael A" sort="Peabody, Michael A" uniqKey="Peabody M" first="Michael A." last="Peabody">Michael A. Peabody</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Rossum, Thea" sort="Van Rossum, Thea" uniqKey="Van Rossum T" first="Thea" last="Van Rossum">Thea Van Rossum</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lo, Raymond" sort="Lo, Raymond" uniqKey="Lo R" first="Raymond" last="Lo">Raymond Lo</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brinkman, Fiona S L" sort="Brinkman, Fiona S L" uniqKey="Brinkman F" first="Fiona S. L." last="Brinkman">Fiona S. L. Brinkman</name>
<affiliation>
<nlm:aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same
<italic>in silico</italic>
and
<italic>in vitro</italic>
test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions.</p>
</sec>
<sec>
<title>Results</title>
<p>An overview of the features of 38 bioinformatics methods is provided, and accuracy is evaluated with a focus on 11 programs whose reference databases can be modified and which can therefore be most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both
<italic>in silico</italic>
and
<italic>in vitro</italic>
mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand of the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, the most popular programs frequently falsely predicted dozens to hundreds of species. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations are discussed.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-015-0788-5) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Wooley, Jc" uniqKey="Wooley J">JC Wooley</name>
</author>
<author>
<name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
<author>
<name sortKey="Friedberg, I" uniqKey="Friedberg I">I Friedberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Handelsman, J" uniqKey="Handelsman J">J Handelsman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Acinas, Sg" uniqKey="Acinas S">SG Acinas</name>
</author>
<author>
<name sortKey="Sarma Rupavtarm, R" uniqKey="Sarma Rupavtarm R">R Sarma-Rupavtarm</name>
</author>
<author>
<name sortKey="Klepac Ceraj, V" uniqKey="Klepac Ceraj V">V Klepac-Ceraj</name>
</author>
<author>
<name sortKey="Polz, Mf" uniqKey="Polz M">MF Polz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
<author>
<name sortKey="Hug, La" uniqKey="Hug L">LA Hug</name>
</author>
<author>
<name sortKey="Thomas, Bc" uniqKey="Thomas B">BC Thomas</name>
</author>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Castelle, Cj" uniqKey="Castelle C">CJ Castelle</name>
</author>
<author>
<name sortKey="Singh, A" uniqKey="Singh A">A Singh</name>
</author>
<author>
<name sortKey="Wilkins, Mj" uniqKey="Wilkins M">MJ Wilkins</name>
</author>
<author>
<name sortKey="Wrighton, Kc" uniqKey="Wrighton K">KC Wrighton</name>
</author>
<author>
<name sortKey="Williams, Kh" uniqKey="Williams K">KH Williams</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brady, A" uniqKey="Brady A">A Brady</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ander, C" uniqKey="Ander C">C Ander</name>
</author>
<author>
<name sortKey="Schulz Trieglaff, Ob" uniqKey="Schulz Trieglaff O">OB Schulz-Trieglaff</name>
</author>
<author>
<name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
<author>
<name sortKey="Cox, Aj" uniqKey="Cox A">AJ Cox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rappe, Ms" uniqKey="Rappe M">MS Rappé</name>
</author>
<author>
<name sortKey="Giovannoni, Sj" uniqKey="Giovannoni S">SJ Giovannoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Singh, Nk" uniqKey="Singh N">NK Singh</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segata, N" uniqKey="Segata N">N Segata</name>
</author>
<author>
<name sortKey="Waldron, L" uniqKey="Waldron L">L Waldron</name>
</author>
<author>
<name sortKey="Ballarini, A" uniqKey="Ballarini A">A Ballarini</name>
</author>
<author>
<name sortKey="Narasimhan, V" uniqKey="Narasimhan V">V Narasimhan</name>
</author>
<author>
<name sortKey="Jousson, O" uniqKey="Jousson O">O Jousson</name>
</author>
<author>
<name sortKey="Huttenhower, C" uniqKey="Huttenhower C">C Huttenhower</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="V Trovsk, T" uniqKey="V Trovsk T">T Větrovský</name>
</author>
<author>
<name sortKey="Baldrian, P" uniqKey="Baldrian P">P Baldrian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, M" uniqKey="Wu M">M Wu</name>
</author>
<author>
<name sortKey="Scott, Aj" uniqKey="Scott A">AJ Scott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Darling, Ae" uniqKey="Darling A">AE Darling</name>
</author>
<author>
<name sortKey="Jospin, G" uniqKey="Jospin G">G Jospin</name>
</author>
<author>
<name sortKey="Lowe, E" uniqKey="Lowe E">E Lowe</name>
</author>
<author>
<name sortKey="Matsen, Fa" uniqKey="Matsen F">FA Matsen</name>
</author>
<author>
<name sortKey="Bik, Hm" uniqKey="Bik H">HM Bik</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amann, R" uniqKey="Amann R">R Amann</name>
</author>
<author>
<name sortKey="Ludwig, W" uniqKey="Ludwig W">W Ludwig</name>
</author>
<author>
<name sortKey="Schleifer, K" uniqKey="Schleifer K">K Schleifer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bazinet, Al" uniqKey="Bazinet A">AL Bazinet</name>
</author>
<author>
<name sortKey="Cummings, Mp" uniqKey="Cummings M">MP Cummings</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Ke" uniqKey="Nelson K">KE Nelson</name>
</author>
<author>
<name sortKey="Weinstock, Gm" uniqKey="Weinstock G">GM Weinstock</name>
</author>
<author>
<name sortKey="Highlander, Sk" uniqKey="Highlander S">SK Highlander</name>
</author>
<author>
<name sortKey="Worley, Kc" uniqKey="Worley K">KC Worley</name>
</author>
<author>
<name sortKey="Creasy, Hh" uniqKey="Creasy H">HH Creasy</name>
</author>
<author>
<name sortKey="Wortman, Jr" uniqKey="Wortman J">JR Wortman</name>
</author>
<author>
<name sortKey="Rusch, Db" uniqKey="Rusch D">DB Rusch</name>
</author>
<author>
<name sortKey="Mitreva, M" uniqKey="Mitreva M">M Mitreva</name>
</author>
<author>
<name sortKey="Sodergren, E" uniqKey="Sodergren E">E Sodergren</name>
</author>
<author>
<name sortKey="Chinwalla, At" uniqKey="Chinwalla A">AT Chinwalla</name>
</author>
<author>
<name sortKey="Feldgarden, M" uniqKey="Feldgarden M">M Feldgarden</name>
</author>
<author>
<name sortKey="Gevers, D" uniqKey="Gevers D">D Gevers</name>
</author>
<author>
<name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author>
<name sortKey="Madupu, R" uniqKey="Madupu R">R Madupu</name>
</author>
<author>
<name sortKey="Ward, Dv" uniqKey="Ward D">DV Ward</name>
</author>
<author>
<name sortKey="Birren, Bw" uniqKey="Birren B">BW Birren</name>
</author>
<author>
<name sortKey="Gibbs, Ra" uniqKey="Gibbs R">RA Gibbs</name>
</author>
<author>
<name sortKey="Methe, B" uniqKey="Methe B">B Methe</name>
</author>
<author>
<name sortKey="Petrosino, Jf" uniqKey="Petrosino J">JF Petrosino</name>
</author>
<author>
<name sortKey="Strausberg, Rl" uniqKey="Strausberg R">RL Strausberg</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="White, Or" uniqKey="White O">OR White</name>
</author>
<author>
<name sortKey="Wilson, Rk" uniqKey="Wilson R">RK Wilson</name>
</author>
<author>
<name sortKey="Durkin, S" uniqKey="Durkin S">S Durkin</name>
</author>
<author>
<name sortKey="Giglio, Mg" uniqKey="Giglio M">MG Giglio</name>
</author>
<author>
<name sortKey="Gujja, S" uniqKey="Gujja S">S Gujja</name>
</author>
<author>
<name sortKey="Howarth, C" uniqKey="Howarth C">C Howarth</name>
</author>
<author>
<name sortKey="Kodira, Cd" uniqKey="Kodira C">CD Kodira</name>
</author>
<author>
<name sortKey="Kyrpides, N" uniqKey="Kyrpides N">N Kyrpides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S Sunagawa</name>
</author>
<author>
<name sortKey="Mende, Dr" uniqKey="Mende D">DR Mende</name>
</author>
<author>
<name sortKey="Zeller, G" uniqKey="Zeller G">G Zeller</name>
</author>
<author>
<name sortKey="Izquierdo Carrasco, F" uniqKey="Izquierdo Carrasco F">F Izquierdo-Carrasco</name>
</author>
<author>
<name sortKey="Berger, Sa" uniqKey="Berger S">SA Berger</name>
</author>
<author>
<name sortKey="Kultima, Jr" uniqKey="Kultima J">JR Kultima</name>
</author>
<author>
<name sortKey="Coelho, Lp" uniqKey="Coelho L">LP Coelho</name>
</author>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author>
<name sortKey="Tap, J" uniqKey="Tap J">J Tap</name>
</author>
<author>
<name sortKey="Nielsen, Hb" uniqKey="Nielsen H">HB Nielsen</name>
</author>
<author>
<name sortKey="Rasmussen, S" uniqKey="Rasmussen S">S Rasmussen</name>
</author>
<author>
<name sortKey="Brunak, S" uniqKey="Brunak S">S Brunak</name>
</author>
<author>
<name sortKey="Pedersen, O" uniqKey="Pedersen O">O Pedersen</name>
</author>
<author>
<name sortKey="Guarner, F" uniqKey="Guarner F">F Guarner</name>
</author>
<author>
<name sortKey="De Vos, Wm" uniqKey="De Vos W">WM de Vos</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Dore, J" uniqKey="Dore J">J Doré</name>
</author>
<author>
<name sortKey="Ehrlich, Sd" uniqKey="Ehrlich S">SD Ehrlich</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author>
<name sortKey="Ott, F" uniqKey="Ott F">F Ott</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Schmid, R" uniqKey="Schmid R">R Schmid</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oh, S" uniqKey="Oh S">S Oh</name>
</author>
<author>
<name sortKey="Caro Quintero, A" uniqKey="Caro Quintero A">A Caro-Quintero</name>
</author>
<author>
<name sortKey="Tsementzi, D" uniqKey="Tsementzi D">D Tsementzi</name>
</author>
<author>
<name sortKey="Deleon Rodriguez, N" uniqKey="Deleon Rodriguez N">N DeLeon-Rodriguez</name>
</author>
<author>
<name sortKey="Luo, C" uniqKey="Luo C">C Luo</name>
</author>
<author>
<name sortKey="Poretsky, R" uniqKey="Poretsky R">R Poretsky</name>
</author>
<author>
<name sortKey="Konstantinidis, Kt" uniqKey="Konstantinidis K">KT Konstantinidis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghai, R" uniqKey="Ghai R">R Ghai</name>
</author>
<author>
<name sortKey="Rodriguez Valera, F" uniqKey="Rodriguez Valera F">F Rodriguez-Valera</name>
</author>
<author>
<name sortKey="Mcmahon, Kd" uniqKey="Mcmahon K">KD McMahon</name>
</author>
<author>
<name sortKey="Toyama, D" uniqKey="Toyama D">D Toyama</name>
</author>
<author>
<name sortKey="Rinke, R" uniqKey="Rinke R">R Rinke</name>
</author>
<author>
<name sortKey="Cristina Souza De Oliveira, T" uniqKey="Cristina Souza De Oliveira T">T Cristina Souza de Oliveira</name>
</author>
<author>
<name sortKey="Wagner Garcia, J" uniqKey="Wagner Garcia J">J Wagner Garcia</name>
</author>
<author>
<name sortKey="Pellon De Miranda, F" uniqKey="Pellon De Miranda F">F Pellon de Miranda</name>
</author>
<author>
<name sortKey="Henrique Silva, F" uniqKey="Henrique Silva F">F Henrique-Silva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Rj" uniqKey="Smith R">RJ Smith</name>
</author>
<author>
<name sortKey="Jeffries, Tc" uniqKey="Jeffries T">TC Jeffries</name>
</author>
<author>
<name sortKey="Roudnew, B" uniqKey="Roudnew B">B Roudnew</name>
</author>
<author>
<name sortKey="Fitch, Aj" uniqKey="Fitch A">AJ Fitch</name>
</author>
<author>
<name sortKey="Seymour, Jr" uniqKey="Seymour J">JR Seymour</name>
</author>
<author>
<name sortKey="Delpin, Mw" uniqKey="Delpin M">MW Delpin</name>
</author>
<author>
<name sortKey="Newton, K" uniqKey="Newton K">K Newton</name>
</author>
<author>
<name sortKey="Brown, Mh" uniqKey="Brown M">MH Brown</name>
</author>
<author>
<name sortKey="Mitchell, Jg" uniqKey="Mitchell J">JG Mitchell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bolger, Am" uniqKey="Bolger A">AM Bolger</name>
</author>
<author>
<name sortKey="Lohse, M" uniqKey="Lohse M">M Lohse</name>
</author>
<author>
<name sortKey="Usadel, B" uniqKey="Usadel B">B Usadel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garcia Etxebarria, K" uniqKey="Garcia Etxebarria K">K Garcia-Etxebarria</name>
</author>
<author>
<name sortKey="Garcia Garcera, M" uniqKey="Garcia Garcera M">M Garcia-Garcerà</name>
</author>
<author>
<name sortKey="Calafell, F" uniqKey="Calafell F">F Calafell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, De" uniqKey="Wood D">DE Wood</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Ruscheweyh, H J" uniqKey="Ruscheweyh H">H-J Ruscheweyh</name>
</author>
<author>
<name sortKey="Weber, N" uniqKey="Weber N">N Weber</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Gibbons, T" uniqKey="Gibbons T">T Gibbons</name>
</author>
<author>
<name sortKey="Ghodsi, M" uniqKey="Ghodsi M">M Ghodsi</name>
</author>
<author>
<name sortKey="Treangen, T" uniqKey="Treangen T">T Treangen</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Monzoorul Haque, M" uniqKey="Monzoorul Haque M">M Monzoorul Haque</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Ye, Y" uniqKey="Ye Y">Y Ye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diaz, Nn" uniqKey="Diaz N">NN Diaz</name>
</author>
<author>
<name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author>
<name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author>
<name sortKey="Niehaus, K" uniqKey="Niehaus K">K Niehaus</name>
</author>
<author>
<name sortKey="Nattkemper, Tw" uniqKey="Nattkemper T">TW Nattkemper</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fierer, N" uniqKey="Fierer N">N Fierer</name>
</author>
<author>
<name sortKey="Leff, Jw" uniqKey="Leff J">JW Leff</name>
</author>
<author>
<name sortKey="Adams, Bj" uniqKey="Adams B">BJ Adams</name>
</author>
<author>
<name sortKey="Nielsen, Un" uniqKey="Nielsen U">UN Nielsen</name>
</author>
<author>
<name sortKey="Bates, St" uniqKey="Bates S">ST Bates</name>
</author>
<author>
<name sortKey="Lauber, Cl" uniqKey="Lauber C">CL Lauber</name>
</author>
<author>
<name sortKey="Owens, S" uniqKey="Owens S">S Owens</name>
</author>
<author>
<name sortKey="Gilbert, Ja" uniqKey="Gilbert J">JA Gilbert</name>
</author>
<author>
<name sortKey="Wall, Dh" uniqKey="Wall D">DH Wall</name>
</author>
<author>
<name sortKey="Caporaso, Jg" uniqKey="Caporaso J">JG Caporaso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fukushima, M" uniqKey="Fukushima M">M Fukushima</name>
</author>
<author>
<name sortKey="Kakinuma, K" uniqKey="Kakinuma K">K Kakinuma</name>
</author>
<author>
<name sortKey="Kawaguchi, R" uniqKey="Kawaguchi R">R Kawaguchi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey=" Kstad, Oa" uniqKey=" Kstad O">OA Økstad</name>
</author>
<author>
<name sortKey="Kolst, A B" uniqKey="Kolst A">A-B Kolstø</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Hamada, M" uniqKey="Hamada M">M Hamada</name>
</author>
<author>
<name sortKey="Horton, P" uniqKey="Horton P">P Horton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Xie, C" uniqKey="Xie C">C Xie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buchfink, B" uniqKey="Buchfink B">B Buchfink</name>
</author>
<author>
<name sortKey="Xie, C" uniqKey="Xie C">C Xie</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rogozin, Ib" uniqKey="Rogozin I">IB Rogozin</name>
</author>
<author>
<name sortKey="Makarova, Ks" uniqKey="Makarova K">KS Makarova</name>
</author>
<author>
<name sortKey="Natale, Da" uniqKey="Natale D">DA Natale</name>
</author>
<author>
<name sortKey="Spiridonov, An" uniqKey="Spiridonov A">AN Spiridonov</name>
</author>
<author>
<name sortKey="Tatusov, Rl" uniqKey="Tatusov R">RL Tatusov</name>
</author>
<author>
<name sortKey="Wolf, Yi" uniqKey="Wolf Y">YI Wolf</name>
</author>
<author>
<name sortKey="Yin, J" uniqKey="Yin J">J Yin</name>
</author>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerlach, W" uniqKey="Gerlach W">W Gerlach</name>
</author>
<author>
<name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Klar, B" uniqKey="Klar B">B Klar</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Rupek, P" uniqKey="Rupek P">P Rupek</name>
</author>
<author>
<name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author>
<name sortKey="Urich, T" uniqKey="Urich T">T Urich</name>
</author>
<author>
<name sortKey="Gilbert, Ja" uniqKey="Gilbert J">JA Gilbert</name>
</author>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Wilke, A" uniqKey="Wilke A">A Wilke</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Paarmann, D" uniqKey="Paarmann D">D Paarmann</name>
</author>
<author>
<name sortKey="D Ouza, M" uniqKey="D Ouza M">M D’Souza</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
<author>
<name sortKey="Glass, Em" uniqKey="Glass E">EM Glass</name>
</author>
<author>
<name sortKey="Kubal, M" uniqKey="Kubal M">M Kubal</name>
</author>
<author>
<name sortKey="Paczian, T" uniqKey="Paczian T">T Paczian</name>
</author>
<author>
<name sortKey="Rodriguez, A" uniqKey="Rodriguez A">A Rodriguez</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
<author>
<name sortKey="Wilke, A" uniqKey="Wilke A">A Wilke</name>
</author>
<author>
<name sortKey="Wilkening, J" uniqKey="Wilkening J">J Wilkening</name>
</author>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seshadri, R" uniqKey="Seshadri R">R Seshadri</name>
</author>
<author>
<name sortKey="Kravitz, Sa" uniqKey="Kravitz S">SA Kravitz</name>
</author>
<author>
<name sortKey="Smarr, L" uniqKey="Smarr L">L Smarr</name>
</author>
<author>
<name sortKey="Gilna, P" uniqKey="Gilna P">P Gilna</name>
</author>
<author>
<name sortKey="Frazier, M" uniqKey="Frazier M">M Frazier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Altintas, I" uniqKey="Altintas I">I Altintas</name>
</author>
<author>
<name sortKey="Lin, A" uniqKey="Lin A">A Lin</name>
</author>
<author>
<name sortKey="Peltier, S" uniqKey="Peltier S">S Peltier</name>
</author>
<author>
<name sortKey="Stocks, K" uniqKey="Stocks K">K Stocks</name>
</author>
<author>
<name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
<author>
<name sortKey="Ellisman, M" uniqKey="Ellisman M">M Ellisman</name>
</author>
<author>
<name sortKey="Grethe, J" uniqKey="Grethe J">J Grethe</name>
</author>
<author>
<name sortKey="Wooley, J" uniqKey="Wooley J">J Wooley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author>
<name sortKey="Diaz, Nn" uniqKey="Diaz N">NN Diaz</name>
</author>
<author>
<name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author>
<name sortKey="Kelley, S" uniqKey="Kelley S">S Kelley</name>
</author>
<author>
<name sortKey="Nattkemper, Tw" uniqKey="Nattkemper T">TW Nattkemper</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerlach, W" uniqKey="Gerlach W">W Gerlach</name>
</author>
<author>
<name sortKey="Junemann, S" uniqKey="Junemann S">S Jünemann</name>
</author>
<author>
<name sortKey="Tille, F" uniqKey="Tille F">F Tille</name>
</author>
<author>
<name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author>
<name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niu, B" uniqKey="Niu B">B Niu</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
<author>
<name sortKey="Wu, S" uniqKey="Wu S">S Wu</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, S" uniqKey="Wu S">S Wu</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
<author>
<name sortKey="Niu, B" uniqKey="Niu B">B Niu</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Monzoorul Haque, M" uniqKey="Monzoorul Haque M">M Monzoorul Haque</name>
</author>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Komanduri, D" uniqKey="Komanduri D">D Komanduri</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Raymond, F" uniqKey="Raymond F">F Raymond</name>
</author>
<author>
<name sortKey="Godzaridis, E" uniqKey="Godzaridis E">E Godzaridis</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
<author>
<name sortKey="Disz, T" uniqKey="Disz T">T Disz</name>
</author>
<author>
<name sortKey="Pusch, Gd" uniqKey="Pusch G">GD Pusch</name>
</author>
<author>
<name sortKey="Vonstein, V" uniqKey="Vonstein V">V Vonstein</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
<author>
<name sortKey="Overbeek, R" uniqKey="Overbeek R">R Overbeek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davenport, Cf" uniqKey="Davenport C">CF Davenport</name>
</author>
<author>
<name sortKey="Neugebauer, J" uniqKey="Neugebauer J">J Neugebauer</name>
</author>
<author>
<name sortKey="Beckmann, N" uniqKey="Beckmann N">N Beckmann</name>
</author>
<author>
<name sortKey="Friedrich, B" uniqKey="Friedrich B">B Friedrich</name>
</author>
<author>
<name sortKey="Kameri, B" uniqKey="Kameri B">B Kameri</name>
</author>
<author>
<name sortKey="Kokott, S" uniqKey="Kokott S">S Kokott</name>
</author>
<author>
<name sortKey="Paetow, M" uniqKey="Paetow M">M Paetow</name>
</author>
<author>
<name sortKey="Siekmann, B" uniqKey="Siekmann B">B Siekmann</name>
</author>
<author>
<name sortKey="Wieding Drewes, M" uniqKey="Wieding Drewes M">M Wieding-Drewes</name>
</author>
<author>
<name sortKey="Wienhofer, M" uniqKey="Wienhofer M">M Wienhöfer</name>
</author>
<author>
<name sortKey="Wolf, S" uniqKey="Wolf S">S Wolf</name>
</author>
<author>
<name sortKey="Tummler, B" uniqKey="Tummler B">B Tümmler</name>
</author>
<author>
<name sortKey="Ahlers, V" uniqKey="Ahlers V">V Ahlers</name>
</author>
<author>
<name sortKey="Sprengel, F" uniqKey="Sprengel F">F Sprengel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ames, Sk" uniqKey="Ames S">SK Ames</name>
</author>
<author>
<name sortKey="Hysom, Da" uniqKey="Hysom D">DA Hysom</name>
</author>
<author>
<name sortKey="Gardner, Sn" uniqKey="Gardner S">SN Gardner</name>
</author>
<author>
<name sortKey="Lloyd, Gs" uniqKey="Lloyd G">GS Lloyd</name>
</author>
<author>
<name sortKey="Gokhale, Mb" uniqKey="Gokhale M">MB Gokhale</name>
</author>
<author>
<name sortKey="Allen, Je" uniqKey="Allen J">JE Allen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berendzen, J" uniqKey="Berendzen J">J Berendzen</name>
</author>
<author>
<name sortKey="Bruno, Wj" uniqKey="Bruno W">WJ Bruno</name>
</author>
<author>
<name sortKey="Cohn, Jd" uniqKey="Cohn J">JD Cohn</name>
</author>
<author>
<name sortKey="Hengartner, Nw" uniqKey="Hengartner N">NW Hengartner</name>
</author>
<author>
<name sortKey="Kuske, Cr" uniqKey="Kuske C">CR Kuske</name>
</author>
<author>
<name sortKey="Mcmahon, Bh" uniqKey="Mcmahon B">BH McMahon</name>
</author>
<author>
<name sortKey="Wolinsky, Ma" uniqKey="Wolinsky M">MA Wolinsky</name>
</author>
<author>
<name sortKey="Xie, G" uniqKey="Xie G">G Xie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharma, Vk" uniqKey="Sharma V">VK Sharma</name>
</author>
<author>
<name sortKey="Kumar, N" uniqKey="Kumar N">N Kumar</name>
</author>
<author>
<name sortKey="Prakash, T" uniqKey="Prakash T">T Prakash</name>
</author>
<author>
<name sortKey="Taylor, Td" uniqKey="Taylor T">TD Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiang, H" uniqKey="Jiang H">H Jiang</name>
</author>
<author>
<name sortKey="An, L" uniqKey="An L">L An</name>
</author>
<author>
<name sortKey="Lin, Sm" uniqKey="Lin S">SM Lin</name>
</author>
<author>
<name sortKey="Feng, G" uniqKey="Feng G">G Feng</name>
</author>
<author>
<name sortKey="Qiu, Y" uniqKey="Qiu Y">Y Qiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Porter, Ms" uniqKey="Porter M">MS Porter</name>
</author>
<author>
<name sortKey="Beiko, Rg" uniqKey="Beiko R">RG Beiko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freitas, Tak" uniqKey="Freitas T">TAK Freitas</name>
</author>
<author>
<name sortKey="Li, P E" uniqKey="Li P">P-E Li</name>
</author>
<author>
<name sortKey="Scholz, Mb" uniqKey="Scholz M">MB Scholz</name>
</author>
<author>
<name sortKey="Chain, Psg" uniqKey="Chain P">PSG Chain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ounit, R" uniqKey="Ounit R">R Ounit</name>
</author>
<author>
<name sortKey="Wanamaker, S" uniqKey="Wanamaker S">S Wanamaker</name>
</author>
<author>
<name sortKey="Close, Tj" uniqKey="Close T">TJ Close</name>
</author>
<author>
<name sortKey="Lonardi, S" uniqKey="Lonardi S">S Lonardi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Tringe, Sg" uniqKey="Tringe S">SG Tringe</name>
</author>
<author>
<name sortKey="Doerks, T" uniqKey="Doerks T">T Doerks</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Ward, N" uniqKey="Ward N">N Ward</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stark, M" uniqKey="Stark M">M Stark</name>
</author>
<author>
<name sortKey="Berger, Sa" uniqKey="Berger S">SA Berger</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, M" uniqKey="Wu M">M Wu</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kerepesi, C" uniqKey="Kerepesi C">C Kerepesi</name>
</author>
<author>
<name sortKey="Banky, D" uniqKey="Banky D">D Bánky</name>
</author>
<author>
<name sortKey="Grolmusz, V" uniqKey="Grolmusz V">V Grolmusz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brady, A" uniqKey="Brady A">A Brady</name>
</author>
<author>
<name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parks, Dh" uniqKey="Parks D">DH Parks</name>
</author>
<author>
<name sortKey="Macdonald, Nj" uniqKey="Macdonald N">NJ MacDonald</name>
</author>
<author>
<name sortKey="Beiko, Rg" uniqKey="Beiko R">RG Beiko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macdonald, Nj" uniqKey="Macdonald N">NJ MacDonald</name>
</author>
<author>
<name sortKey="Parks, Dh" uniqKey="Parks D">DH Parks</name>
</author>
<author>
<name sortKey="Beiko, Rg" uniqKey="Beiko R">RG Beiko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klingenberg, H" uniqKey="Klingenberg H">H Klingenberg</name>
</author>
<author>
<name sortKey="A Hauer, Kp" uniqKey="A Hauer K">KP Aßhauer</name>
</author>
<author>
<name sortKey="Lingner, T" uniqKey="Lingner T">T Lingner</name>
</author>
<author>
<name sortKey="Meinicke, P" uniqKey="Meinicke P">P Meinicke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reddy, Rm" uniqKey="Reddy R">RM Reddy</name>
</author>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patil, Kr" uniqKey="Patil K">KR Patil</name>
</author>
<author>
<name sortKey="Haider, P" uniqKey="Haider P">P Haider</name>
</author>
<author>
<name sortKey="Pope, Pb" uniqKey="Pope P">PB Pope</name>
</author>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Morrison, M" uniqKey="Morrison M">M Morrison</name>
</author>
<author>
<name sortKey="Scheffer, T" uniqKey="Scheffer T">T Scheffer</name>
</author>
<author>
<name sortKey="Mchardy, Ac" uniqKey="Mchardy A">AC McHardy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patil, Kr" uniqKey="Patil K">KR Patil</name>
</author>
<author>
<name sortKey="Roune, L" uniqKey="Roune L">L Roune</name>
</author>
<author>
<name sortKey="Mchardy, Ac" uniqKey="Mchardy A">AC McHardy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosen, G" uniqKey="Rosen G">G Rosen</name>
</author>
<author>
<name sortKey="Garbarine, E" uniqKey="Garbarine E">E Garbarine</name>
</author>
<author>
<name sortKey="Caseiro, D" uniqKey="Caseiro D">D Caseiro</name>
</author>
<author>
<name sortKey="Polikar, R" uniqKey="Polikar R">R Polikar</name>
</author>
<author>
<name sortKey="Sokhansanj, B" uniqKey="Sokhansanj B">B Sokhansanj</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosen, Gl" uniqKey="Rosen G">GL Rosen</name>
</author>
<author>
<name sortKey="Reichenberger, Er" uniqKey="Reichenberger E">ER Reichenberger</name>
</author>
<author>
<name sortKey="Rosenfeld, Am" uniqKey="Rosenfeld A">AM Rosenfeld</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nalbantoglu, Ou" uniqKey="Nalbantoglu O">OU Nalbantoglu</name>
</author>
<author>
<name sortKey="Way, Sf" uniqKey="Way S">SF Way</name>
</author>
<author>
<name sortKey="Hinrichs, Sh" uniqKey="Hinrichs S">SH Hinrichs</name>
</author>
<author>
<name sortKey="Sayood, K" uniqKey="Sayood K">K Sayood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pati, A" uniqKey="Pati A">A Pati</name>
</author>
<author>
<name sortKey="Heath, Ls" uniqKey="Heath L">LS Heath</name>
</author>
<author>
<name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
<author>
<name sortKey="Ivanova, N" uniqKey="Ivanova N">N Ivanova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mohammed, Mh" uniqKey="Mohammed M">MH Mohammed</name>
</author>
<author>
<name sortKey="Ghosh, Ts" uniqKey="Ghosh T">TS Ghosh</name>
</author>
<author>
<name sortKey="Reddy, Rm" uniqKey="Reddy R">RM Reddy</name>
</author>
<author>
<name sortKey="Reddy, Cvsk" uniqKey="Reddy C">CVSK Reddy</name>
</author>
<author>
<name sortKey="Singh, Nk" uniqKey="Singh N">NK Singh</name>
</author>
<author>
<name sortKey="Mande, Ss" uniqKey="Mande S">SS Mande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rasheed, Z" uniqKey="Rasheed Z">Z Rasheed</name>
</author>
<author>
<name sortKey="Rangwala, H" uniqKey="Rangwala H">H Rangwala</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, J" uniqKey="Liu J">J Liu</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Yang, H" uniqKey="Yang H">H Yang</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Zhao, F" uniqKey="Zhao F">F Zhao</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, F" uniqKey="Yu F">F Yu</name>
</author>
<author>
<name sortKey="Sun, Y" uniqKey="Sun Y">Y Sun</name>
</author>
<author>
<name sortKey="Liu, L" uniqKey="Liu L">L Liu</name>
</author>
<author>
<name sortKey="Farmerie, W" uniqKey="Farmerie W">W Farmerie</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26537885</article-id>
<article-id pub-id-type="pmc">4634789</article-id>
<article-id pub-id-type="publisher-id">788</article-id>
<article-id pub-id-type="doi">10.1186/s12859-015-0788-5</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Evaluation of shotgun metagenomics sequence classification methods using
<italic>in silico</italic>
and
<italic>in vitro</italic>
simulated communities</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Peabody</surname>
<given-names>Michael A.</given-names>
</name>
<address>
<email>map1@sfu.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Van Rossum</surname>
<given-names>Thea</given-names>
</name>
<address>
<email>tva4@sfu.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lo</surname>
<given-names>Raymond</given-names>
</name>
<address>
<email>raymondl@sfu.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Brinkman</surname>
<given-names>Fiona S. L.</given-names>
</name>
<address>
<email>brinkman@sfu.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<aff id="Aff1">Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC Canada</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>4</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>4</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>16</volume>
<elocation-id>363</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>6</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>10</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© Peabody et al. 2015</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same
<italic>in silico</italic>
and
<italic>in vitro</italic>
test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions.</p>
</sec>
<sec>
<title>Results</title>
<p>An overview of the features of 38 bioinformatics methods is provided, and accuracy is evaluated with a focus on 11 programs whose reference databases can be modified and which can therefore be most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both
<italic>in silico</italic>
and
<italic>in vitro</italic>
mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. The programs evaluated varied widely in sensitivity, precision, overall accuracy, and computational demand. In experiments where distilled water was spiked with only 11 bacterial species, the most popular programs frequently falsely predicted dozens to hundreds of species. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations are discussed.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-015-0788-5) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Metagenomics</kwd>
<kwd>Evaluation</kwd>
<kwd>Accuracy</kwd>
<kwd>Comparison</kwd>
<kwd>Taxonomic classification</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Metagenomics involves collecting samples from an environment (water, saliva, etc.) and then extracting and studying the genetic material from the microorganisms present in these samples [
<xref ref-type="bibr" rid="CR1">1</xref>
]. This approach is transforming microbiology, ecology, medicine, and other research areas investigating various microbiomes, allowing us to analyze for the first time microbial species, including those not culturable, at a level of detail not previously possible [
<xref ref-type="bibr" rid="CR2">2</xref>
]. Metagenomics sequence reads can be taxonomically classified to identify the microbes, or functionally classified (gene functions, metabolic pathways, etc.) to identify the functional potential of the community. There exist two general approaches for characterizing the taxonomic content of environmental samples: (1) sequencing of PCR amplicons corresponding to phylogenetic marker genes (e.g. 16S rRNA; “amplicon analysis”); (2) shotgun sequencing whereby all genomic DNA in the community is sequenced. A drawback of the shotgun sequencing approach is increased cost, but advantages include the ability to gain insights into metabolism and gene function through functional classification, and the avoidance of potentially biased amplification steps [
<xref ref-type="bibr" rid="CR3">3</xref>
]. Furthermore, a notable subset of taxa cannot be captured by traditional 16S sequencing owing to divergent 16S rRNA gene sequences [
<xref ref-type="bibr" rid="CR4">4</xref>
]. This, combined with the continuing decrease in cost of sequencing, may result in shotgun metagenomics becoming increasingly used for the taxonomic classification of microbial communities.</p>
<p>Taxonomic classification methods generally fall into four categories, reflecting their different strategies: (1) sequence similarity based methods, which use the results of a sequence similarity search against a database of reference sequences, (2) sequence composition based methods, which rely on characteristics of the sequences’ nucleotide composition (e.g. tetranucleotide usage or codon usage) [
<xref ref-type="bibr" rid="CR5">5</xref>
], (3) hybrid methods, which incorporate components of the first two, and (4) marker-based methods, which identify species based on the occurrence of certain specific marker sequences. Composition methods generate models from the reference organisms’ genomes and classify the input sequence reads according to which model(s) fit each read best. They have had trouble classifying short reads (&lt;1000 base pairs); Phymm was the first published method to demonstrate reasonable accuracy at short read lengths [
<xref ref-type="bibr" rid="CR6">6</xref>
]. Sequence similarity based methods, on the other hand, perform very well at identifying reads from genomes within the reference database that they search against, even at read lengths as short as 80 base pairs [
<xref ref-type="bibr" rid="CR7">7</xref>
]. However, many reads from metagenomics samples come from genomes that are not in any reference database [
<xref ref-type="bibr" rid="CR8">8</xref>
]. Similarity based methods have traditionally used BLAST [
<xref ref-type="bibr" rid="CR9">9</xref>
], and have been generally slower to run compared to composition based methods. Hybrid methods combine the similarity approach and the composition approach, with the goal of improving classification or speed. For improving classification, scores may be combined from both the similarity portion and the composition portion of the method for each prediction [
<xref ref-type="bibr" rid="CR6">6</xref>
]. Another hybrid strategy, aimed at increasing speed, is to use the composition approach to narrow down the set of candidate organisms, and thus have the similarity search occur against a fraction of the original database [
<xref ref-type="bibr" rid="CR10">10</xref>
].</p>
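<p>The composition-based idea described above can be illustrated with a minimal sketch. It assumes a toy nearest-profile classifier built on tetranucleotide frequencies; this is not the model used by any of the tools discussed, and the function names (kmer_profile, profile_distance, closest_reference) are hypothetical.</p>
<preformat>
```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return relative frequencies of all DNA k-mers in a sequence."""
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Only count k-mers made up of unambiguous bases.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


def profile_distance(profile_a, profile_b):
    """Manhattan distance between two k-mer profiles."""
    return sum(abs(profile_a[kmer] - profile_b[kmer]) for kmer in profile_a)


def closest_reference(read, reference_profiles, k=4):
    """Pick the reference whose k-mer profile is nearest to the read's."""
    read_profile = kmer_profile(read, k)
    return min(
        reference_profiles,
        key=lambda name: profile_distance(read_profile, reference_profiles[name]),
    )
```
</preformat>
<p>A short read yields few tetranucleotides, so its profile is noisy; this is one way to see why composition methods struggle at short read lengths.</p>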
<p>A related group of methods tries to determine community composition from metagenomes by utilizing marker genes. These methods differ from methods that perform taxonomic classification, as they do not try to classify all of the reads. Instead, they focus on classifying only marker genes to determine the microbial community composition of the sample. Most marker based approaches utilize universal genes. However, another approach, utilized by MetaPhlAn, involves use of clade-specific marker genes [
<xref ref-type="bibr" rid="CR11">11</xref>
].</p>
<p>The first step in a marker based approach is to identify reads that match one of the markers. As the reference database of markers these methods use is relatively small, these methods are comparatively quick to run. In addition to focusing on a limited set of markers, which greatly reduces the computational cost of analysis, these methods are not affected by differences in genome size. If the goal of the analysis is to identify the community composition of the sample, taxonomic classification methods are biased by genome sizes, as organisms with larger genomes will generate more reads. Amplicon sequencing using the 16S rRNA gene also suffers bias due to variability in 16S rRNA copy number [
<xref ref-type="bibr" rid="CR12">12</xref>
]. Thus, marker based approaches using shotgun metagenomics sequencing data may provide the least biased relative abundance information for organisms in the community.</p>
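<p>The genome-size bias described above can be made concrete with a small worked example. This is a sketch under stated assumptions: the community, read counts, and genome sizes are invented for illustration, and relative_abundance is a hypothetical helper rather than part of any method evaluated here.</p>
<preformat>
```python
def relative_abundance(read_counts, genome_sizes):
    """Convert per-taxon read counts into genome-size-corrected proportions."""
    # Reads per base approximates the number of genome copies sampled.
    coverage = {
        taxon: read_counts[taxon] / genome_sizes[taxon]
        for taxon in read_counts
    }
    total = sum(coverage.values())
    return {taxon: value / total for taxon, value in coverage.items()}


# Hypothetical two-member community with equal cell counts.
read_counts = {"taxon_a": 6000, "taxon_b": 2000}
genome_sizes = {"taxon_a": 6_000_000, "taxon_b": 2_000_000}
print(relative_abundance(read_counts, genome_sizes))
```
</preformat>
<p>In this invented case the raw read counts suggest a 3:1 ratio, whereas dividing by genome size recovers equal abundance for the two taxa.</p>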
<sec id="Sec2">
<title>Tools vary in several additional characteristics which may influence a researcher’s choice</title>
<p>In addition to the class of method, there are many other characteristics which may affect the consideration of which method to use. For example, whether a method is available via a GUI, command line, or web server can be an important consideration, as is whether the method can also perform functional (gene function) classification, or how much memory and compute time the method requires. In addition, some methods are limited to certain groups of microbes. Some methods, such as AMPHORA2 [
<xref ref-type="bibr" rid="CR13">13</xref>
], are limited to analysis of Bacteria and Archaea. Others, such as PhyloSift [
<xref ref-type="bibr" rid="CR14">14</xref>
], can additionally predict Viruses and Eukaryotes. Furthermore, some methods continue to be supported while others are not, and some eventually become unavailable or difficult to access.</p>
<p>Another distinction that can be made is between methods which are rank-flexible, versus rank-specific. Rank-flexible methods vary the rank at which reads are predicted by classifying each read to the lowest taxonomic level at which the given method is confident. An example of a simple rank-flexible method is the lowest common ancestor (LCA) approach, first used by MEGAN [
<xref ref-type="bibr" rid="CR15">15</xref>
]. This approach takes the set of taxa that the read hit in the similarity search (taking only those hits scoring within a threshold of the top hit), and assigns the read to the LCA of this set. In contrast, rank-specific methods make predictions at the same rank for all reads.</p>
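<p>The LCA assignment described above can be sketched as follows. This is a minimal illustration, not MEGAN's implementation: the lineage representation (root-to-leaf lists), the score_fraction threshold, and the function names are all assumptions made for the example.</p>
<preformat>
```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (root-to-leaf lists)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None


def assign_read(hits, lineage_of, score_fraction=0.9):
    """Assign a read to the LCA of hits scoring close to the best hit.

    hits: list of (taxon, score) pairs from a similarity search.
    lineage_of: dict mapping taxon to its root-to-leaf lineage.
    score_fraction: hypothetical threshold relative to the top score.
    """
    if not hits:
        return None
    top_score = max(score for _, score in hits)
    kept = [taxon for taxon, score in hits if score >= score_fraction * top_score]
    return lowest_common_ancestor([lineage_of[taxon] for taxon in kept])
```
</preformat>
<p>With a strict threshold a read hitting two species of one genus is placed at that genus; loosening the threshold admits more distant hits and pushes the assignment to a higher rank, which is what makes the approach rank-flexible.</p>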
</sec>
<sec id="Sec3">
<title>Clade exclusion is an important technique to evaluate how well methods will perform on environmental samples</title>
<p>Sequence similarity based methods perform very well when identifying query reads identical to genomes/sequences within the reference database that they search against. However, because the majority of microorganisms have not had their genome sequenced, in most environments many of the sequence reads that would be generated in a metagenomics experiment would be quite unrelated to any sequences that are in a reference database, or at minimum not identical [
<xref ref-type="bibr" rid="CR16">16</xref>
]. Thus, one of the approaches used in the evaluation of taxonomic classifiers is clade-level exclusion. This involves removing from the reference database all sequences belonging to the test organism’s clade at a certain taxonomic level and then evaluating the ability to make predictions at higher taxonomic levels. For example, if performing species level exclusion for
<italic>Pseudomonas aeruginosa</italic>
, all
<italic>Pseudomonas aeruginosa</italic>
genome sequences would be removed from the reference database and/or models of the methods being evaluated. Then, the method’s ability to classify reads from
<italic>Pseudomonas aeruginosa</italic>
at higher taxonomic levels (i.e.,
<italic>Pseudomonas</italic>
,
<italic>Pseudomonadaceae</italic>
, etc.) would be evaluated. Such clade exclusion methodology is one way to avoid obtaining artificially high accuracy levels caused by the problem of testing and training with identical data.</p>
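<p>The clade exclusion procedure described above can be sketched as a filter over a reference taxonomy. This is an illustrative sketch only: the dictionary layout, the RANKS list, and the function names are assumptions, and real tools would additionally require rebuilding their databases or models from the filtered genome set.</p>
<preformat>
```python
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def exclude_clade(reference_taxonomy, test_lineage, rank):
    """Drop every reference genome sharing the test organism's clade at rank.

    reference_taxonomy: dict mapping genome id to a dict of rank to taxon name.
    test_lineage: dict of rank to taxon name for the organism being tested.
    """
    excluded_taxon = test_lineage[rank]
    return {
        genome_id: lineage
        for genome_id, lineage in reference_taxonomy.items()
        if lineage.get(rank) != excluded_taxon
    }


def evaluation_ranks(rank):
    """Ranks above the excluded one, where a correct call is still possible."""
    return RANKS[RANKS.index(rank) + 1:]
```
</preformat>
<p>For species level exclusion of Pseudomonas aeruginosa, the filter removes all of its genomes but keeps other Pseudomonas species, so predictions are scored from genus upward.</p>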
</sec>
<sec id="Sec4">
<title>The present work builds upon a previous evaluation performed without clade exclusion</title>
<p>One previously reported evaluation of metagenomics bioinformatics methods is not limited to comparing a small set of tools against the authors’ own tool [
<xref ref-type="bibr" rid="CR17">17</xref>
]. This study was an important first step in comparing many metagenomics classification tools; however, the microbial genomes used in the analysis were found in the reference databases and training sets of the methods evaluated. This means that the accuracy reported in the study will be considerably higher than the accuracy obtained when the methods are used to classify reads from organisms not in the reference databases or training sets. Samples from most environments, such as soil, ocean, and freshwater samples, are very diverse and the majority of organisms existing in these environments have not been characterized. The human gut is an environment in which intense research interest has resulted in substantial effort to sequence relevant microbes [
<xref ref-type="bibr" rid="CR18">18</xref>
]; however, even in the human gut, it appears that the majority of species are not present in reference databases [
<xref ref-type="bibr" rid="CR19">19</xref>
]. In addition, the previous comparison relied solely on
<italic>in silico</italic>
simulated reads. As sequence simulators cannot capture all of the factors that may affect read sampling in metagenomics,
<italic>in vitro</italic>
communities (i.e., samples of known bacterial cultures spiked into distilled water and sequenced) are an important complementary set of data to evaluate methods on. An unpublished study was recently made publicly available, which includes an evaluation using
<italic>in silico</italic>
evolved genomes [
<xref ref-type="bibr" rid="CR20">20</xref>
]. This approach, with its artificially evolved sequences, complements the clade exclusion approach taken here, where we use both computationally simulated and real sequences. One additional notable difference is that their evaluation looked only at phylum level classifications, whereas this study looks at classifications at all taxonomic levels. Furthermore, they constructed their communities to contain only 5 % taxonomically novel (artificially evolved) sequences. Therefore, the results are not comparable to our evaluations using clade exclusion, where all of the sequences are from genomes not in the reference databases of the methods, and where performance is based on classification at all taxonomic levels rather than just at the phylum level.</p>
<p>In the present study, a variety of metagenomic taxonomic classification methods are evaluated on mock communities simulated both
<italic>in silico</italic>
and
<italic>in vitro</italic>
(distilled water spiked with known bacteria from pure culture, and sequenced). The performance of the methods is analyzed in terms of their sensitivity, precision, and number of incorrectly predicted species. In addition, the performance of the methods is compared as simulated read length is increased and as the level of clade exclusion is varied. The methods evaluated more fully were chosen to encompass the range of types of methods available, and on the basis of their popularity and amenability to clade exclusion. We demonstrate that the accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed the others in all evaluation scenarios; rather, the results illustrate the strengths and weaknesses of different methods for different purposes, information that is critical for researchers to be aware of when performing their particular analysis.</p>
</sec>
</sec>
<sec id="Sec5">
<title>Methods</title>
<sec id="Sec6">
<title>Simulation of MetaSimHC and freshwater
<italic>in silico</italic>
and
<italic>in vitro</italic>
datasets</title>
<p>Two different microbial communities were used for this evaluation, both made up of diverse taxa for which completed genome sequences were available. The first was previously proposed as a “high complexity” dataset in [
<xref ref-type="bibr" rid="CR21">21</xref>
], and will be referred to as MetaSimHC. This was chosen since it has been proposed to be a reference dataset for analysis of methods, and consists of diverse microbial species covering several phyla of both Bacteria and Archaea. The second was chosen with the aim of having a set of species commonly found in freshwater, suitable as a control for a watershed metagenomics project we participated in [
<xref ref-type="bibr" rid="CR22">22</xref>
]. This community was assembled from species that were common among several publicly available freshwater datasets [
<xref ref-type="bibr" rid="CR23">23</xref>
–
<xref ref-type="bibr" rid="CR25">25</xref>
], and will be referred to as FW (freshwater). The organisms used in each of these datasets can be found in Table 
<xref rid="Tab1" ref-type="table">1</xref>
. Both of these datasets were simulated using MetaSim (version 0.9.5; [
<xref ref-type="bibr" rid="CR21">21</xref>
]) at sequence lengths of 100, 250, 500, and 1000 bp, with each organism at 1X coverage. Although the sets of sequences of differing read length were generated independently, each was generated at 1X coverage, so the effects of sampling only portions of genomes that are predicted particularly well or poorly should be mitigated. No error model was used because no error model was available for Illumina reads at the longer read lengths (500 and 1000 bp), and we wanted to be consistent as read length was varied. Also, the
<italic>in vitro</italic>
dataset provides data from an actual sequencer, which allows us to see how methods perform on data with real sequencing errors. Clade exclusion was performed at the level of species, genus, family, order, and class. The FW dataset was simulated both with MetaSim (FW
<italic>in silico</italic>
) and an
<italic>in vitro</italic>
mock community (FW
<italic>in vitro</italic>
). To construct the FW
<italic>in vitro</italic>
community, the bacteria were grown in pure culture, and their DNA was then extracted and spiked in equal concentrations into sterile, distilled water for sequencing. All complete bacterial and archaeal genomes were downloaded from NCBI on June 17, 2013, for the creation of the databases and supervised models used in the different methods. The numbers of genomes left in the databases and training sets of the methods in the evaluation scenarios are shown in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1. The datasets used in these evaluation scenarios have been deposited to the MG-RAST database and accession numbers can be found in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S2, and the number of reads simulated from each organism for the
<italic>in silico</italic>
datasets can be found in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S3. Note that although test datasets could certainly be constructed using a larger number of species, it is non-trivial to construct a similar
<italic>in vitro</italic>
mock community dataset using a large number of species. We purposefully constructed our dataset to contain taxa with a variety of levels of divergence from one another, including closely related species (e.g. multiple species from the
<italic>Pseudomonas</italic>
genus). The latter helps evaluate the ability of methods to predict taxa when closely related taxa are present.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Microbes used in the two simulated mock communities</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="3">MetaSimHC
<sup>a</sup>
</th>
<th colspan="3">Freshwater
<sup>b</sup>
(FW)
<italic>in silico</italic>
and
<italic>in vitro</italic>
</th>
</tr>
<tr>
<th>Genus</th>
<th>Species</th>
<th>Strain</th>
<th>Genus</th>
<th>Species</th>
<th>Strain</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<italic>Agrobacterium</italic>
</td>
<td>
<italic>tumefaciens</italic>
</td>
<td>C58</td>
<td>
<italic>Bacillus</italic>
</td>
<td>
<italic>amyloliquefaciens</italic>
</td>
<td>FZB42</td>
</tr>
<tr>
<td>
<italic>Anabaena</italic>
</td>
<td>
<italic>variabilis</italic>
</td>
<td>ATCC 29413</td>
<td>
<italic>Bacillus</italic>
</td>
<td>
<italic>cereus</italic>
</td>
<td>ATCC 14579</td>
</tr>
<tr>
<td>
<italic>Archaeoglobus</italic>
</td>
<td>
<italic>fulgidus</italic>
</td>
<td>DSM 4304</td>
<td>
<italic>Burkholderia</italic>
</td>
<td>
<italic>cenocepacia</italic>
</td>
<td>J2315</td>
</tr>
<tr>
<td>
<italic>Bdellovibrio</italic>
</td>
<td>
<italic>bacteriovorus</italic>
</td>
<td>HD100</td>
<td>
<italic>Escherichia</italic>
</td>
<td>
<italic>coli</italic>
</td>
<td>K-12</td>
</tr>
<tr>
<td>
<italic>Campylobacter</italic>
</td>
<td>
<italic>jejuni</italic>
</td>
<td>81–176</td>
<td>
<italic>Frankia</italic>
</td>
<td>
<italic>sp.</italic>
</td>
<td>CcI3</td>
</tr>
<tr>
<td>
<italic>Clostridium</italic>
</td>
<td>
<italic>acetobutylicum</italic>
</td>
<td>ATCC 824</td>
<td>
<italic>Micrococcus</italic>
</td>
<td>
<italic>luteus</italic>
</td>
<td>NCTC 2665</td>
</tr>
<tr>
<td>
<italic>Lactococcus</italic>
</td>
<td>
<italic>lactis</italic>
</td>
<td>SK11</td>
<td>
<italic>Pseudomonas</italic>
</td>
<td>
<italic>aeruginosa</italic>
</td>
<td>PAO1</td>
</tr>
<tr>
<td>
<italic>Nitrosomonas</italic>
</td>
<td>
<italic>europaea</italic>
</td>
<td>ATCC 19718</td>
<td>
<italic>Pseudomonas</italic>
</td>
<td>
<italic>aeruginosa</italic>
</td>
<td>UCBPP-PA14</td>
</tr>
<tr>
<td>
<italic>Pseudomonas</italic>
</td>
<td>
<italic>aeruginosa</italic>
</td>
<td>PA7</td>
<td>
<italic>Pseudomonas</italic>
</td>
<td>
<italic>fluorescens</italic>
</td>
<td>Pf-5</td>
</tr>
<tr>
<td>
<italic>Streptomyces</italic>
</td>
<td>
<italic>coelicolor</italic>
</td>
<td>A3(2)</td>
<td>
<italic>Pseudomonas</italic>
</td>
<td>
<italic>putida</italic>
</td>
<td>KT2440</td>
</tr>
<tr>
<td>
<italic>Sulfolobus</italic>
</td>
<td>
<italic>tokodaii</italic>
</td>
<td>str. 7</td>
<td>
<italic>Rhodobacter</italic>
</td>
<td>
<italic>capsulatus</italic>
</td>
<td>SB 1003</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>
<italic>Streptomyces</italic>
</td>
<td>
<italic>coelicolor</italic>
</td>
<td>A3(2)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a</sup>
MetaSimHC is a test dataset of 11 diverse microbial genomes covering several phyla of Bacteria and Archaea proposed in [
<xref ref-type="bibr" rid="CR21">21</xref>
]</p>
<p>
<sup>b</sup>
Freshwater (FW) is a set of bacterial genomes found in previous freshwater metagenomics studies (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
)</p>
</table-wrap-foot>
</table-wrap>
</p>
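<p>The clade exclusion procedure described above can be illustrated with a minimal sketch. It assumes each reference genome is stored with a lineage record (rank to taxon name); the function name, the data layout, and the truncated lineages are hypothetical and chosen only for illustration, and the sketch handles a single query organism rather than a whole test community.</p>

```python
RANKS = ["species", "genus", "family", "order", "class"]


def clade_exclusion(reference, query_lineage, level):
    """Return the reference genomes left after removing every genome that
    shares the query organism's taxon at the given rank."""
    if level not in RANKS:
        raise ValueError(f"unsupported clade exclusion level: {level}")
    excluded_taxon = query_lineage[level]
    return {
        name: lineage
        for name, lineage in reference.items()
        if lineage.get(level) != excluded_taxon
    }


# Illustrative, truncated lineages (hypothetical records, three ranks only).
reference = {
    "Pseudomonas aeruginosa PAO1": {
        "species": "Pseudomonas aeruginosa",
        "genus": "Pseudomonas",
        "class": "Gammaproteobacteria",
    },
    "Pseudomonas putida KT2440": {
        "species": "Pseudomonas putida",
        "genus": "Pseudomonas",
        "class": "Gammaproteobacteria",
    },
    "Escherichia coli K-12": {
        "species": "Escherichia coli",
        "genus": "Escherichia",
        "class": "Gammaproteobacteria",
    },
    "Bacillus cereus ATCC 14579": {
        "species": "Bacillus cereus",
        "genus": "Bacillus",
        "class": "Bacilli",
    },
}

query = reference["Pseudomonas aeruginosa PAO1"]
for level in ("species", "genus", "class"):
    # Each higher exclusion level leaves fewer relatives in the reference set.
    print(level, sorted(clade_exclusion(reference, query, level)))
```

<p>In this toy example, species-level exclusion removes only the query genome, genus-level exclusion also removes the second pseudomonad, and class-level exclusion leaves only the distantly related genome.</p>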
<p>Because microbial communities differ so greatly (e.g. soil versus acid mine drainage) in the number of organisms, which organisms are present, their taxonomic novelty, and their abundance distribution, it is not possible to simulate a community that is appropriate for all environments. We therefore suggest that researchers test their own mock communities that approximate their expected community.</p>
</sec>
<sec id="Sec7">
<title>Laboratory preparation and sequencing of the mock freshwater
<italic>in vitro</italic>
community</title>
<p>
<italic>Bacillus amyloliquefaciens</italic>
FZB42 (ATCC# 23842),
<italic>Bacillus cereus</italic>
(ATCC# 14579),
<italic>Escherichia coli</italic>
K12 (ATCC# 23716),
<italic>Micrococcus luteus</italic>
NCTC 2665 (ATCC# 4698),
<italic>Pseudomonas fluorescens</italic>
Pf-5 (ATCC# BAA-477), and
<italic>Pseudomonas putida</italic>
KT2440 (ATCC# 47054) were obtained as freeze-dried stocks and used per recommended protocol to start cultures in prescribed media.
<italic>Burkholderia cenocepacia</italic>
J2315 was cultured in Luria broth at 37 °C.
<italic>Frankia</italic>
sp. CcI3 was grown in liquid
<italic>Frankia</italic>
defined minimal medium (FDM) in stationary culture at 30 °C for 1 week.
<italic>Pseudomonas aeruginosa</italic>
UCBPP-PA14 was cultured in Luria-Bertani broth at 37 °C.
<italic>Rhodobacter capsulatus</italic>
SB 1003 was cultured on 0.3 % yeast extract, 0.3 % bactopeptone, CaCl
<sub>2</sub>
(1 mM) and MgSO
<sub>4</sub>
(1 mM) at 30 °C.
<italic>Streptomyces coelicolor</italic>
A3(2) was cultured in 0.5 % tryptone, 0.3 % yeast extract, pH 7.1 at 28 °C for 1 week. Each strain was plated on the appropriate medium, and single colonies were picked. These were cultured overnight in 3 ml of the appropriate medium at the appropriate temperature (as above).
<italic>Frankia</italic>
sp. CcI3 and
<italic>Pseudomonas aeruginosa</italic>
UCBPP-PA14 were cultured for several days until they reached stationary phase. The other bacterial strains were fast growing, so the starter cultures were diluted 1:100 and grown with vigorous shaking (250 rpm) to saturation overnight. Genomic DNA was extracted from these cultures with the NucleoSpin Tissue kit from Macherey-Nagel according to the manufacturer’s instructions. For Gram-positive bacteria, cells were pre-incubated with buffer containing 20 mg/ml lysozyme for an hour at 37 °C, followed by Proteinase K at 56 °C until complete lysis was obtained. The library was prepared using a Nextera XT DNA sample preparation kit following the manufacturer’s instructions. This library was sequenced on a MiSeq platform using a V2 500-cycle kit.</p>
</sec>
<sec id="Sec8">
<title>Quality control of sequenced reads</title>
<p>Trimmomatic-0.25 [
<xref ref-type="bibr" rid="CR26">26</xref>
] was used to (1) trim reads using a sliding window of 15 bases and a PHRED quality score threshold of Q ≤ 20, followed by (2) checking whether any of the last 5 bases had Q ≤ 5, and if so removing up to that base, and finally (3) filtering out any reads shorter than 85 bases. After quality control, there were 300,969 reads with an average length of 223 nucleotides.</p>
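<p>The three quality control steps can be sketched as follows. This is a simplified illustration of one reading of the steps above, not Trimmomatic's implementation: the function name and parameters are hypothetical, a window is assumed to fail when its mean quality is not above the threshold, and step (2) is assumed to cut the read at the first low-quality base among the last five.</p>

```python
def trim_read(quals, window=15, window_q=20, tail=5, tail_q=5, min_len=85):
    """Return how many leading bases of a read to keep (0 means discard).

    quals is a list of per-base PHRED scores for one read.
    """
    keep = len(quals)

    # (1) Cut at the start of the first window whose mean quality
    #     is not above window_q (assumed failure rule).
    for start in range(len(quals) - window + 1):
        if not sum(quals[start:start + window]) / window > window_q:
            keep = start
            break

    # (2) If any of the last `tail` kept bases has quality not above
    #     tail_q, cut the read at the first such base.
    for pos in range(max(keep - tail, 0), keep):
        if not quals[pos] > tail_q:
            keep = pos
            break

    # (3) Discard reads that end up shorter than min_len.
    return keep if keep >= min_len else 0


print(trim_read([30] * 100))             # uniformly good read, kept whole
print(trim_read([30] * 99 + [2]))        # single poor base at the 3' end
print(trim_read([30] * 90 + [2] * 30))   # long poor tail, read too short after trimming
```

<p>Returning a length rather than a trimmed string keeps the sketch independent of any particular FASTQ parser.</p>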
</sec>
<sec id="Sec9">
<title>Evaluation of methods and metrics</title>
<p>The performance metrics used to evaluate the different software were sensitivity, precision, taxonomic distance, and running time. Sensitivity and precision were calculated from the numbers of true positives (TP), false positives (FP), and false negatives (FN). True positives are reads assigned correctly, false positives are reads assigned incorrectly, and false negatives are reads left unassigned. Sensitivity was calculated as TP/(TP + FN), and precision as TP/(TP + FP). Taxonomic distance was calculated from correctly assigned reads as the average number of ranks above the best possible rank at which the assignment could be made, and running time as the number of minutes taken for the program to complete classification. Sensitivity, precision, and taxonomic distance were averaged over all the species in the test dataset. This gave equal weighting to all of the species in the datasets; otherwise, the species with larger genomes (which yield more reads) would have had a larger influence on the scores. For the
<italic>in silico</italic>
datasets, reads were categorized as correctly assigned (TP) if they were classified to a node (taxonomic rank) anywhere in the path from the correct species to the superkingdom level (e.g. Bacteria) of the NCBI taxonomic tree, and as incorrect if they were assigned to a node not in this path. In the case where overpredictions were considered correct, the taxonomic level used to determine whether a read was classified correctly was the best possible correct level that could be predicted. For example, under species clade exclusion, reads would still be classified as correct if they were in the correct genus but classified to an incorrect species. Although most of the methods evaluated were rank-flexible in their predictions, RITA and PhymmBL are rank-specific, and thus are only shown for the evaluation in which overpredictions were considered correct. Although RITA does have a rank-flexible mode, it requires 16S rDNA profiles of the community. PhymmBL provides a confidence score that in theory could provide a cut-off for the rank at which to assign reads; however, we would have had to choose the cut-offs ourselves, and previous researchers have found confidence scores to be high for a false positive dataset [
<xref ref-type="bibr" rid="CR27">27</xref>
]. MG-RAST was evaluated because of its popularity, but because it does not allow the user to create custom clade exclusion reference databases, we were only able to evaluate it without clade exclusion.</p>
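<p>The metric definitions above can be written out as a short sketch. The function names, the rank list, and the example counts are hypothetical; the example is meant only to show why averaging per species differs from pooling all reads.</p>

```python
from statistics import mean

# Ranks ordered from most to least specific (assumed ordering for this sketch).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]


def sensitivity(tp, fn):
    """TP / (TP + FN); FN counts unassigned reads."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """TP / (TP + FP); FP counts incorrectly assigned reads."""
    return tp / (tp + fp) if tp + fp else 0.0


def per_species_average(counts):
    """counts maps species -> (TP, FP, FN).

    Returns (mean sensitivity, mean precision) with every species
    weighted equally, regardless of how many reads it contributed.
    """
    sens = [sensitivity(tp, fn) for tp, fp, fn in counts.values()]
    prec = [precision(tp, fp) for tp, fp, fn in counts.values()]
    return mean(sens), mean(prec)


def taxonomic_distance(assigned_ranks, best_rank):
    """Mean number of ranks between each correctly assigned read's rank
    and the best possible rank."""
    best = RANKS.index(best_rank)
    return mean(RANKS.index(rank) - best for rank in assigned_ranks)


# Hypothetical counts: one large genome classified well, one small genome poorly.
counts = {"large genome": (900, 50, 50), "small genome": (10, 10, 80)}
print(per_species_average(counts))
print(taxonomic_distance(["genus", "genus", "family", "class"], "genus"))
```

<p>With these made-up counts the per-species sensitivity is about 0.53, whereas pooling all reads would give about 0.88, dominated by the larger genome.</p>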
<p>Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S4 lists the version numbers of all of the methods evaluated. All methods were run with default parameters except for filtered Kraken [
<xref ref-type="bibr" rid="CR28">28</xref>
] which was run using the kraken-filter script with a threshold of 0.20, which moves assignments up to successively higher levels of the taxonomic tree until the threshold is reached. This separate analysis was done because we noticed that Kraken tended to overclassify reads, and this option helps assign reads with greater confidence. Note that some methods can be run in several ways. For example, some methods can take the output of a variety of similarity search programs as input, or can optionally use paired-end sequence read information. In some cases these variations made relatively small differences to the sensitivity, precision, and taxonomic distance of the methods, and in these cases only one of the variants is presented in the figures for conciseness. Briefly, MEGAN4 [
<xref ref-type="bibr" rid="CR29">29</xref>
] has the option to allow the use of paired-end information from sequence reads, and the paired-end version is presented; MetaPhyler [
<xref ref-type="bibr" rid="CR30">30</xref>
] can use BLASTX, BLASTN, or a combination of the results, and the results for the BLASTX/BLASTN combination are presented; MEGAN4 and DiScRIBinATE [
<xref ref-type="bibr" rid="CR31">31</xref>
] have the option of taking results as input from either RAPSearch2 [
<xref ref-type="bibr" rid="CR32">32</xref>
] or BLASTX, and the RAPSearch2 versions are presented. RAPSearch2 is an alternative to BLAST, which we found to run over 30 times faster than BLASTX, with comparable accuracy (see
<xref rid="Sec10" ref-type="sec">Results</xref>
).</p>
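<p>The idea of moving an assignment up the taxonomic tree until a threshold is reached can be sketched as below. This is an illustration only and not the kraken-filter code: it assumes the score of a label is the fraction of a read's k-mers that hit the clade rooted at that label, and the function name, data structures, and counts are hypothetical.</p>

```python
def filter_assignment(label, parent, kmer_hits, total_kmers, threshold=0.20):
    """Walk up from `label` until the fraction of k-mers hitting the clade
    rooted at the current node reaches `threshold`.

    parent maps taxon -> parent taxon (the root has no entry).
    kmer_hits maps taxon -> number of the read's k-mers assigned to it.
    Returns the adjusted label, or None if no ancestor reaches the threshold.
    """
    def in_clade(taxon, root):
        # True if `root` is `taxon` itself or one of its ancestors.
        while taxon is not None:
            if taxon == root:
                return True
            taxon = parent.get(taxon)
        return False

    node = label
    while node is not None:
        clade_hits = sum(n for taxon, n in kmer_hits.items() if in_clade(taxon, node))
        if clade_hits / total_kmers >= threshold:
            return node
        node = parent.get(node)
    return None


# Hypothetical three-level tree and k-mer counts for one read of 100 k-mers.
parent = {
    "Pseudomonas aeruginosa": "Pseudomonas",
    "Pseudomonas putida": "Pseudomonas",
    "Pseudomonas": "Pseudomonadaceae",
}
kmer_hits = {"Pseudomonas aeruginosa": 10, "Pseudomonas putida": 8, "Pseudomonas": 5}

print(filter_assignment("Pseudomonas aeruginosa", parent, kmer_hits, 100))
```

<p>In this toy case only 10 % of the k-mers support the species, so the label moves to the genus, where the clade collects 23 % of the k-mers.</p>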
</sec>
</sec>
<sec id="Sec10">
<title>Results</title>
<p>Table 
<xref rid="Tab2" ref-type="table">2</xref>
 provides an overview of methods and their features, grouped by class. Note that it does not include all available methods, and more methods are continually being published. The number of citations each method has received is included to give an indication of how influential or widely used each method is. However, it should be noted that several of the methods have capabilities beyond classification, such as comparisons between samples and visualization, and thus may be cited when used for purposes other than classification. It is also worth noting that methods published earlier may be highly cited, yet newer methods often improve upon their strategies. As discussed below, even with accuracy assessment aside, the different method properties can offer different advantages under certain analysis scenarios and so are summarized here. Notably, many methods cannot undergo full, robust evaluation with clade exclusion because their reference databases cannot be manipulated, and so the methods chosen for full evaluation of accuracy were limited to those that allowed it.
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>List of metagenomics sequence classification methods and their characteristics sorted by class of method</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Method name</th>
<th>Class of method</th>
<th>Sequence alignment method/Composition method</th>
<th>Standalone
<sup>a</sup>
/Web server</th>
<th>Most recent year published (first time published)
<sup>b</sup>
</th>
<th>Functional classification if applicable</th>
<th>References</th>
<th>Number of citations
<sup>c</sup>
</th>
</tr>
</thead>
<tbody>
<tr>
<td>MEGAN4</td>
<td>Similarity</td>
<td>MEGABLAST, BLASTN, BLASTX, RAPSearch2 [
<xref ref-type="bibr" rid="CR32">32</xref>
] / N/A</td>
<td>Yes/No</td>
<td>2011 (2007)</td>
<td>KEGG, SEED</td>
<td>[
<xref ref-type="bibr" rid="CR15">15</xref>
,
<xref ref-type="bibr" rid="CR29">29</xref>
,
<xref ref-type="bibr" rid="CR45">45</xref>
–
<xref ref-type="bibr" rid="CR47">47</xref>
]</td>
<td>1089</td>
</tr>
<tr>
<td>MG-RAST</td>
<td>Similarity</td>
<td>BLASTN, BLAT / N/A</td>
<td>No/Yes</td>
<td>2008</td>
<td>SEED, NOG, COG, KEGG</td>
<td>[
<xref ref-type="bibr" rid="CR48">48</xref>
]</td>
<td>691</td>
</tr>
<tr>
<td>CAMERA</td>
<td>Similarity</td>
<td>All 6 BLAST programs / N/A</td>
<td>No/Yes</td>
<td>2011 (2007)</td>
<td>Pfam, TIGRFAM, COG, KOG, PRK</td>
<td>[
<xref ref-type="bibr" rid="CR49">49</xref>
,
<xref ref-type="bibr" rid="CR50">50</xref>
]</td>
<td>324</td>
</tr>
<tr>
<td>CARMA3</td>
<td>Similarity</td>
<td>BLASTX, HMMER3 [
<xref ref-type="bibr" rid="CR51">51</xref>
] / N/A</td>
<td>Yes/Yes</td>
<td>2011 (2008)</td>
<td>GO</td>
<td>[
<xref ref-type="bibr" rid="CR41">41</xref>
,
<xref ref-type="bibr" rid="CR52">52</xref>
,
<xref ref-type="bibr" rid="CR53">53</xref>
]</td>
<td>201</td>
</tr>
<tr>
<td>WebMGA</td>
<td>Similarity</td>
<td>FR-HIT [
<xref ref-type="bibr" rid="CR54">54</xref>
] / N/A</td>
<td>No/Yes</td>
<td>2013</td>
<td>Pfam, TIGRFAM, COG, KOG, PRK, GO</td>
<td>[
<xref ref-type="bibr" rid="CR55">55</xref>
]</td>
<td>54</td>
</tr>
<tr>
<td>DiScRIBinATE (SOrt-ITEMS)
<sup>d</sup>
</td>
<td>Similarity</td>
<td>BLASTX, RAPSearch2 / N/A</td>
<td>Yes/No</td>
<td>2010 (2009)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR31">31</xref>
,
<xref ref-type="bibr" rid="CR56">56</xref>
]</td>
<td>48</td>
</tr>
<tr>
<td>Ray Meta</td>
<td>Similarity</td>
<td>Exact match k-mers / N/A</td>
<td>Yes/No</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR57">57</xref>
]</td>
<td>34</td>
</tr>
<tr>
<td>Kraken</td>
<td>Similarity</td>
<td>Exact match k-mers / N/A</td>
<td>Yes/No</td>
<td>2014</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR28">28</xref>
]</td>
<td>15</td>
</tr>
<tr>
<td>RTM</td>
<td>Similarity</td>
<td>k-mers / N/A</td>
<td>Yes/Yes</td>
<td>2012</td>
<td>KEGG</td>
<td>[
<xref ref-type="bibr" rid="CR58">58</xref>
]</td>
<td>12</td>
</tr>
<tr>
<td>Genometa</td>
<td>Similarity</td>
<td>Bowtie [
<xref ref-type="bibr" rid="CR59">59</xref>
], BWA [
<xref ref-type="bibr" rid="CR60">60</xref>
] / N/A</td>
<td>Yes/No</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR61">61</xref>
]</td>
<td>7</td>
</tr>
<tr>
<td>LMAT</td>
<td>Similarity</td>
<td>Exact match k-mers / N/A</td>
<td>Yes/No</td>
<td>2013</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR62">62</xref>
]</td>
<td>6</td>
</tr>
<tr>
<td>Sequedex</td>
<td>Similarity</td>
<td>Exact match k-mers / N/A</td>
<td>Yes/No</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR63">63</xref>
]</td>
<td>5</td>
</tr>
<tr>
<td>MetaBin</td>
<td>Similarity</td>
<td>BLASTX, BLAT / N/A</td>
<td>Yes/Yes</td>
<td>2012</td>
<td>COG</td>
<td>[
<xref ref-type="bibr" rid="CR64">64</xref>
]</td>
<td>4</td>
</tr>
<tr>
<td>TAMER</td>
<td>Similarity</td>
<td>MEGABLAST / N/A</td>
<td>Yes/No</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR65">65</xref>
]</td>
<td>4</td>
</tr>
<tr>
<td>metaBEETL</td>
<td>Similarity</td>
<td>Direct comparison of compressed text indices / N/A</td>
<td>Yes/No</td>
<td>2013</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR7">7</xref>
]</td>
<td>2</td>
</tr>
<tr>
<td>SPANNER</td>
<td>Similarity</td>
<td>BLASTP / N/A</td>
<td>Yes/No</td>
<td>2013</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR66">66</xref>
]</td>
<td>2</td>
</tr>
<tr>
<td>GOTTCHA</td>
<td>Similarity</td>
<td>BWA / N/A</td>
<td>Yes/No</td>
<td>2015</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR67">67</xref>
]</td>
<td>0</td>
</tr>
<tr>
<td>CLARK</td>
<td>Similarity</td>
<td>k-mers / N/A</td>
<td>Yes/No</td>
<td>2015</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR68">68</xref>
]</td>
<td>0</td>
</tr>
<tr>
<td>MLTreeMap</td>
<td>Marker</td>
<td>BLASTX / N/A</td>
<td>Yes/Yes</td>
<td>2010 (2007)</td>
<td>4 Enzyme families</td>
<td>[
<xref ref-type="bibr" rid="CR69">69</xref>
,
<xref ref-type="bibr" rid="CR70">70</xref>
]</td>
<td>206</td>
</tr>
<tr>
<td>AMPHORA2</td>
<td>Marker</td>
<td>HMMER3 / N/A</td>
<td>Yes/Yes</td>
<td>2012 (2008)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR13">13</xref>
,
<xref ref-type="bibr" rid="CR71">71</xref>
,
<xref ref-type="bibr" rid="CR72">72</xref>
]</td>
<td>190</td>
</tr>
<tr>
<td>MetaPhlAn</td>
<td>Marker</td>
<td>MEGABLAST, Bowtie2 [
<xref ref-type="bibr" rid="CR73">73</xref>
] / N/A</td>
<td>Yes/Yes</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR11">11</xref>
]</td>
<td>114</td>
</tr>
<tr>
<td>MetaPhyler</td>
<td>Marker</td>
<td>BLASTN, BLASTX / N/A</td>
<td>Yes/No</td>
<td>2011</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR30">30</xref>
]</td>
<td>42</td>
</tr>
<tr>
<td>mOTU</td>
<td>Marker</td>
<td>HMMER3 / N/A</td>
<td>Yes/Yes</td>
<td>2013</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR19">19</xref>
]</td>
<td>24</td>
</tr>
<tr>
<td>Phylosift</td>
<td>Marker</td>
<td>LAST, HMMER3 / N/A</td>
<td>Yes/No</td>
<td>2014</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR14">14</xref>
]</td>
<td>18</td>
</tr>
<tr>
<td>PhymmBL</td>
<td>Hybrid</td>
<td>MEGABLAST / IMM</td>
<td>Yes/No</td>
<td>2011 (2009)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR6">6</xref>
,
<xref ref-type="bibr" rid="CR74">74</xref>
]</td>
<td>182</td>
</tr>
<tr>
<td>RITA</td>
<td>Hybrid</td>
<td>Pipeline of BLAST variations / NB</td>
<td>Yes/Yes</td>
<td>2012 (2011)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR75">75</xref>
,
<xref ref-type="bibr" rid="CR76">76</xref>
]</td>
<td>38</td>
</tr>
<tr>
<td>SPHINX</td>
<td>Hybrid</td>
<td>BLASTX / k-means</td>
<td>No/Yes</td>
<td>2010</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR10">10</xref>
]</td>
<td>17</td>
</tr>
<tr>
<td>TaxyPro</td>
<td>Hybrid</td>
<td>CoMet web server / Mixture model</td>
<td>Yes/No</td>
<td>2013</td>
<td>Pfam</td>
<td>[
<xref ref-type="bibr" rid="CR77">77</xref>
]</td>
<td>3</td>
</tr>
<tr>
<td>TWARIT</td>
<td>Hybrid</td>
<td>BWA short read alignment [
<xref ref-type="bibr" rid="CR60">60</xref>
] / k-means</td>
<td>No/Yes</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR78">78</xref>
]</td>
<td>2</td>
</tr>
<tr>
<td>PhyloPythiaS</td>
<td>Composition</td>
<td>N/A / SVM</td>
<td>Yes/Yes</td>
<td>2011 (2007)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR30">30</xref>
,
<xref ref-type="bibr" rid="CR79">79</xref>
,
<xref ref-type="bibr" rid="CR80">80</xref>
]</td>
<td>269</td>
</tr>
<tr>
<td>TACOA</td>
<td>Composition</td>
<td>N/A / k-NN</td>
<td>Yes/No</td>
<td>2009</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR33">33</xref>
]</td>
<td>65</td>
</tr>
<tr>
<td>NBC</td>
<td>Composition</td>
<td>N/A / NB</td>
<td>Yes/Yes</td>
<td>2011 (2008)</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR81">81</xref>
,
<xref ref-type="bibr" rid="CR82">82</xref>
]</td>
<td>35</td>
</tr>
<tr>
<td>RAIphy</td>
<td>Composition</td>
<td>N/A / RAI</td>
<td>Yes/No</td>
<td>2011</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR83">83</xref>
]</td>
<td>18</td>
</tr>
<tr>
<td>ClaMS</td>
<td>Composition</td>
<td>N/A / DBC signature</td>
<td>Yes/No</td>
<td>2011</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR84">84</xref>
]</td>
<td>10</td>
</tr>
<tr>
<td>INDUS</td>
<td>Composition</td>
<td>N/A / k-means</td>
<td>No/Yes</td>
<td>2011</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR85">85</xref>
]</td>
<td>8</td>
</tr>
<tr>
<td>TAC-ELM</td>
<td>Composition</td>
<td>N/A / Neural Network</td>
<td>Yes/No</td>
<td>2012</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR86">86</xref>
]</td>
<td>5</td>
</tr>
<tr>
<td>MetaCV</td>
<td>Composition</td>
<td>N/A / CV</td>
<td>Yes/No</td>
<td>2013</td>
<td>KEGG</td>
<td>[
<xref ref-type="bibr" rid="CR87">87</xref>
]</td>
<td>4</td>
</tr>
<tr>
<td>GSTaxClassifier</td>
<td>Composition</td>
<td>N/A / Bayesian</td>
<td>No/No</td>
<td>2010</td>
<td>N/A</td>
<td>[
<xref ref-type="bibr" rid="CR88">88</xref>
]</td>
<td>2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>N/A</italic>
not applicable,
<italic>IMM</italic>
interpolated Markov model,
<italic>NB</italic>
naive Bayes,
<italic>SVM</italic>
support vector machine,
<italic>k-NN</italic>
k-Nearest Neighbour,
<italic>RAI</italic>
relative abundance index,
<italic>DBC signature</italic>
de Bruijn chain signature,
<italic>CV</italic>
composition vector</p>
<p>
<sup>a</sup>
Standalone refers to whether the program can be run locally</p>
<p>
<sup>b</sup>
Some methods have had several publications, with later publications regarding improvements on functionality. In these cases the most recent publication was listed, with the first time the method was published in brackets</p>
<p>
<sup>c</sup>
Number of citations is based on Web of Science as of April 21, 2015</p>
<p>
<sup>d</sup>
DiScRIBinATE is the successor for SOrt-ITEMS so they were included in the same row</p>
</table-wrap-foot>
</table-wrap>
</p>
<sec id="Sec11">
<title>Several methods vastly overestimate the number of species present</title>
<p>To assess accuracy, the quality of the assignments made by different methods was first examined with no clade exclusion, so that as many representative methods as possible could be compared. The sensitivity, precision, and taxonomic distance (Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figures S1 and S2) were computed on the MetaSimHC dataset with no clade exclusion. Results were as expected, with all methods generally showing a relatively high sensitivity and precision. The exceptions are TACOA [
<xref ref-type="bibr" rid="CR33">33</xref>
], which is known to perform poorly on short reads, and MetaPhyler, which is a marker-based method and thus classifies only a small proportion of the reads, resulting in low sensitivity (but high precision). Next, the numbers of incorrectly predicted species at different thresholds of percentage abundance in the predicted community were tabulated (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S5). It is notable that several methods greatly overpredict the numbers of species present, considering that the sequences the methods are trying to classify exist in the reference databases or training sets. Under genus clade exclusion conditions (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S6), the number of incorrectly predicted species increases further for any method that makes incorrect predictions at the examined taxonomic level.</p>
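<p>Counting incorrectly predicted species at a given abundance threshold can be sketched as follows. The function name and read counts are hypothetical, and the sketch assumes that percentage abundance is computed over all reads assigned at the species level and that a species at exactly the threshold is counted.</p>

```python
def incorrectly_predicted_species(read_counts, true_species, min_percent):
    """Return predicted species that are absent from the mock community and
    whose share of species-level assignments is at least min_percent."""
    total = sum(read_counts.values())
    return sorted(
        species
        for species, n in read_counts.items()
        if species not in true_species and 100.0 * n / total >= min_percent
    )


# Hypothetical species-level read counts for a one-species mock community.
read_counts = {
    "Escherichia coli": 900,
    "Shigella flexneri": 80,
    "Salmonella enterica": 15,
    "Klebsiella pneumoniae": 5,
}
true_species = {"Escherichia coli"}

for threshold in (0, 1, 5, 10):
    wrong = incorrectly_predicted_species(read_counts, true_species, threshold)
    print(threshold, len(wrong), wrong)
```

<p>Raising the threshold removes low-abundance false predictions first, which is why the number of incorrectly predicted species depends strongly on the cut-off chosen.</p>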
</sec>
<sec id="Sec12">
<title>Sensitivity and precision vary widely between methods, with sensitivity generally decreasing at higher levels of clade exclusion and increasing with read length</title>
<p>The quality of the assignments made by the different methods was further examined under clade exclusion scenarios at different taxonomic levels. Sensitivity and precision were computed on the MetaSimHC dataset (Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
) and found to vary notably. To examine in greater detail what led to the differences in sensitivity and precision of these methods, the taxonomic distance for each method was evaluated (Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S3). Furthermore, the proportion of reads assigned at each taxonomic rank was determined. An example of the results under the genus clade exclusion scenario is shown in Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
, with the data for the rest in Additional file
<xref rid="MOESM3" ref-type="media">3</xref>
. Additionally, the numbers of reads misassigned and correctly assigned or overpredicted for each rank were compiled (genus clade exclusion in Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S4, the rest of the data in Additional file
<xref rid="MOESM4" ref-type="media">4</xref>
). Many of the methods assign a considerable proportion of reads to the species level, even though species-level assignment is impossible because these taxa are excluded from the database. Also notable is that TACOA assigns the majority of reads to the superkingdom level, so the method will be of limited use for those interested in more specific taxonomic ranks, at least at these shorter read lengths.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Performance as clade exclusion level is varied. Sensitivity (
<bold>a</bold>
) and precision (
<bold>b</bold>
) on the MetaSimHC dataset of simulated 250 bp reads. There is a wide range of variability in the sensitivity and precision of the methods with sensitivity tending to decrease as the level of clade exclusion moves from species to class. Performance is calculated based on proportion of reads appropriately assigned and averaged per genome (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
)</p>
</caption>
<graphic xlink:href="12859_2015_788_Fig1_HTML" id="MO1"></graphic>
</fig>
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Distributions of assignments to taxonomic ranks. Proportion of reads assigned at each taxonomic rank on the MetaSimHC dataset of simulated 250 bp reads under genus clade exclusion (includes both correct and incorrect assignments). Although the lowest possible correct rank is family, many methods still classify the majority of reads at the species level. CARMA3 and DiScRIBinATE are slightly more conservative, classifying a large number of reads at the family or order levels, whereas TACOA is extremely conservative, classifying the majority of the reads at the superkingdom level</p>
</caption>
<graphic xlink:href="12859_2015_788_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p>In some cases, overpredictions (e.g. predictions made to an incorrect species in the correct genus) are less problematic than incorrect predictions (e.g. predictions made to an incorrect genus). Thus, sensitivity and precision were recalculated after reclassifying overpredictions as correct classifications (Fig. 
<xref rid="Fig3" ref-type="fig">3</xref>
). There was a notable increase in sensitivity and precision for methods such as MEGAN4 and MetaBin, which are less conservative in their predictions. For more conservative methods such as CARMA3 and DiScRIBinATE, there was little change.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Performance as clade exclusion level is varied with overpredictions (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
for details) classified as correct. Sensitivity (
<bold>a</bold>
) and precision (
<bold>b</bold>
) on the MetaSimHC dataset of simulated 250 bp reads. Methods such as MEGAN4 which classify many reads at lower taxonomic levels see a considerable increase in performance, whereas more conservative methods such as CARMA3 see only a slight improvement. Performance is calculated based on proportion of reads appropriately assigned and averaged per genome (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
)</p>
</caption>
<graphic xlink:href="12859_2015_788_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
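<p>The effect of reclassifying overpredictions as correct can be illustrated with a minimal sketch. It assumes that sensitivity is the fraction of all reads assigned correctly and that precision is the fraction of assigned reads that are correct (the definitions actually used are given in Methods). The outcome labels, the function name, and the read counts are hypothetical and are not taken from the evaluation.</p>
<preformat>
```python
from collections import Counter

# Hypothetical per-read outcome labels (illustrative names only).
CORRECT = "correct"
OVERPREDICTED = "overpredicted"   # e.g. wrong species within the correct genus
INCORRECT = "incorrect"           # e.g. wrong genus
UNASSIGNED = "unassigned"


def sensitivity_precision(outcomes, overpredictions_correct=False):
    """Return (sensitivity, precision) for a list of per-read outcomes.

    Assumed definitions: sensitivity = correct / all reads,
    precision = correct / assigned reads.
    """
    counts = Counter(outcomes)
    correct = counts[CORRECT]
    if overpredictions_correct:
        # Treat overpredictions as correct classifications.
        correct += counts[OVERPREDICTED]
    total = len(outcomes)
    assigned = total - counts[UNASSIGNED]
    sensitivity = correct / total if total else 0.0
    precision = correct / assigned if assigned else 0.0
    return sensitivity, precision


# Invented example: a liberal method that often overpredicts.
reads = [CORRECT] * 4 + [OVERPREDICTED] * 4 + [INCORRECT] + [UNASSIGNED]
strict = sensitivity_precision(reads)
lenient = sensitivity_precision(reads, overpredictions_correct=True)
print(strict, lenient)
```
</preformat>
<p>In this invented example both metrics roughly double under the lenient scoring, which mirrors the qualitative pattern described for less conservative methods; a method with few overpredictions would change very little.</p>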
<p>The changes in sensitivity, precision, and taxonomic distance as read length increased were then examined on the MetaSimHC dataset (Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S5). Sensitivity followed the expected trend of increasing along with read lengths; however, precision and taxonomic distance showed no clear trend and remained relatively unchanged.</p>
</sec>
<sec id="Sec13">
<title>Analysis of the FW dataset reveals similar performance between
<italic>in vitro</italic>
data and
<italic>in silico</italic>
data, and between the FW and MetaSimHC datasets</title>
<p>A comparison between the FW
<italic>in silico</italic>
versus
<italic>in vitro</italic>
datasets is illustrated in Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
under species clade exclusion, and in Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S6 without clade exclusion. For the
<italic>in vitro</italic>
dataset, because it is not possible to determine with certainty which organism in the mock microbial community each read came from, a hit to any of the taxa in the FW dataset was considered correct. In addition, this meant that the sensitivity, precision, and taxonomic distance were based on all of the reads classified rather than averaged over all taxa. The results are similar between the
<italic>in vitro</italic>
and
<italic>in silico</italic>
communities, suggesting that for this simple community the methods evaluated are relatively robust to the sequencing errors of the Illumina technology used. A comparison of results between MetaSimHC and FW
<italic>in silico</italic>
revealed that the relative performance of methods remained similar when analyzing these two different datasets (Fig. 
<xref rid="Fig5" ref-type="fig">5</xref>
). Additionally, the numbers of incorrectly predicted species, based on different thresholds of percentage abundance in the predicted community, were again tabulated for the
<italic>in vitro</italic>
data (Table 
<xref rid="Tab3" ref-type="table">3</xref>
). Many of the methods incorrectly predict hundreds of species, with MetaCV incorrectly predicting 1226 species, although after filtering out low abundance predictions the numbers of incorrect predictions were drastically reduced. Under genus clade exclusion conditions (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S7), the number of incorrectly predicted species increases further, and even after filtering out low abundance predictions there were sometimes considerable numbers of false species predictions. The number of incorrectly predicted species is higher for the
<italic>in vitro</italic>
data relative to the
<italic>in silico</italic>
data (Table 
<xref rid="Tab4" ref-type="table">4</xref>
). The greater number of incorrectly predicted species is particularly notable in some methods that perform very well on the
<italic>in silico</italic>
data, such as MEGAN4 BLASTN, which goes from 0 incorrectly predicted species to 110. The performance for each of the component genomes on all
<italic>in silico</italic>
datasets is provided in Additional file
<xref rid="MOESM5" ref-type="media">5</xref>
.
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>Performance of FW
<italic>in silico</italic>
versus FW
<italic>in vitro</italic>
. Sensitivity (
<bold>a</bold>
) and precision (
<bold>b</bold>
) of methods on the FW dataset comparing the performance on the
<italic>in silico</italic>
community versus the
<italic>in vitro</italic>
community under species clade exclusion. The results are similar between the
<italic>in vitro</italic>
and
<italic>in silico</italic>
communities, demonstrating that methods appear to be relatively robust to real Illumina sequencing errors for this simple community. Performance is calculated based on proportion of reads appropriately assigned and averaged per genome (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
)</p>
</caption>
<graphic xlink:href="12859_2015_788_Fig4_HTML" id="MO4"></graphic>
</fig>
<fig id="Fig5">
<label>Fig. 5</label>
<caption>
<p>Performance of MetaSimHC compared to FW
<italic>in silico</italic>
. Sensitivity (
<bold>a</bold>
) and precision (
<bold>b</bold>
) of methods on the MetaSimHC dataset compared to the FW
<italic>in silico</italic>
of simulated 250 bp reads. Values are averaged over all levels of clade exclusion from species to class. Although the microbes in the dataset changed, the relative performance of the methods remains very similar. Performance is calculated based on proportion of reads appropriately assigned and averaged per genome (see
<xref rid="Sec5" ref-type="sec">Methods</xref>
)</p>
</caption>
<graphic xlink:href="12859_2015_788_Fig5_HTML" id="MO5"></graphic>
</fig>
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>Number of correctly and incorrectly predicted species
<sup>a</sup>
for different thresholds
<sup>b</sup>
without clade exclusion. Some methods vastly overpredict the number of species, even when the true number of species is low (in this case the true number of species is 11)</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th colspan="2">No cutoff
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 0.01 %
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 0.1 %
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 1 %
<sup>b</sup>
</th>
</tr>
<tr>
<th>Method</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
</tr>
</thead>
<tbody>
<tr>
<td>CARMA3</td>
<td>11</td>
<td>56</td>
<td>11</td>
<td>4</td>
<td>11</td>
<td>0</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>CLARK</td>
<td>11</td>
<td>364</td>
<td>11</td>
<td>25</td>
<td>11</td>
<td>5</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>DiScRIBinATE RAPSearch2
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Kraken</td>
<td>11</td>
<td>327</td>
<td>11</td>
<td>25</td>
<td>11</td>
<td>5</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>Filtered Kraken</td>
<td>11</td>
<td>14</td>
<td>11</td>
<td>1</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>MEGAN4 BLASTN</td>
<td>11</td>
<td>110</td>
<td>11</td>
<td>19</td>
<td>11</td>
<td>3</td>
<td>9</td>
<td>1</td>
</tr>
<tr>
<td>MEGAN4 RAPSearch2</td>
<td>11</td>
<td>183</td>
<td>11</td>
<td>41</td>
<td>11</td>
<td>1</td>
<td>9</td>
<td>1</td>
</tr>
<tr>
<td>MetaBin</td>
<td>11</td>
<td>561</td>
<td>10</td>
<td>77</td>
<td>10</td>
<td>6</td>
<td>10</td>
<td>1</td>
</tr>
<tr>
<td>MetaCV</td>
<td>11</td>
<td>1226</td>
<td>11</td>
<td>232</td>
<td>11</td>
<td>6</td>
<td>10</td>
<td>1</td>
</tr>
<tr>
<td>MetaPhyler</td>
<td>11</td>
<td>9</td>
<td>11</td>
<td>9</td>
<td>11</td>
<td>5</td>
<td>7</td>
<td>1</td>
</tr>
<tr>
<td>PhymmBL
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>RITA</td>
<td>11</td>
<td>466</td>
<td>10</td>
<td>80</td>
<td>10</td>
<td>10</td>
<td>10</td>
<td>1</td>
</tr>
<tr>
<td>TACOA
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>MG-RAST best hit</td>
<td>11</td>
<td>927</td>
<td>10</td>
<td>180</td>
<td>10</td>
<td>36</td>
<td>10</td>
<td>8</td>
</tr>
<tr>
<td>MG-RAST LCA</td>
<td>11</td>
<td>476</td>
<td>11</td>
<td>69</td>
<td>11</td>
<td>5</td>
<td>11</td>
<td>1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a</sup>
Using the FW
<italic>in vitro</italic>
dataset of sequenced reads from 11 species</p>
<p>
<sup>b</sup>
A cutoff of > x %, for example 0.01 %, would indicate that only species with a predicted abundance of at least x % of the total set of predictions were considered. Correctly predicted species are any of the 11 species that were used to simulate the reads in the dataset, whereas any other predicted species is counted as incorrect</p>
<p>
<sup>c</sup>
These methods do not predict to the species level at this read length (they require longer read lengths). See additional analyses at other levels of clade exclusion</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>Number of incorrectly predicted species
<sup>a</sup>
for different abundance thresholds
<sup>b</sup>
without clade exclusion. Fewer species are incorrectly predicted with the
<italic>in silico</italic>
data, which do not contain sequencing errors, than with the
<italic>in vitro</italic>
data containing sequencing errors (Table 
<xref rid="Tab3" ref-type="table">3</xref>
)</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th colspan="2">No cutoff
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 0.01 %
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 0.1 %
<sup>b</sup>
</th>
<th colspan="2">Cutoff > 1 %
<sup>b</sup>
</th>
</tr>
<tr>
<th>Method</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
<th>Correct</th>
<th>Incorrect</th>
</tr>
</thead>
<tbody>
<tr>
<td>CARMA3</td>
<td>11</td>
<td>41</td>
<td>11</td>
<td>3</td>
<td>11</td>
<td>1</td>
<td>11</td>
<td>1</td>
</tr>
<tr>
<td>CLARK</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>DiScRIBinATE RAPSearch2
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Kraken</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>Filtered Kraken</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>MEGAN4 BLASTN</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>MEGAN4 RAPSearch2</td>
<td>11</td>
<td>92</td>
<td>11</td>
<td>29</td>
<td>11</td>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>MetaBin</td>
<td>11</td>
<td>286</td>
<td>11</td>
<td>41</td>
<td>11</td>
<td>3</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>MetaCV</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>MetaPhyler</td>
<td>10</td>
<td>12</td>
<td>10</td>
<td>12</td>
<td>10</td>
<td>8</td>
<td>7</td>
<td>3</td>
</tr>
<tr>
<td>PhymmBL
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>RITA</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>TACOA
<sup>c</sup>
</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>MG-RAST best hit</td>
<td>10</td>
<td>646</td>
<td>10</td>
<td>136</td>
<td>10</td>
<td>26</td>
<td>10</td>
<td>6</td>
</tr>
<tr>
<td>MG-RAST LCA</td>
<td>10</td>
<td>300</td>
<td>10</td>
<td>54</td>
<td>10</td>
<td>8</td>
<td>9</td>
<td>3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a</sup>
Using the FW
<italic>in silico</italic>
dataset of simulated reads from 11 species</p>
<p>
<sup>b</sup>
A cutoff of > x %, for example 0.01 %, would indicate that only species with a predicted abundance of at least x % of the total set of predictions were considered</p>
<p>
<sup>c</sup>
These methods do not predict to the species level at this read length (they require longer read lengths). See additional analyses at other levels of clade exclusion</p>
</table-wrap-foot>
</table-wrap>
</p>
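<p>The abundance cutoffs used in Tables 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions: abundance is taken as the share of all species-level predictions, the cutoff is applied as a strict "greater than", and the species names and read counts are invented. The function names are hypothetical and do not correspond to any of the evaluated tools.</p>
<preformat>
```python
def filter_by_abundance(read_counts, cutoff_percent):
    """Keep species whose share of all predictions exceeds cutoff_percent.

    read_counts maps a predicted species name to its number of reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        species: count
        for species, count in read_counts.items()
        if 100.0 * count / total > cutoff_percent
    }


def count_correct_incorrect(predicted, true_species):
    """Return (number of correct species, number of incorrect species)."""
    correct = sum(1 for species in predicted if species in true_species)
    return correct, len(predicted) - correct


# Invented counts: two true species and two low-abundance false predictions.
counts = {"Species A": 9000, "Species B": 950, "Species C": 45, "Species D": 5}
truth = {"Species A", "Species B"}

for cutoff in (0.0, 0.01, 0.1, 1.0):
    kept = filter_by_abundance(counts, cutoff)
    print(cutoff, count_correct_incorrect(kept, truth))
```
</preformat>
<p>With these invented counts, the two false species survive the 0.01 % cutoff, one survives the 0.1 % cutoff, and none survives the 1 % cutoff. A true species present at very low abundance would be removed in the same way, which is the trade-off a mock community control helps to calibrate.</p>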
</sec>
<sec id="Sec14">
<title>There is substantial variation in the computational cost of different methods</title>
<p>To evaluate how long the various methods took to run, 22,000 reads of 100, 250, 500 and 1000 bp, and an additional 44,000 reads of 250 bp were simulated using the MetaSimHC dataset. The time taken by the methods to complete an analysis of these sequences varied widely, and nearly all methods scaled roughly linearly with both read length and number of reads on our datasets (Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S7). Sequence similarity based methods that rely on BLASTX take considerably longer than all other methods except TACOA, taking over 24 h for just 22,000 reads of 250 bp under the CPU conditions in the test (one Intel Xeon E5-2660 2.2 GHz CPU and 282 GB of RAM). At the other extreme, Kraken and CLARK took less than 1 min to classify all of the reads.</p>
</sec>
</sec>
<sec id="Sec15">
<title>Discussion</title>
<p>All of the methods analyzed performed very well in terms of sensitivity and precision when the query sequences were in the reference databases (i.e. when there was no clade exclusion). Of course, this type of analysis would be expected to give artificially high accuracy values, since the test data are essentially identical to the reference/training data. Under this type of analysis scenario, the more informative metrics to examine are taxonomic distance and the number of incorrectly predicted species. Notably, several methods substantially overpredicted the number of species present in the simulated communities. This included popular methods such as MG-RAST and MEGAN4. However, most of these incorrectly predicted species are predicted at a very low abundance. By setting a threshold to filter out low abundance predictions, the number of incorrect predictions can be considerably reduced. The thresholds presented here are not intended as suggestions, but rather to demonstrate the principle of using thresholds to filter out incorrect predictions. Microbial communities in certain environments are very complex, such as those found in soil [
<xref ref-type="bibr" rid="CR34">34</xref>
]. These environments, which are very diverse and contain a large number of organisms, would have a large proportion of the microbes found at less than 1 % of the total abundance of the community, and thus a 1 % filtering threshold would filter out many of the microbes actually in the metagenome. If thresholds are used, they should ideally be chosen based on a mock community control that reflects the level of diversity and complexity expected in the metagenomics analysis being performed. If the goal is to choose thresholds based on relative abundance, the genome size of the organisms should also be taken into account. Otherwise, if two organisms are present in the community at low levels but one organism’s genome is much bigger, the organism with the smaller genome may get filtered out while the organism with the larger genome does not, due to the greater number of reads from the larger genome. It is important for researchers doing metagenomics projects to know the level of precision of the method that they are using, so that they have an idea of how far they can trust the taxa predicted at lower abundance. There is a trade-off between finding all of the taxa that exist in the sample, and confidence in the prediction of the taxa. Two ways to adjust this trade-off are to choose a more precise (conservative) method, or to alter the minimum abundance threshold, with only the taxa over this abundance threshold being reported. Some methods already have a way of choosing this threshold. For example, MEGAN4 by default requires at least 5 reads to hit a taxon before the taxon is reported. The reads that are initially assigned to a taxon with fewer than the chosen threshold number of reads are then pushed up the taxonomy until they reach a taxon whose number of assigned reads meets the threshold. 
However, when many reads are analyzed, overprediction will still occur, and we have found in our analyses that such methods need an additional threshold to remove low abundance predictions that are likely false. Ideally, this threshold may be chosen in part from an analysis of an
<italic>in vitro</italic>
mock community sample, an important experimental control in any metagenomics analysis. Evaluating methods on real sequence data in this way also serves as an additional control for other aspects of metagenomics sequencing pipelines.</p>
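<p>The minimum-support rule described above, in which reads from weakly supported taxa are pushed up the taxonomy, can be sketched in simplified form. This is not MEGAN4's implementation: the parent map, the function names, and the read counts are hypothetical, and the sketch simply leaves reads at the root of the supplied taxonomy if support is never reached.</p>
<preformat>
```python
def depth(taxon, parent):
    """Number of steps from taxon up to the root of the parent map."""
    steps = 0
    while taxon in parent:
        taxon = parent[taxon]
        steps += 1
    return steps


def apply_min_support(assigned, parent, min_support=5):
    """Move reads from taxa with too few reads up to their parent taxon.

    assigned maps taxon -> number of reads; parent maps taxon -> parent taxon.
    Taxa are visited from the deepest rank upwards so that reads pushed up
    from several children can accumulate in a shared ancestor.
    """
    counts = dict(assigned)
    taxa = sorted(set(counts) | set(parent),
                  key=lambda t: depth(t, parent), reverse=True)
    for taxon in taxa:
        n = counts.get(taxon, 0)
        if n > 0 and min_support > n and taxon in parent:
            target = parent[taxon]
            counts[target] = counts.get(target, 0) + n
            del counts[taxon]
    return counts


# Invented counts on a three-level toy taxonomy.
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
}
assigned = {"Escherichia coli": 40, "Escherichia albertii": 2, "Escherichia": 4}
print(apply_min_support(assigned, parent, min_support=5))
```
</preformat>
<p>In this toy case the two reads assigned to the weakly supported species are merged into the genus, which then has enough support to be reported. No reads are discarded, so a rule of this kind makes assignments more conservative rather than removing them.</p>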
<p>As demonstrated in Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
, the sensitivity and precision of methods vary dramatically. Methods show a general trend of decreasing sensitivity as the rank of clade exclusion increases. This is expected as the sequences left in the database will become increasingly divergent, and the scores of the matches, if any, will decrease. There is a notable decrease in performance for methods relying on sequence composition or nucleotide-based BLASTN similarity searches, versus the protein/amino acid sequence-based BLASTX and RAPSearch2 similarity based methods. This confirms what has been reported previously, that sequence composition based methods have lower performance than sequence similarity based methods at shorter read lengths [
<xref ref-type="bibr" rid="CR6">6</xref>
]. BLASTN is likely outperformed by amino acid-based similarity approaches under clade exclusion because nucleotide sequence searches are well known to be less sensitive for more divergent sequences, owing to the smaller alphabet (4 bases versus 20 amino acids).</p>
<p>The differences in performance between methods can be partially explained by the distribution of taxonomic ranks that they assign reads to. As seen in Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
, CARMA3 and DiScRIBinATE are assigning reads more conservatively; that is, they are assigning far fewer reads to the lower taxonomic ranks. Many of these lower level predictions of other methods are in fact overpredictions, as demonstrated by their large increases in sensitivity and precision between Figs. 
<xref rid="Fig1" ref-type="fig">1</xref>
and
<xref rid="Fig3" ref-type="fig">3</xref>
. Due to the way we evaluated methods, the most conservative methods will show the highest sensitivity and precision, but may not be making classifications at specific enough taxonomic ranks to be useful. TACOA, for example, shows high sensitivity and precision, yet makes classifications at very high taxonomic ranks that would not be useful for most researchers.</p>
<p>Not surprisingly, the sensitivity of methods increases as read length increases. The most dramatic increase appears to be between read lengths of 100 and 250 bp. Thus, when choosing a sequencing technology, it may be important to try to obtain a read length of at least around 250 bp. The precision and the taxonomic distance of methods remained relatively unchanged. This is likely because any gains in precision and taxonomic distance were offset by the additionally classified reads (as seen by the increase in sensitivity), which have greater dissimilarity to sequences in the databases of methods and would therefore be classified with poorer precision and taxonomic distance.</p>
<p>Our comparison of the
<italic>in silico</italic>
to the
<italic>in vitro</italic>
freshwater community showed similar results in terms of relative performance of the methods. This gives us some confidence in our results from the other
<italic>in silico</italic>
simulations, as well as demonstrating the robustness of the evaluated methods to real sequence errors for this simple community. However, this would not necessarily generalize to more diverse communities, or other sequencing technologies. The sensitivity and precision of the methods followed the trends seen in the MetaSimHC
<italic>in silico</italic>
evaluation, although filtered Kraken showed somewhat lower relative precision. Upon further analysis, this appeared to be due to the way precision was calculated in this comparison. For the comparison to be done fairly between the
<italic>in silico</italic>
and
<italic>in vitro</italic>
community, the metrics were based on all reads rather than the average for all organisms. Filtered Kraken seemed to stand out in that for most organisms it classified few of the reads, and the ones it classified were mostly correct. However, for two organisms (
<italic>E. coli</italic>
and
<italic>B. cereus</italic>
), the majority of the reads were classified incorrectly. This means that because more of the reads of
<italic>E. coli</italic>
and
<italic>B. cereus</italic>
were classified than those of the other organisms, their (mostly inaccurate) classifications had a relatively large influence on the precision. The number of genomes/taxa in the mock communities was small relative to the anticipated number of species in most real metagenomic analyses, so abnormal results from individual genomes could have a large impact on the results, as seen here with filtered Kraken. It is also notable that
<italic>E. coli</italic>
and
<italic>B. cereus</italic>
, mainly due to historical reasons, come from regions of the taxonomic tree that are not reflective of the typical case for many environments; genomes with high sequence similarity and composition in this part of the tree are classified as the same species, whereas if they were found in other parts of the tree they would be classified as different species or genera [
<xref ref-type="bibr" rid="CR35">35</xref>
,
<xref ref-type="bibr" rid="CR36">36</xref>
]. Thus, species that are not yet discovered will not be classified in a similar manner to the genomes in
<italic>Escherichia</italic>
or
<italic>Bacillus</italic>
, and so the performance of methods on these genomes likely does not reflect performance on as yet undiscovered microbes in metagenomics samples. However, it must be emphasized that no single mock community dataset can best evaluate all metagenomics software. The key is for researchers to design mock communities for evaluation that are suitable for their experiment, and to use this published analysis to appreciate the types of issues they should watch out for.</p>
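<p>The difference between precision pooled over all reads and precision averaged per genome, which explains the filtered Kraken result discussed above, can be shown with a small worked example. The numbers below are invented for illustration and are not results from this evaluation; the function names are hypothetical.</p>
<preformat>
```python
def precision_per_genome(stats):
    """Mean of per-genome precision; stats maps genome -> (correct, classified)."""
    values = [correct / classified
              for correct, classified in stats.values() if classified > 0]
    return sum(values) / len(values) if values else 0.0


def precision_pooled(stats):
    """Precision over all classified reads, regardless of source genome."""
    correct = sum(c for c, _ in stats.values())
    classified = sum(n for _, n in stats.values())
    return correct / classified if classified else 0.0


# Invented counts: three genomes with few, mostly correct classifications and
# one genome with many, mostly incorrect classifications.
stats = {
    "genome 1": (95, 100),
    "genome 2": (98, 100),
    "genome 3": (90, 100),
    "genome 4": (200, 1000),
}
print(round(precision_per_genome(stats), 3), round(precision_pooled(stats), 3))
```
</preformat>
<p>Here the single heavily classified genome contributes 1000 of the 1300 classified reads, so it dominates the pooled value while counting as only one of four genomes in the per-genome average.</p>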
<p>The differences we saw in the computational cost of the methods were substantial, and were clear even though we only ran a few small test datasets of thousands of reads. Current metagenomics datasets often include millions of reads; without access to large amounts of compute power, many researchers will not find it practical to use BLASTX based methods on datasets of the size currently produced by Illumina sequencing. The need for a more rapid alternative is already being addressed by such methods as RAPSearch2 [
<xref ref-type="bibr" rid="CR32">32</xref>
], LAST [
<xref ref-type="bibr" rid="CR37">37</xref>
], PAUDA [
<xref ref-type="bibr" rid="CR38">38</xref>
], and DIAMOND [
<xref ref-type="bibr" rid="CR39">39</xref>
]. Notably, RAPSearch2 shows similar, or in some cases even increased, performance relative to the same methods using BLASTX, while requiring much less time to run (over 30x faster in our analyses). Many methods provide the option of running multiple threads, so access to additional processors will allow the methods to run substantially faster. Furthermore, for most methods reads are classified independently from one another, so files of reads can be broken up into multiple smaller files, with each file run on a separate processor and the classification results then combined. In addition to run time, the amount of RAM used by different methods varies considerably. Both Kraken and CLARK require large amounts of RAM, but do provide reduced standard databases for users with low-memory computing environments (known as MiniKraken and CLARK-
<italic>l</italic>
). Certain methods also allow users to adjust settings to allow trade-offs between speed, accuracy and RAM usage, such as the sampling factor value in CLARK. A final consideration of computational resources when choosing a method is the amount of disk space that a method requires. The databases used by some methods require relatively large amounts of disk space, such as the standard database of Kraken which requires at least 160 GB of disk space. Another aspect that may affect method choice is the relative ease of generating new databases for the methods. Certain methods rely on the results of a similarity search, and expanding the database is a relatively simple process of generating a new database for that similarity search, such as BLAST. However, other methods may require substantial computational resources that researchers may not have access to. For example, the authors of GOTTCHA state that the creation of a database from the 2500 prokaryotic genome projects available in 2012 required 2 TB of RAM. Other methods, such as many online only methods, do not even allow the modification/expansion of the database.</p>
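<p>Because most methods classify each read independently, the split-and-combine approach mentioned above can be sketched as follows. The classifier here is a deliberately trivial stand-in (it labels reads by GC content) and all names are hypothetical. In practice each chunk would be written to its own file and passed to a separate run of the chosen tool; threads are used in this sketch only to keep it self-contained.</p>
<preformat>
```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, n_chunks):
    """Split a list of reads into at most n_chunks contiguous parts."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def classify_chunk(chunk):
    """Toy stand-in for one classifier run: label each read by GC content."""
    labels = {}
    for read_id, seq in chunk:
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        labels[read_id] = "GC-rich" if gc > 0.5 else "AT-rich"
    return labels


def classify_in_parallel(reads, n_workers=4):
    """Classify chunks concurrently and merge the per-chunk results."""
    combined = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for result in pool.map(classify_chunk, split_into_chunks(reads, n_workers)):
            combined.update(result)
    return combined


reads = [("read1", "GGCC"), ("read2", "ATAT"), ("read3", "GCGA"),
         ("read4", "AATT"), ("read5", "CCCG")]
print(classify_in_parallel(reads, n_workers=2))
```
</preformat>
<p>The merged result is identical to classifying all reads in a single run, which holds only because no read's classification depends on any other read. Methods that apply a post hoc rule across all reads, such as a minimum number of reads per taxon, would need that rule applied after the results are combined.</p>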
<p>Protein sequence similarity-based methods (e.g. BLASTX, RAPSearch2) perform very well in clade exclusion scenarios but do not perform as well as nucleotide based methods when there is no clade exclusion. This is likely because a proportion of microbial genome sequence (commonly around 6–14 % [
<xref ref-type="bibr" rid="CR40">40</xref>
]) is non-coding. Protein similarity-based methods still have a relatively high sensitivity, generally >0.94 and, as noted in [
<xref ref-type="bibr" rid="CR41">41</xref>
], this is due to many reads overlapping at least partially with a coding region. This explanation is consistent with our finding that as read length is increased, sensitivity of the aforementioned methods increases (from 0.94 at read lengths of 100 to 0.99 at read lengths of 1000 nucleotides for MEGAN4 BLASTX on the MetaSimHC dataset), as it would be less likely that a longer read would cover only non-coding regions. A quick examination of these incorrectly classified reads confirmed that they came from non-coding regions of the genomes, in many cases rRNA genes.</p>
<p>The results presented should guide researchers to the choice of method that best fits their research question and computational resources. Clearly, certain methods perform well in certain situations. Kraken, filtered Kraken, and MEGAN4 BLASTN perform exceedingly well when there is no clade exclusion, yet their sensitivity is low when there is clade exclusion. However, filtered Kraken classifies only a small percentage of reads when the species present in the dataset are not in the database. For example, filtered Kraken classifies less than 8 % of the reads under genus clade exclusion (Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
). A strategy researchers may therefore use is to first run their dataset through filtered Kraken, and then run the reads it leaves unclassified through a more conservative method such as DiScRIBinATE RAPSearch2. Filtered Kraken would classify the reads from genomes in the reference database, while leaving the majority of reads from genomes not in the reference database unclassified. Then, DiScRIBinATE RAPSearch2, which will assign a much greater proportion of reads from genomes not in reference databases, could be run on the unclassified reads. If a conservative method such as DiScRIBinATE RAPSearch2 is run alone, it may miss many of the assignments of known genomes to the species rank, due to its tendency to make assignments at higher ranks. However, in some cases, such as when analyzing less well characterized microbiomes (for example, those in water as opposed to human feces), the use of such conservative methods could be entirely appropriate. The pipeline idea of combining methods is integrated into some methods like RITA, which first identifies a highest-confidence set of predictions, then subjects the sequences not yet classified to a series of downstream classification steps. CARMA3 performs well in both the no-clade exclusion scenario (with a small taxonomic distance, classifying many reads to the species level) and the clade exclusion scenario. However, CARMA3 takes a considerable time to run, and may not be computationally feasible for those with large datasets and without access to substantial compute power. Another way of combining methods would be to use multiple methods and look for consistent assignments among methods [
<xref ref-type="bibr" rid="CR27">27</xref>
]. Depending on the type of analysis, this could increase precision and confidence in the assignments, although at the cost of sensitivity in most cases and run time (due to running multiple methods).</p>
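<p>The two-stage strategy described above can be sketched generically. The two classifier functions below are toy stand-ins, not wrappers around filtered Kraken or DiScRIBinATE, and the sequences and taxon names are invented. The only assumption carried over from the text is that the first method leaves a read unclassified (represented here as None) when it has no confident match.</p>
<preformat>
```python
def two_stage_classify(reads, first_pass, second_pass):
    """Classify with first_pass; pass reads it leaves unclassified to second_pass.

    Each classifier takes a sequence and returns a taxon name or None.
    The result maps read id -> (taxon or None, name of the stage that decided).
    """
    assignments = {}
    leftover = []
    for read_id, seq in reads:
        taxon = first_pass(seq)
        if taxon is None:
            leftover.append((read_id, seq))
        else:
            assignments[read_id] = (taxon, "first_pass")
    for read_id, seq in leftover:
        assignments[read_id] = (second_pass(seq), "second_pass")
    return assignments


# Toy stand-ins: an exact-match lookup (precise, species level) followed by a
# looser prefix match that only commits to a higher rank.
KNOWN_SEQUENCES = {"ACGTACGT": "Species X"}


def exact_lookup(seq):
    return KNOWN_SEQUENCES.get(seq)


def conservative_guess(seq):
    return "Family Y" if seq.startswith("ACGT") else None


reads = [("r1", "ACGTACGT"), ("r2", "ACGTTTTT"), ("r3", "GGGGGGGG")]
print(two_stage_classify(reads, exact_lookup, conservative_guess))
```
</preformat>
<p>Recording which stage made each assignment keeps the provenance of every call, so that species-level assignments from the first stage and higher-rank assignments from the second stage can be assessed separately.</p>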
<p>The test datasets used in this evaluation are limited in their complexity and diversity, as well as the number of reads simulated. For example, millions of reads are often sequenced for metagenomics samples, while our datasets were smaller, containing tens to hundreds of thousands of reads. Furthermore, many environments sampled are far more complex and diverse, containing a much larger number of microbes with varying relative abundance, such as soil or the human gut. Our analyses were also either on
<italic>in silico</italic>
simulated communities or communities sequenced with a single sequencing technology. The aim of this research was not to recommend any specific method, but to raise awareness of the advantages and disadvantages of different methods and issues in metagenome analyses. This evaluation highlights that there are large differences in methods on even the relatively simple communities used for our datasets, such as number of organisms predicted, sensitivity and precision, how specific the classifications tend to be (taxonomic rank), and computational resources required to run. However, other factors such as the diversity and microbes present in a community, and the sequencing technology used, will also affect the performance of the methods. Additionally, certain tools may have advantages and be particularly useful for specific environments. For example, some tools contain genomes in their databases that are not present in RefSeq, while most methods use RefSeq exclusively for their databases. An example of this is MetaPhlAn, which includes many draft genomes from the larger Human Microbiome Project (HMP) [
<xref ref-type="bibr" rid="CR42">42</xref>
], and thus may be particularly useful for human microbiome samples. Metagenomics as a field is expanding rapidly. New tools are needed to classify the sequences obtained from these studies. There is a great need for, and much interest in, such tools, as evidenced by the large number of methods released over the past few years. However, it is non-trivial to perform an evaluation of methods. This is due to the sheer number of metagenomic methods available, the difficulty in setting up some of these methods, and the challenge in performing robust evaluation techniques such as clade exclusion or leave-one-out evaluation. Furthermore, methods only available on the web generally cannot be thoroughly evaluated, as in many cases they do not allow the use of custom reference databases or training sets, and sometimes limit the number of reads that can be uploaded. To address these difficulties, an initiative called the Critical Assessment of Metagenomic Interpretation (CAMI) has been launched [
<xref ref-type="bibr" rid="CR43">43</xref>
]. This community-led initiative will have researchers run their own methods on data sets made up of unpublished microbial genomes. This will be a valuable contribution to methodology assessment, but researchers are still encouraged to use mock microbial communities as controls for their own particular analyses, especially mock communities that reflect the types of microbes, diversity, and complexity they expect to see in their study. While CAMI will provide a useful additional comparative evaluation of methods, one should always perform a metagenomics analysis using appropriate controls to best refine methodology and any threshold cutoffs suitable for the specific analysis needs.</p>
<p>Another issue is that there does not seem to be a consensus on the way to evaluate performance. Some researchers consider classification of a read to a taxonomic level more specific than what is correct (e.g. a novel
<italic>Escherichia</italic>
species being assigned to
<italic>Escherichia coli</italic>
rather than
<italic>Escherichia</italic>
) as assigned correctly (e.g. [
<xref ref-type="bibr" rid="CR28">28</xref>
]). Other researchers, however, classify these overprediction assignments as false positives or mispredictions (e.g. [
<xref ref-type="bibr" rid="CR31">31</xref>
]). Depending on the research goal, one may prefer a more liberal or a more conservative method. For example, if a researcher is interested in comparing the genera in one metagenomics sample with those in another, overpredictions that are incorrect at the species level will not matter as long as they are correct at the genus level. A more conservative method may assign the same reads to the family level, and will thus completely miss the relevant taxa. On the other hand, a researcher who takes all of the predictions at all taxonomic ranks may erroneously conclude that a specific species is more abundant in one sample than in another when it is merely an overprediction. It should also be stressed that many methods allow flexibility in the parameters used, so it may be possible to tune a method to be more or less conservative. However, some parameters cannot be changed, and there are fundamental differences in the ways reads are classified by different methods. For example, MEGAN4 and MG-RAST make assignments based on bit-score as the sole parameter for judging significance. Other methods, such as DiScRIBinATE, CARMA3, and MetaPhyler, employ additional measures such as alignment parameter thresholds and/or a reciprocal BLAST search step, which have been shown to improve the accuracy of taxonomic assignments in certain scenarios [
<xref ref-type="bibr" rid="CR44">44</xref>
]. For example, using these methods a read from a novel
<italic>Pseudomonas</italic>
species with a single hit over the bit-score threshold to
<italic>Pseudomonas aeruginosa</italic>
may not align well enough to be assigned to the species level based on the additional alignment parameters, and thus could be assigned correctly to
<italic>Pseudomonas</italic>
. However, in MEGAN4 or MG-RAST the read would pass the bit-score threshold and because there were no other hits, it would be assigned directly to
<italic>Pseudomonas aeruginosa</italic>
.</p>
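<p>The two scoring conventions described above can be made concrete with a minimal sketch. It is illustrative only and is not the evaluation code used in this study: the function names, the outcome labels, the abbreviated lineages, and the choice to count less specific but correct assignments as correct are all assumptions made for the example.</p>
<preformat>
```python
from collections import Counter


def score_assignment(predicted, best_correct):
    """Compare a predicted lineage with the most specific correct lineage.

    Both arguments are lists of taxon names ordered from root to tip.
    The labels returned here are hypothetical names for this sketch.
    """
    if not predicted:
        return "unassigned"
    shared = 0
    for pred_taxon, true_taxon in zip(predicted, best_correct):
        if pred_taxon != true_taxon:
            break
        shared += 1
    if shared == len(predicted) and shared == len(best_correct):
        return "correct"
    if shared == len(predicted):
        return "less_specific"   # an ancestor of the correct taxon
    if shared == len(best_correct):
        return "overprediction"  # goes below the correct taxon
    return "misassigned"


def precision(outcomes, overprediction_is_correct):
    """Fraction of assigned reads counted as correct under one convention."""
    counts = Counter(outcomes)
    assigned = sum(n for label, n in counts.items() if label != "unassigned")
    if assigned == 0:
        return 0.0
    # Assumption: a correct assignment at a higher rank counts as correct.
    good = counts["correct"] + counts["less_specific"]
    if overprediction_is_correct:
        good += counts["overprediction"]
    return good / assigned


# Abbreviated, illustrative lineage for a read from a novel Escherichia
# species: the genus is the most specific rank that can be correct.
BEST_CORRECT = ["Bacteria", "Enterobacteriaceae", "Escherichia"]

predictions = [
    BEST_CORRECT + ["Escherichia coli"],
    BEST_CORRECT,
    BEST_CORRECT[:2],
    ["Bacteria", "Pseudomonadaceae", "Pseudomonas"],
]
outcomes = [score_assignment(p, BEST_CORRECT) for p in predictions]

print(outcomes)
print("liberal precision:", precision(outcomes, True))
print("conservative precision:", precision(outcomes, False))
```
</preformat>
<p>On these four hypothetical reads the same predictions yield a different precision depending only on whether the overprediction is counted as correct, which is why reported accuracies are hard to compare across evaluations that use different conventions.</p>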
<p>Again, careful examination of controls (like an
<italic>in vitro</italic>
mock community sequenced alongside metagenomics samples) may provide insight into the best method to use and suitable threshold cutoffs for low abundance reads, especially if that mock community includes a suitable level of diversity and/or includes species expected in the metagenomics analysis. Developers of new methods are encouraged to allow their methods to be evaluated using customized reference datasets, including clade exclusion-based analysis, so that the methods can be assessed robustly.</p>
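<p>As a rough sketch of how a mock community control might inform an abundance cutoff, the following example counts correctly and incorrectly predicted species at several thresholds. The function name, the read counts, the member list, and the threshold values are hypothetical and are not taken from this study's data.</p>
<preformat>
```python
def species_calls_by_threshold(read_counts, known_species, thresholds):
    """Count correct and incorrect species calls at each abundance cutoff.

    read_counts maps a predicted species to its number of assigned reads.
    known_species is the set of species truly present in the mock community.
    Each threshold is a minimum fraction of all assigned reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned")
    known = set(known_species)
    rows = []
    for threshold in thresholds:
        called = {
            species
            for species, n_reads in read_counts.items()
            if n_reads / total >= threshold
        }
        n_correct = len(called.intersection(known))
        n_incorrect = len(called.difference(known))
        rows.append((threshold, n_correct, n_incorrect))
    return rows


# Hypothetical classifier output for a three-member mock community.
READ_COUNTS = {
    "Pseudomonas aeruginosa": 5000,
    "Streptomyces coelicolor": 3000,
    "Rhodobacter capsulatus": 1900,
    "Pseudomonas fluorescens": 80,   # not in the mock community
    "Escherichia coli": 20,          # not in the mock community
}
MOCK_MEMBERS = {
    "Pseudomonas aeruginosa",
    "Streptomyces coelicolor",
    "Rhodobacter capsulatus",
}

for row in species_calls_by_threshold(READ_COUNTS, MOCK_MEMBERS, [0.0, 0.005, 0.01]):
    print("threshold=%s correct=%d incorrect=%d" % row)
```
</preformat>
<p>A cutoff chosen this way is only as informative as the mock community is representative, so it should be treated as a starting point rather than a universal value.</p>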
</sec>
<sec id="Sec16">
<title>Conclusions</title>
<p>There has been a real need for a comprehensive evaluation of metagenomics classification methods, owing to the large number of new methods being released. Here we have focused on taxonomic classification, for which an expanded comparative analysis was needed to build on previous assessments and include more clade exclusion-based analysis. Among the methods we analyzed, no single method stands out as superior to all others, because the methods differ in a wide variety of characteristics that may make them more suitable for certain research group infrastructures and research projects than for others. Few researchers will have the time to evaluate methods robustly themselves, so they may simply use the method that is most popular or easiest to use, which is not necessarily well suited to their particular computational resources and/or goals. This evaluation explains some of the issues researchers should consider when choosing an analysis approach for their metagenomics project, and reveals that very misleading results, in particular notable overprediction of the number of taxa and/or missed taxa, can occur if an inaccurate or unsuitable analysis approach is used. The results from this evaluation will hopefully help guide researchers’ decisions in selecting analysis methods suitable for their metagenomics studies. As new methods are developed, further evaluations will need to be performed, including with a reference dataset like MetaSimHC and/or the CAMI approach. This study provides a model for such analyses to compare method accuracies and benefits, and highlights criteria that should be evaluated.
It would be very helpful for evaluation purposes if method developers would allow their method’s reference databases to be manipulated, to permit analyses like clade exclusion, to avoid biases that can occur when no clade exclusion is performed (including with unpublished genomes as planned for CAMI, depending on the relatedness of other taxa to these unpublished genomes). Regardless, researchers are strongly encouraged to include appropriate negative and positive controls for their metagenomic experiments, including appropriate
<italic>in vitro</italic>
mock communities reflecting their expected type of data (high/low diversity, well characterized previously or not, etc.) to help fine-tune their methodology as appropriate for their specific experiment. Robust metagenomic data analysis is critical at this stage of the development of microbiome research as a key research area. Microbiome research promises to be widely applicable to many fields, including human health, the environment, agrifood, and mining and other natural resource management, but it will only be valuable if high-quality, careful analysis is performed.</p>
</sec>
<sec id="Sec17">
<title>Availability of supporting data</title>
<p>The data sets supporting the results of this article are available in the MG-RAST repository (the
<italic>in silico</italic>
and
<italic>in vitro</italic>
test data sets) and accession numbers can be found in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S2.</p>
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec18">
<title>Additional files</title>
<p>
<media position="anchor" xlink:href="12859_2015_788_MOESM1_ESM.docx" id="MOESM1">
<label>Additional file 1: Supplementary Tables.</label>
<caption>
<p>
<bold>Table S1.</bold>
Number of genomes left in the reference databases and training sets of the methods used in the evaluation scenarios.
<bold>Table S2.</bold>
Datasets used in the evaluation scenarios and their accession numbers.
<bold>Table S3.</bold>
Number of reads simulated for each organism in the
<italic>in silico</italic>
datasets.
<bold>Table S4.</bold>
Methods that were the focus of this evaluation and their version numbers. Methods were run with default parameters except for what we called filtered Kraken which used the kraken-filter script with a threshold score of 0.20.
<bold>Table S5.</bold>
Number of correctly and incorrectly predicted species
<sup>a</sup>
for different thresholds
<sup>b</sup>
without clade exclusion, illustrating how some methods vastly overpredict the number of species, even when the true number of species is low (in this case the true number of species is 11).
<bold>Table S6.</bold>
Number of incorrectly predicted species
<sup>a</sup>
for different abundance thresholds
<sup>b</sup>
with genus clade exclusion.
<bold>Table S7.</bold>
Number of incorrectly predicted species
<sup>a</sup>
for different abundance thresholds
<sup>b</sup>
with genus clade exclusion. Even more species are incorrectly predicted under these conditions than without clade exclusion. (DOCX 34 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2015_788_MOESM2_ESM.pptx" id="MOESM2">
<label>Additional file 2: Supplementary Figures.</label>
<caption>
<p>
<bold>Figure S1.</bold>
Sensitivity and precision with no clade exclusion. Performance of methods on the MetaSimHC dataset of simulated 250 bp reads.
<bold>Figure S2.</bold>
Taxonomic distance of methods on the MetaSimHC dataset of simulated 250 bp reads with no clade exclusion.
<bold>Figure S3.</bold>
Taxonomic distance of methods on the MetaSimHC dataset of simulated 250 bp reads with various levels of clade exclusion.
<bold>Figure S4.</bold>
Distributions of misassigned (A) and correct/overpredicted assignments (B) to each taxonomic rank on the MetaSimHC dataset of simulated 250 bp reads under genus clade exclusion.
<bold>Figure S5.</bold>
Performance as read length is varied. Sensitivity (A), precision (B), and taxonomic distance (C) of methods on the MetaSimHC dataset simulated at lengths of 100, 250, 500, and 1000 bases with genus clade exclusion.
<bold>Figure S6.</bold>
Performance of FW
<italic>in silico</italic>
versus FW
<italic>in vitro</italic>
without clade exclusion. Sensitivity (A) and precision (B) of methods on the FW dataset, comparing performance on the <italic>in silico</italic> community versus the
<italic>in vitro</italic>
community.
<bold>Figure S7.</bold>
Comparison of running time. Running time for the various methods was calculated on a MetaSimHC dataset of 22,000 simulated reads of various read lengths (A), or 22,000 and 44,000 reads of 250 bp (B). (PPTX 100 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2015_788_MOESM3_ESM.txt" id="MOESM3">
<label>Additional file 3:</label>
<caption>
<p>
<bold>The proportion of reads assigned at each taxonomic rank on all</bold>
<bold>
<italic>in silico</italic>
</bold>
<bold>datasets.</bold>
(TXT 213 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2015_788_MOESM4_ESM.txt" id="MOESM4">
<label>Additional file 4:</label>
<caption>
<p>
<bold>The numbers of reads misassigned and correctly assigned (or overpredicted) for each rank on all</bold>
<bold>
<italic>in silico</italic>
</bold>
<bold>datasets.</bold>
(TXT 210 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2015_788_MOESM5_ESM.txt" id="MOESM5">
<label>Additional file 5:</label>
<caption>
<p>
<bold>The performance for each of the component genomes on all</bold>
<bold>
<italic>in silico</italic>
</bold>
<bold>datasets.</bold>
(TXT 674 kb)</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<fn-group>
<fn>
<p>
<bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn>
<p>
<bold>Authors’ contributions</bold>
</p>
<p>MAP and FSLB conceived the work. MAP created the
<italic>in silico</italic>
mock communities and performed the analysis. MAP and TVR wrote scripts to create the clade exclusion scenarios. RL created the
<italic>in vitro</italic>
mock communities. MAP wrote the manuscript, with revisions and contributions by TVR, RL, and FSLB. All authors read and approved the final manuscript.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>We acknowledge all method developers for making their programs available for use and thank the following laboratories for assistance in running their programs: Tarini Ghosh and Sharmila Mande (DiScRIBinATE), Robert Beiko (RITA), and Derrick Wood (Kraken). We thank the following researchers for generously supplying us with the following strains: James Zlosnik and David Speert (University of British Columbia) -
<italic>Burkholderia cenocepacia</italic>
J2315; David Benson (University of Connecticut) -
<italic>Frankia</italic>
sp. CcI3; Tom Beatty (University of British Columbia) -
<italic>Rhodobacter capsulatus</italic>
SB1003; John Hopwood (Innes centre) -
<italic>Streptomyces coelicolor</italic>
A3(2); Fred Ausubel (Harvard Medical School) -
<italic>Pseudomonas aeruginosa</italic>
UCBPP-PA14. We thank Miguel Ignacio Uyaguari Diaz for library preparation and sequencing of the
<italic>in vitro</italic>
community. This work was supported by Genome Canada, Genome BC, Simon Fraser University Community Trust, and the Public Health Agency of Canada. MAP was supported by a Michael Smith Foundation for Health Research and Canadian Institutes of Health Research Bioinformatics training program fellowship and an NSERC PGSD scholarship. TVR was supported by NSERC PGSM and CGSD scholarships.</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wooley</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Friedberg</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>A primer on metagenomics</article-title>
<source>PLoS Comput Biol</source>
<year>2010</year>
<volume>6</volume>
<fpage>e1000667</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000667</pub-id>
<pub-id pub-id-type="pmid">20195499</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Handelsman</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Metagenomics: application of genomics to uncultured microorganisms</article-title>
<source>Microbiol Mol Biol Rev</source>
<year>2004</year>
<volume>68</volume>
<fpage>669</fpage>
<lpage>85</lpage>
<pub-id pub-id-type="doi">10.1128/MMBR.68.4.669-685.2004</pub-id>
<pub-id pub-id-type="pmid">15590779</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Acinas</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Sarma-Rupavtarm</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Klepac-Ceraj</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Polz</surname>
<given-names>MF</given-names>
</name>
</person-group>
<article-title>PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample</article-title>
<source>Appl Environ Microbiol</source>
<year>2005</year>
<volume>71</volume>
<fpage>8966</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.71.12.8966-8969.2005</pub-id>
<pub-id pub-id-type="pmid">16332901</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Hug</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Sharon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Castelle</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wilkins</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Wrighton</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
</person-group>
<article-title>Unusual biology across a group comprising more than 15% of domain Bacteria</article-title>
<source>Nature</source>
<year>2015</year>
<volume>523</volume>
<issue>7559</issue>
<fpage>208</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="doi">10.1038/nature14486</pub-id>
<pub-id pub-id-type="pmid">26083755</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<mixed-citation publication-type="other">Higashi S, Barreto A da MS, Cantão ME, de Vasconcelos ATR. Analysis of composition-based metagenomic classification. BMC Genomics. 2012;13 Suppl 5:S1.</mixed-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brady</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models</article-title>
<source>Nat Methods</source>
<year>2009</year>
<volume>6</volume>
<fpage>673</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1358</pub-id>
<pub-id pub-id-type="pmid">19648916</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ander</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Schulz-Trieglaff</surname>
<given-names>OB</given-names>
</name>
<name>
<surname>Stoye</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2013</year>
<volume>14 Suppl 5</volume>
<fpage>S2</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-14-S5-S2</pub-id>
<pub-id pub-id-type="pmid">23734710</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rappé</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>The uncultured microbial majority</article-title>
<source>Annu Rev Microbiol</source>
<year>2003</year>
<volume>57</volume>
<fpage>369</fpage>
<lpage>94</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.micro.57.030502.090759</pub-id>
<pub-id pub-id-type="pmid">14527284</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Basic local alignment search tool</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>215</volume>
<fpage>403</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohammed</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Ghosh</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>NK</given-names>
</name>
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
</person-group>
<article-title>SPHINX--an algorithm for taxonomic binning of metagenomic sequences</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>22</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq608</pub-id>
<pub-id pub-id-type="pmid">21030462</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segata</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Waldron</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ballarini</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Jousson</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Metagenomic microbial community profiling using unique clade-specific marker genes</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<fpage>811</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.2066</pub-id>
<pub-id pub-id-type="pmid">22688413</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Větrovský</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Baldrian</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses</article-title>
<source>PLoS One</source>
<year>2013</year>
<volume>8</volume>
<fpage>e57923</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0057923</pub-id>
<pub-id pub-id-type="pmid">23460914</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>1033</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts079</pub-id>
<pub-id pub-id-type="pmid">22332237</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Darling</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Jospin</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lowe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Matsen</surname>
<given-names>FA</given-names>
</name>
<name>
<surname>Bik</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>PhyloSift: phylogenetic analysis of genomes and metagenomes</article-title>
<source>PeerJ</source>
<year>2014</year>
<volume>2</volume>
<fpage>e243</fpage>
<pub-id pub-id-type="doi">10.7717/peerj.243</pub-id>
<pub-id pub-id-type="pmid">24482762</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
</person-group>
<article-title>MEGAN analysis of metagenomic data</article-title>
<source>Genome Res</source>
<year>2007</year>
<volume>17</volume>
<fpage>377</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="doi">10.1101/gr.5969107</pub-id>
<pub-id pub-id-type="pmid">17255551</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Schleifer</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Phylogenetic identification and in situ detection of individual microbial cells without cultivation</article-title>
<source>Microbiol Rev</source>
<year>1995</year>
<volume>59</volume>
<fpage>143</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="pmid">7535888</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bazinet</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Cummings</surname>
<given-names>MP</given-names>
</name>
</person-group>
<article-title>A comparative evaluation of sequence classification programs</article-title>
<source>BMC Bioinformatics</source>
<year>2012</year>
<volume>13</volume>
<fpage>92</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-13-92</pub-id>
<pub-id pub-id-type="pmid">22574964</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<collab>Human Microbiome Jumpstart Reference Strains Consortium</collab>
<name>
<surname>Nelson</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Weinstock</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Highlander</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Worley</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Creasy</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Wortman</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Mitreva</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sodergren</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Chinwalla</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Feldgarden</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gevers</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Madupu</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ward</surname>
<given-names>DV</given-names>
</name>
<name>
<surname>Birren</surname>
<given-names>BW</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Methe</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Petrosino</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Strausberg</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>White</surname>
<given-names>OR</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Durkin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Giglio</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Gujja</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Howarth</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kodira</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A catalog of reference genomes from the human microbiome</article-title>
<source>Science</source>
<year>2010</year>
<volume>328</volume>
<fpage>994</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1126/science.1183605</pub-id>
<pub-id pub-id-type="pmid">20489017</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sunagawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mende</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Zeller</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Izquierdo-Carrasco</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Kultima</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Coelho</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Arumugam</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tap</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>HB</given-names>
</name>
<name>
<surname>Rasmussen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brunak</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Guarner</surname>
<given-names>F</given-names>
</name>
<name>
<surname>de Vos</surname>
<given-names>WM</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Doré</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ehrlich</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Metagenomic species profiling using universal phylogenetic marker genes</article-title>
<source>Nat Methods</source>
<year>2013</year>
<volume>10</volume>
<fpage>1196</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.2693</pub-id>
<pub-id pub-id-type="pmid">24141494</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20.</label>
<mixed-citation publication-type="other">Lindgreen S, Adair KL, Gardner P. An evaluation of the accuracy and speed of metagenome analysis tools. bioRxiv. 2015;017830.</mixed-citation>
</ref>
<ref id="CR21">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Richter</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Ott</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Schmid</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>MetaSim—a sequencing simulator for genomics and metagenomics</article-title>
<source>PLoS One</source>
<year>2008</year>
<volume>3</volume>
<fpage>e3373</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0003373</pub-id>
<pub-id pub-id-type="pmid">18841204</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22.</label>
<mixed-citation publication-type="other">Genome British Columbia: Applied Metagenomics of the Watershed Microbiome.
<ext-link ext-link-type="uri" xlink:href="http://www.genomebc.ca/research-programs/projects/energy-mining-environment/applied-metagenomics-of-the-watershed-microbiome/">http://www.genomebc.ca/research-programs/projects/energy-mining-environment/applied-metagenomics-of-the-watershed-microbiome/</ext-link>
 (2011). Accessed 27 Oct 2015.</mixed-citation>
</ref>
<ref id="CR23">
<label>23.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Caro-Quintero</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tsementzi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>DeLeon-Rodriguez</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Poretsky</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Konstantinidis</surname>
<given-names>KT</given-names>
</name>
</person-group>
<article-title>Metagenomic insights into the evolution, function, and complexity of the planktonic microbial community of Lake Lanier, a temperate freshwater ecosystem</article-title>
<source>Appl Environ Microbiol</source>
<year>2011</year>
<volume>77</volume>
<fpage>6000</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.00107-11</pub-id>
<pub-id pub-id-type="pmid">21764968</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghai</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rodriguez-Valera</surname>
<given-names>F</given-names>
</name>
<name>
<surname>McMahon</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Toyama</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rinke</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cristina Souza de Oliveira</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Wagner Garcia</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pellon de Miranda</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Henrique-Silva</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Metagenomics of the water column in the pristine upper course of the Amazon river</article-title>
<source>PLoS One</source>
<year>2011</year>
<volume>6</volume>
<fpage>e23785</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0023785</pub-id>
<pub-id pub-id-type="pmid">21915244</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Jeffries</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Roudnew</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Fitch</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Seymour</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Delpin</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Newton</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Mitchell</surname>
<given-names>JG</given-names>
</name>
</person-group>
<article-title>Metagenomic comparison of microbial communities inhabiting confined and unconfined aquifer ecosystems</article-title>
<source>Environ Microbiol</source>
<year>2012</year>
<volume>14</volume>
<fpage>240</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="doi">10.1111/j.1462-2920.2011.02614.x</pub-id>
<pub-id pub-id-type="pmid">22004107</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bolger</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Lohse</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Usadel</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Trimmomatic: a flexible trimmer for Illumina sequence data</article-title>
<source>Bioinformatics</source>
<year>2014</year>
<volume>30</volume>
<issue>15</issue>
<fpage>2114</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu170</pub-id>
<pub-id pub-id-type="pmid">24695404</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garcia-Etxebarria</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Garcia-Garcerà</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Calafell</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Consistency of metagenomic assignment programs in simulated and real data</article-title>
<source>BMC Bioinformatics</source>
<year>2014</year>
<volume>15</volume>
<fpage>90</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-15-90</pub-id>
<pub-id pub-id-type="pmid">24678591</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wood</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Kraken: ultrafast metagenomic sequence classification using exact alignments</article-title>
<source>Genome Biol</source>
<year>2014</year>
<volume>15</volume>
<fpage>R46</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2014-15-3-r46</pub-id>
<pub-id pub-id-type="pmid">24580807</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ruscheweyh</surname>
<given-names>H-J</given-names>
</name>
<name>
<surname>Weber</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
</person-group>
<article-title>Integrative analysis of environmental sequences using MEGAN4</article-title>
<source>Genome Res</source>
<year>2011</year>
<volume>21</volume>
<fpage>1552</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1101/gr.120618.111</pub-id>
<pub-id pub-id-type="pmid">21690186</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gibbons</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ghodsi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Treangen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences</article-title>
<source>BMC Genomics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl 2</issue>
<fpage>S4</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-12-S2-S4</pub-id>
<pub-id pub-id-type="pmid">21989143</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghosh</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Monzoorul Haque</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
</person-group>
<article-title>DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<issue>Suppl 7</issue>
<fpage>S14</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-S7-S14</pub-id>
<pub-id pub-id-type="pmid">21106121</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>125</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr595</pub-id>
<pub-id pub-id-type="pmid">22039206</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Diaz</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Krause</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Goesmann</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Niehaus</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nattkemper</surname>
<given-names>TW</given-names>
</name>
</person-group>
<article-title>TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>56</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-56</pub-id>
<pub-id pub-id-type="pmid">19210774</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fierer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Leff</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>UN</given-names>
</name>
<name>
<surname>Bates</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Lauber</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Owens</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Wall</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Caporaso</surname>
<given-names>JG</given-names>
</name>
</person-group>
<article-title>Cross-biome metagenomic analyses of soil microbial communities and their functional attributes</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>2012</year>
<volume>109</volume>
<fpage>21390</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1215210110</pub-id>
<pub-id pub-id-type="pmid">23236140</pub-id>
</element-citation>
</ref>
<ref id="CR35">
<label>35.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fukushima</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kakinuma</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kawaguchi</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Phylogenetic analysis of Salmonella, Shigella, and Escherichia coli strains on the basis of the gyrB gene sequence</article-title>
<source>J Clin Microbiol</source>
<year>2002</year>
<volume>40</volume>
<fpage>2779</fpage>
<lpage>85</lpage>
<pub-id pub-id-type="doi">10.1128/JCM.40.8.2779-2785.2002</pub-id>
<pub-id pub-id-type="pmid">12149329</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<label>36.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Økstad</surname>
<given-names>OA</given-names>
</name>
<name>
<surname>Kolstø</surname>
<given-names>A-B</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Wiedmann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Genomics of Bacillus species</article-title>
<source>Genomics of foodborne bacterial pathogens</source>
<year>2011</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>29</fpage>
<lpage>53</lpage>
</element-citation>
</ref>
<ref id="CR37">
<label>37.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Hamada</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Horton</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Parameters for accurate genome alignment</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>80</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-80</pub-id>
<pub-id pub-id-type="pmid">20144198</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<label>38.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA</article-title>
<source>Bioinformatics</source>
<year>2014</year>
<volume>30</volume>
<fpage>38</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btt254</pub-id>
<pub-id pub-id-type="pmid">23658416</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<label>39.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchfink</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>Fast and sensitive protein alignment using DIAMOND</article-title>
<source>Nat Methods</source>
<year>2015</year>
<volume>12</volume>
<fpage>59</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.3176</pub-id>
<pub-id pub-id-type="pmid">25402007</pub-id>
</element-citation>
</ref>
<ref id="CR40">
<label>40.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rogozin</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Makarova</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Natale</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Spiridonov</surname>
<given-names>AN</given-names>
</name>
<name>
<surname>Tatusov</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>YI</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Congruent evolution of different classes of non-coding DNA in prokaryotic genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2002</year>
<volume>30</volume>
<fpage>4264</fpage>
<lpage>71</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkf549</pub-id>
<pub-id pub-id-type="pmid">12364605</pub-id>
</element-citation>
</ref>
<ref id="CR41">
<label>41.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerlach</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Stoye</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Taxonomic classification of metagenomic shotgun sequences with CARMA3</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<issue>14</issue>
<fpage>e91</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr225</pub-id>
<pub-id pub-id-type="pmid">21586583</pub-id>
</element-citation>
</ref>
<ref id="CR42">
<label>42.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<article-title>The human microbiome project</article-title>
<source>Nature</source>
<year>2007</year>
<volume>449</volume>
<fpage>804</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="doi">10.1038/nature06244</pub-id>
<pub-id pub-id-type="pmid">17943116</pub-id>
</element-citation>
</ref>
<ref id="CR43">
<label>43.</label>
<mixed-citation publication-type="other">The Critical Assessment of Metagenome Interpretation (CAMI) competition: Methagora.
<ext-link ext-link-type="uri" xlink:href="http://blogs.nature.com/methagora/2014/06/the-critical-assessment-of-metagenome-interpretation-camicompetition.html">http://blogs.nature.com/methagora/2014/06/the-critical-assessment-of-metagenome-interpretation-camicompetition.html</ext-link>
 (2014). Accessed 27 Oct 2015.</mixed-citation>
</ref>
<ref id="CR44">
<label>44.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Mohammed</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Ghosh</surname>
<given-names>TS</given-names>
</name>
</person-group>
<article-title>Classification of metagenomic sequences: methods and challenges</article-title>
<source>Brief Bioinform</source>
<year>2012</year>
<volume>13</volume>
<issue>6</issue>
<fpage>669</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="doi">10.1093/bib/bbs054</pub-id>
<pub-id pub-id-type="pmid">22962338</pub-id>
</element-citation>
</ref>
<ref id="CR45">
<label>45.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
</person-group>
<article-title>Methods for comparative metagenomics</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<issue>Suppl 1</issue>
<fpage>S12</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-S1-S12</pub-id>
<pub-id pub-id-type="pmid">19208111</pub-id>
</element-citation>
</ref>
<ref id="CR46">
<label>46.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Klar</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>Visual and statistical comparison of metagenomes</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1849</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp341</pub-id>
<pub-id pub-id-type="pmid">19515961</pub-id>
</element-citation>
</ref>
<ref id="CR47">
<label>47.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rupek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Urich</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl 1</issue>
<fpage>S21</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-S1-S21</pub-id>
<pub-id pub-id-type="pmid">21342551</pub-id>
</element-citation>
</ref>
<ref id="CR48">
<label>48.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Paarmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>D’Souza</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Glass</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Kubal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Paczian</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rodriguez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wilkening</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>386</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-386</pub-id>
<pub-id pub-id-type="pmid">18803844</pub-id>
</element-citation>
</ref>
<ref id="CR49">
<label>49.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seshadri</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kravitz</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Smarr</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gilna</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frazier</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>CAMERA: a community resource for metagenomics</article-title>
<source>PLoS Biol</source>
<year>2007</year>
<volume>5</volume>
<fpage>e75</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0050075</pub-id>
<pub-id pub-id-type="pmid">17355175</pub-id>
</element-citation>
</ref>
<ref id="CR50">
<label>50.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Altintas</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Peltier</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stocks</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Ellisman</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grethe</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wooley</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<issue>Database issue</issue>
<fpage>D546</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkq1102</pub-id>
<pub-id pub-id-type="pmid">21045053</pub-id>
</element-citation>
</ref>
<ref id="CR51">
<label>51.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>A probabilistic model of local sequence alignment that simplifies statistical significance estimation</article-title>
<source>PLoS Comput Biol</source>
<year>2008</year>
<volume>4</volume>
<fpage>e1000069</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000069</pub-id>
<pub-id pub-id-type="pmid">18516236</pub-id>
</element-citation>
</ref>
<ref id="CR52">
<label>52.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krause</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Diaz</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Goesmann</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kelley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nattkemper</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Stoye</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Phylogenetic classification of short environmental DNA fragments</article-title>
<source>Nucleic Acids Res</source>
<year>2008</year>
<volume>36</volume>
<fpage>2230</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn038</pub-id>
<pub-id pub-id-type="pmid">18285365</pub-id>
</element-citation>
</ref>
<ref id="CR53">
<label>53.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerlach</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Jünemann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tille</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Goesmann</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stoye</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>430</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-430</pub-id>
<pub-id pub-id-type="pmid">20021646</pub-id>
</element-citation>
</ref>
<ref id="CR54">
<label>54.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Niu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>1704</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr252</pub-id>
<pub-id pub-id-type="pmid">21505035</pub-id>
</element-citation>
</ref>
<ref id="CR55">
<label>55.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>WebMGA: a customizable web server for fast metagenomic sequence analysis</article-title>
<source>BMC Genomics</source>
<year>2011</year>
<volume>12</volume>
<fpage>444</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-12-444</pub-id>
<pub-id pub-id-type="pmid">21899761</pub-id>
</element-citation>
</ref>
<ref id="CR56">
<label>56.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Monzoorul Haque</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ghosh</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Komanduri</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
</person-group>
<article-title>SOrt-ITEMS: sequence orthology based approach for improved taxonomic estimation of metagenomic sequences</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1722</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp317</pub-id>
<pub-id pub-id-type="pmid">19439565</pub-id>
</element-citation>
</ref>
<ref id="CR57">
<label>57.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Raymond</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Godzaridis</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Laviolette</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Corbeil</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Ray Meta: scalable de novo metagenome assembly and profiling</article-title>
<source>Genome Biol</source>
<year>2012</year>
<volume>13</volume>
<fpage>R122</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2012-13-12-r122</pub-id>
<pub-id pub-id-type="pmid">23259615</pub-id>
</element-citation>
</ref>
<ref id="CR58">
<label>58.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Disz</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pusch</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Vonstein</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Overbeek</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Real time metagenomics: using k-mers to annotate metagenomes</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>3316</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts599</pub-id>
<pub-id pub-id-type="pmid">23047562</pub-id>
</element-citation>
</ref>
<ref id="CR59">
<label>59.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<fpage>R25</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r25</pub-id>
<pub-id pub-id-type="pmid">19261174</pub-id>
</element-citation>
</ref>
<ref id="CR60">
<label>60.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1754</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp324</pub-id>
<pub-id pub-id-type="pmid">19451168</pub-id>
</element-citation>
</ref>
<ref id="CR61">
<label>61.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davenport</surname>
<given-names>CF</given-names>
</name>
<name>
<surname>Neugebauer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Beckmann</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Friedrich</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kameri</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kokott</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Paetow</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Siekmann</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wieding-Drewes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wienhöfer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tümmler</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ahlers</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Sprengel</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Genometa - a fast and accurate classifier for short metagenomic shotgun reads</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e41224</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0041224</pub-id>
<pub-id pub-id-type="pmid">22927906</pub-id>
</element-citation>
</ref>
<ref id="CR62">
<label>62.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ames</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Hysom</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Gardner</surname>
<given-names>SN</given-names>
</name>
<name>
<surname>Lloyd</surname>
<given-names>GS</given-names>
</name>
<name>
<surname>Gokhale</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>JE</given-names>
</name>
</person-group>
<article-title>Scalable metagenomic taxonomy classification using a reference genome database</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>2253</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btt389</pub-id>
<pub-id pub-id-type="pmid">23828782</pub-id>
</element-citation>
</ref>
<ref id="CR63">
<label>63.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berendzen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bruno</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Cohn</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Hengartner</surname>
<given-names>NW</given-names>
</name>
<name>
<surname>Kuske</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>McMahon</surname>
<given-names>BH</given-names>
</name>
<name>
<surname>Wolinsky</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Rapid phylogenetic and functional classification of short genomic fragments with signature peptides</article-title>
<source>BMC Res Notes</source>
<year>2012</year>
<volume>5</volume>
<fpage>460</fpage>
<pub-id pub-id-type="doi">10.1186/1756-0500-5-460</pub-id>
<pub-id pub-id-type="pmid">22925230</pub-id>
</element-citation>
</ref>
<ref id="CR64">
<label>64.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>VK</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Prakash</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>TD</given-names>
</name>
</person-group>
<article-title>Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e34030</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0034030</pub-id>
<pub-id pub-id-type="pmid">22496776</pub-id>
</element-citation>
</ref>
<ref id="CR65">
<label>65.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>An</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e46450</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0046450</pub-id>
<pub-id pub-id-type="pmid">23049702</pub-id>
</element-citation>
</ref>
<ref id="CR66">
<label>66.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Porter</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
</person-group>
<article-title>SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>1858</fpage>
<lpage>64</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btt313</pub-id>
<pub-id pub-id-type="pmid">23732273</pub-id>
</element-citation>
</ref>
<ref id="CR67">
<label>67.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freitas</surname>
<given-names>TAK</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P-E</given-names>
</name>
<name>
<surname>Scholz</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Chain</surname>
<given-names>PSG</given-names>
</name>
</person-group>
<article-title>Accurate read-based metagenome characterization using a hierarchical suite of unique signatures</article-title>
<source>Nucleic Acids Res</source>
<year>2015</year>
<volume>43</volume>
<fpage>e69</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkv180</pub-id>
<pub-id pub-id-type="pmid">25765641</pub-id>
</element-citation>
</ref>
<ref id="CR68">
<label>68.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ounit</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wanamaker</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Close</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Lonardi</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers</article-title>
<source>BMC Genomics</source>
<year>2015</year>
<volume>16</volume>
<fpage>236</fpage>
<pub-id pub-id-type="doi">10.1186/s12864-015-1419-2</pub-id>
<pub-id pub-id-type="pmid">25879410</pub-id>
</element-citation>
</ref>
<ref id="CR69">
<label>69.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tringe</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Doerks</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Ward</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Quantitative phylogenetic assessment of microbial communities in diverse environments</article-title>
<source>Science</source>
<year>2007</year>
<volume>315</volume>
<fpage>1126</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1126/science.1133420</pub-id>
<pub-id pub-id-type="pmid">17272687</pub-id>
</element-citation>
</ref>
<ref id="CR70">
<label>70.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stark</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>MLTreeMap--accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies</article-title>
<source>BMC Genomics</source>
<year>2010</year>
<volume>11</volume>
<fpage>461</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-11-461</pub-id>
<pub-id pub-id-type="pmid">20687950</pub-id>
</element-citation>
</ref>
<ref id="CR71">
<label>71.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>A simple, fast, and accurate method of phylogenomic inference</article-title>
<source>Genome Biol</source>
<year>2008</year>
<volume>9</volume>
<fpage>R151</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2008-9-10-r151</pub-id>
<pub-id pub-id-type="pmid">18851752</pub-id>
</element-citation>
</ref>
<ref id="CR72">
<label>72.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kerepesi</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bánky</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Grolmusz</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite</article-title>
<source>Gene</source>
<year>2014</year>
<volume>533</volume>
<fpage>538</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="doi">10.1016/j.gene.2013.10.015</pub-id>
<pub-id pub-id-type="pmid">24144838</pub-id>
</element-citation>
</ref>
<ref id="CR73">
<label>73.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<fpage>357</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
<pub-id pub-id-type="pmid">22388286</pub-id>
</element-citation>
</ref>
<ref id="CR74">
<label>74.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brady</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>PhymmBL expanded: confidence scores, custom databases, parallelization and more</article-title>
<source>Nat Methods</source>
<year>2011</year>
<volume>8</volume>
<fpage>367</fpage>
<pub-id pub-id-type="doi">10.1038/nmeth0511-367</pub-id>
<pub-id pub-id-type="pmid">21527926</pub-id>
</element-citation>
</ref>
<ref id="CR75">
<label>75.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parks</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
</person-group>
<article-title>Classifying short genomic fragments from novel lineages using composition and homology</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<fpage>328</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-328</pub-id>
<pub-id pub-id-type="pmid">21827705</pub-id>
</element-citation>
</ref>
<ref id="CR76">
<label>76.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>MacDonald</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Parks</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
</person-group>
<article-title>Rapid identification of high-confidence taxonomic assignments for metagenomic data</article-title>
<source>Nucleic Acids Res</source>
<year>2012</year>
<volume>40</volume>
<fpage>e111</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gks335</pub-id>
<pub-id pub-id-type="pmid">22532608</pub-id>
</element-citation>
</ref>
<ref id="CR77">
<label>77.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klingenberg</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Aßhauer</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Lingner</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Meinicke</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Protein signature-based estimation of metagenomic abundances including all domains of life and viruses</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>973</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btt077</pub-id>
<pub-id pub-id-type="pmid">23418187</pub-id>
</element-citation>
</ref>
<ref id="CR78">
<label>78.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reddy</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Mohammed</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
</person-group>
<article-title>TWARIT: an extremely rapid and efficient approach for phylogenetic classification of metagenomic sequences</article-title>
<source>Gene</source>
<year>2012</year>
<volume>505</volume>
<fpage>259</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1016/j.gene.2012.06.014</pub-id>
<pub-id pub-id-type="pmid">22710135</pub-id>
</element-citation>
</ref>
<ref id="CR79">
<label>79.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patil</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Haider</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pope</surname>
<given-names>PB</given-names>
</name>
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Scheffer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
</person-group>
<article-title>Taxonomic metagenome sequence assignment with structured output models</article-title>
<source>Nat Methods</source>
<year>2011</year>
<volume>8</volume>
<fpage>191</fpage>
<lpage>2</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth0311-191</pub-id>
<pub-id pub-id-type="pmid">21358620</pub-id>
</element-citation>
</ref>
<ref id="CR80">
<label>80.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patil</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Roune</surname>
<given-names>L</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
</person-group>
<article-title>The PhyloPythiaS web server for taxonomic assignment of metagenome sequences</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e38581</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0038581</pub-id>
<pub-id pub-id-type="pmid">22745671</pub-id>
</element-citation>
</ref>
<ref id="CR81">
<label>81.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Garbarine</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Caseiro</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Polikar</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sokhansanj</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Metagenome fragment classification using N-mer frequency profiles</article-title>
<source>Adv Bioinformatics</source>
<year>2008</year>
<volume>2008</volume>
<fpage>205969</fpage>
<pub-id pub-id-type="doi">10.1155/2008/205969</pub-id>
<pub-id pub-id-type="pmid">19956701</pub-id>
</element-citation>
</ref>
<ref id="CR82">
<label>82.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosen</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Reichenberger</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Rosenfeld</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>127</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq619</pub-id>
<pub-id pub-id-type="pmid">21062764</pub-id>
</element-citation>
</ref>
<ref id="CR83">
<label>83.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nalbantoglu</surname>
<given-names>OU</given-names>
</name>
<name>
<surname>Way</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Sayood</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<fpage>41</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-41</pub-id>
<pub-id pub-id-type="pmid">21281493</pub-id>
</element-citation>
</ref>
<ref id="CR84">
<label>84.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pati</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Heath</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>ClaMS: a Classifier for Metagenomic Sequences</article-title>
<source>Stand Genomic Sci</source>
<year>2011</year>
<volume>5</volume>
<fpage>248</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="doi">10.4056/sigs.2075298</pub-id>
<pub-id pub-id-type="pmid">22180827</pub-id>
</element-citation>
</ref>
<ref id="CR85">
<label>85.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohammed</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Ghosh</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>CVSK</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>NK</given-names>
</name>
<name>
<surname>Mande</surname>
<given-names>SS</given-names>
</name>
</person-group>
<article-title>INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences</article-title>
<source>BMC Genomics</source>
<year>2011</year>
<volume>12 Suppl 3</volume>
<fpage>S4</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-12-S3-S4</pub-id>
<pub-id pub-id-type="pmid">22369237</pub-id>
</element-citation>
</ref>
<ref id="CR86">
<label>86.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rasheed</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Rangwala</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Metagenomic taxonomic classification using extreme learning machines</article-title>
<source>J Bioinform Comput Biol</source>
<year>2012</year>
<volume>10</volume>
<fpage>1250015</fpage>
<pub-id pub-id-type="doi">10.1142/S0219720012500151</pub-id>
<pub-id pub-id-type="pmid">22849369</pub-id>
</element-citation>
</ref>
<ref id="CR87">
<label>87.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Composition-based classification of short metagenomic sequences elucidates the landscapes of taxonomic and functional enrichment of microorganisms</article-title>
<source>Nucleic Acids Res</source>
<year>2013</year>
<volume>41</volume>
<fpage>e3</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gks828</pub-id>
<pub-id pub-id-type="pmid">22941634</pub-id>
</element-citation>
</ref>
<ref id="CR88">
<label>88.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Farmerie</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis</article-title>
<source>Bioinformation</source>
<year>2010</year>
<volume>4</volume>
<fpage>46</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.6026/97320630004046</pub-id>
<pub-id pub-id-type="pmid">20011152</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0001599 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0001599 | SxmlIndent | more


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024