MersV1, Pmc, Corpus, bibRecord, 000965

Internal identifier: 000965 (Pmc/Corpus); previous: 0009649; next: 0009660

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Ray Meta: scalable <italic>de novo</italic> metagenome assembly and profiling</title>
<author><name sortKey="Boisvert, Sebastien" sort="Boisvert, Sebastien" uniqKey="Boisvert S" first="Sébastien" last="Boisvert">Sébastien Boisvert</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Raymond, Frederic" sort="Raymond, Frederic" uniqKey="Raymond F" first="Frédéric" last="Raymond">Frédéric Raymond</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Godzaridis, Elenie" sort="Godzaridis, Elenie" uniqKey="Godzaridis E" first="Élénie" last="Godzaridis">Élénie Godzaridis</name>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Laviolette, Francois" sort="Laviolette, Francois" uniqKey="Laviolette F" first="François" last="Laviolette">François Laviolette</name>
<affiliation><nlm:aff id="I3">Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Laval University, 1065, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Corbeil, Jacques" sort="Corbeil, Jacques" uniqKey="Corbeil J" first="Jacques" last="Corbeil">Jacques Corbeil</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I4">Department of Molecular Medicine, Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">23259615</idno>
<idno type="pmc">4056372</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4056372</idno>
<idno type="RBID">PMC:4056372</idno>
<idno type="doi">10.1186/gb-2012-13-12-r122</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000965</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000965</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Ray Meta: scalable <italic>de novo</italic> metagenome assembly and profiling</title>
<author><name sortKey="Boisvert, Sebastien" sort="Boisvert, Sebastien" uniqKey="Boisvert S" first="Sébastien" last="Boisvert">Sébastien Boisvert</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Raymond, Frederic" sort="Raymond, Frederic" uniqKey="Raymond F" first="Frédéric" last="Raymond">Frédéric Raymond</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Godzaridis, Elenie" sort="Godzaridis, Elenie" uniqKey="Godzaridis E" first="Élénie" last="Godzaridis">Élénie Godzaridis</name>
<affiliation><nlm:aff id="I2">Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Laviolette, Francois" sort="Laviolette, Francois" uniqKey="Laviolette F" first="François" last="Laviolette">François Laviolette</name>
<affiliation><nlm:aff id="I3">Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Laval University, 1065, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Corbeil, Jacques" sort="Corbeil, Jacques" uniqKey="Corbeil J" first="Jacques" last="Corbeil">Jacques Corbeil</name>
<affiliation><nlm:aff id="I1">Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I4">Department of Molecular Medicine, Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Genome Biology</title>
<idno type="ISSN">1465-6906</idno>
<idno type="eISSN">1465-6914</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>Voluminous parallel sequencing datasets, especially those from metagenomic experiments, require distributed computing for <italic>de novo</italic> assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler coupled with Ray Communities, which profiles microbiomes based on uniquely colored k-mers. It can accurately assemble and profile a three-billion-read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets and will help generate biological insights for specific environments. Ray Meta is open source and available at <ext-link ext-link-type="uri" xlink:href="http://denovoassembler.sf.net">http://denovoassembler.sf.net</ext-link>.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
<author><name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Brenner, S" uniqKey="Brenner S">S Brenner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mcpherson, Jd" uniqKey="Mcpherson J">JD McPherson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mardis, E" uniqKey="Mardis E">E Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Compeau, Pec" uniqKey="Compeau P">PEC Compeau</name>
</author>
<author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author><name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author><name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Iqbal, Z" uniqKey="Iqbal Z">Z Iqbal</name>
</author>
<author><name sortKey="Caccamo, M" uniqKey="Caccamo M">M Caccamo</name>
</author>
<author><name sortKey="Turner, I" uniqKey="Turner I">I Turner</name>
</author>
<author><name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author><name sortKey="Mcvean, G" uniqKey="Mcvean G">G McVean</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Miller, Jr" uniqKey="Miller J">JR Miller</name>
</author>
<author><name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author><name sortKey="Sutton, G" uniqKey="Sutton G">G Sutton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lorenz, P" uniqKey="Lorenz P">P Lorenz</name>
</author>
<author><name sortKey="Eck, J" uniqKey="Eck J">J Eck</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Scholz, Mb" uniqKey="Scholz M">MB Scholz</name>
</author>
<author><name sortKey="Lo, Cc" uniqKey="Lo C">CC Lo</name>
</author>
<author><name sortKey="Chain, Psg" uniqKey="Chain P">PSG Chain</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schoenfeld, T" uniqKey="Schoenfeld T">T Schoenfeld</name>
</author>
<author><name sortKey="Patterson, M" uniqKey="Patterson M">M Patterson</name>
</author>
<author><name sortKey="Richardson, Pm" uniqKey="Richardson P">PM Richardson</name>
</author>
<author><name sortKey="Wommack, Ke" uniqKey="Wommack K">KE Wommack</name>
</author>
<author><name sortKey="Young, M" uniqKey="Young M">M Young</name>
</author>
<author><name sortKey="Mead, D" uniqKey="Mead D">D Mead</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Varin, T" uniqKey="Varin T">T Varin</name>
</author>
<author><name sortKey="Lovejoy, C" uniqKey="Lovejoy C">C Lovejoy</name>
</author>
<author><name sortKey="Jungblut, Ad" uniqKey="Jungblut A">AD Jungblut</name>
</author>
<author><name sortKey="Vincent, Wf" uniqKey="Vincent W">WF Vincent</name>
</author>
<author><name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Varin, T" uniqKey="Varin T">T Varin</name>
</author>
<author><name sortKey="Lovejoy, C" uniqKey="Lovejoy C">C Lovejoy</name>
</author>
<author><name sortKey="Jungblut, Ad" uniqKey="Jungblut A">AD Jungblut</name>
</author>
<author><name sortKey="Vincent, Wf" uniqKey="Vincent W">WF Vincent</name>
</author>
<author><name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Narasingarao, P" uniqKey="Narasingarao P">P Narasingarao</name>
</author>
<author><name sortKey="Podell, S" uniqKey="Podell S">S Podell</name>
</author>
<author><name sortKey="Ugalde, Ja" uniqKey="Ugalde J">JA Ugalde</name>
</author>
<author><name sortKey="Brochier Armanet, C" uniqKey="Brochier Armanet C">C Brochier-Armanet</name>
</author>
<author><name sortKey="Emerson, Jb" uniqKey="Emerson J">JB Emerson</name>
</author>
<author><name sortKey="Brocks, Jj" uniqKey="Brocks J">JJ Brocks</name>
</author>
<author><name sortKey="Heidelberg, Kb" uniqKey="Heidelberg K">KB Heidelberg</name>
</author>
<author><name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
<author><name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tringe, Sg" uniqKey="Tringe S">SG Tringe</name>
</author>
<author><name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author><name sortKey="Kobayashi, A" uniqKey="Kobayashi A">A Kobayashi</name>
</author>
<author><name sortKey="Salamov, Aa" uniqKey="Salamov A">AA Salamov</name>
</author>
<author><name sortKey="Chen, K" uniqKey="Chen K">K Chen</name>
</author>
<author><name sortKey="Chang, Hw" uniqKey="Chang H">HW Chang</name>
</author>
<author><name sortKey="Podar, M" uniqKey="Podar M">M Podar</name>
</author>
<author><name sortKey="Short, Jm" uniqKey="Short J">JM Short</name>
</author>
<author><name sortKey="Mathur, Ej" uniqKey="Mathur E">EJ Mathur</name>
</author>
<author><name sortKey="Detter, Jc" uniqKey="Detter J">JC Detter</name>
</author>
<author><name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author><name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author><name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tyson, Gw" uniqKey="Tyson G">GW Tyson</name>
</author>
<author><name sortKey="Chapman, J" uniqKey="Chapman J">J Chapman</name>
</author>
<author><name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author><name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
<author><name sortKey="Ram, Rj" uniqKey="Ram R">RJ Ram</name>
</author>
<author><name sortKey="Richardson, Pm" uniqKey="Richardson P">PM Richardson</name>
</author>
<author><name sortKey="Solovyev, Vv" uniqKey="Solovyev V">VV Solovyev</name>
</author>
<author><name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author><name sortKey="Rokhsar, Ds" uniqKey="Rokhsar D">DS Rokhsar</name>
</author>
<author><name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Naviaux, Rk" uniqKey="Naviaux R">RK Naviaux</name>
</author>
<author><name sortKey="Good, B" uniqKey="Good B">B Good</name>
</author>
<author><name sortKey="Mcpherson, Jd" uniqKey="Mcpherson J">JD McPherson</name>
</author>
<author><name sortKey="Steffen, Dl" uniqKey="Steffen D">DL Steffen</name>
</author>
<author><name sortKey="Markusic, D" uniqKey="Markusic D">D Markusic</name>
</author>
<author><name sortKey="Ransom, B" uniqKey="Ransom B">B Ransom</name>
</author>
<author><name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cho, I" uniqKey="Cho I">I Cho</name>
</author>
<author><name sortKey="Blaser, Mj" uniqKey="Blaser M">MJ Blaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gill, Sr" uniqKey="Gill S">SR Gill</name>
</author>
<author><name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author><name sortKey="Deboy, Rt" uniqKey="Deboy R">RT Deboy</name>
</author>
<author><name sortKey="Eckburg, Pb" uniqKey="Eckburg P">PB Eckburg</name>
</author>
<author><name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author><name sortKey="Samuel, Bs" uniqKey="Samuel B">BS Samuel</name>
</author>
<author><name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
<author><name sortKey="Relman, Da" uniqKey="Relman D">DA Relman</name>
</author>
<author><name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
<author><name sortKey="Nelson, Ke" uniqKey="Nelson K">KE Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author><name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author><name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author><name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author><name sortKey="Burgdorf, Ks" uniqKey="Burgdorf K">KS Burgdorf</name>
</author>
<author><name sortKey="Manichanh, C" uniqKey="Manichanh C">C Manichanh</name>
</author>
<author><name sortKey="Nielsen, T" uniqKey="Nielsen T">T Nielsen</name>
</author>
<author><name sortKey="Pons, N" uniqKey="Pons N">N Pons</name>
</author>
<author><name sortKey="Levenez, F" uniqKey="Levenez F">F Levenez</name>
</author>
<author><name sortKey="Yamada, T" uniqKey="Yamada T">T Yamada</name>
</author>
<author><name sortKey="Mende, Dr" uniqKey="Mende D">DR Mende</name>
</author>
<author><name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author><name sortKey="Xu, J" uniqKey="Xu J">J Xu</name>
</author>
<author><name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author><name sortKey="Li, D" uniqKey="Li D">D Li</name>
</author>
<author><name sortKey="Cao, J" uniqKey="Cao J">J Cao</name>
</author>
<author><name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author><name sortKey="Liang, H" uniqKey="Liang H">H Liang</name>
</author>
<author><name sortKey="Zheng, H" uniqKey="Zheng H">H Zheng</name>
</author>
<author><name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author><name sortKey="Tap, J" uniqKey="Tap J">J Tap</name>
</author>
<author><name sortKey="Lepage, P" uniqKey="Lepage P">P Lepage</name>
</author>
<author><name sortKey="Bertalan, M" uniqKey="Bertalan M">M Bertalan</name>
</author>
<author><name sortKey="Batto, Jm" uniqKey="Batto J">JM Batto</name>
</author>
<author><name sortKey="Hansen, T" uniqKey="Hansen T">T Hansen</name>
</author>
<author><name sortKey="Le Paslier, D" uniqKey="Le Paslier D">D Le Paslier</name>
</author>
<author><name sortKey="Linneberg, A" uniqKey="Linneberg A">A Linneberg</name>
</author>
<author><name sortKey="Nielsen, Hb" uniqKey="Nielsen H">HB Nielsen</name>
</author>
<author><name sortKey="Pelletier, E" uniqKey="Pelletier E">E Pelletier</name>
</author>
<author><name sortKey="Renault, P" uniqKey="Renault P">P Renault</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author><name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author><name sortKey="Pelletier, E" uniqKey="Pelletier E">E Pelletier</name>
</author>
<author><name sortKey="Le Paslier, D" uniqKey="Le Paslier D">D Le Paslier</name>
</author>
<author><name sortKey="Yamada, T" uniqKey="Yamada T">T Yamada</name>
</author>
<author><name sortKey="Mende, Dr" uniqKey="Mende D">DR Mende</name>
</author>
<author><name sortKey="Fernandes, Gr" uniqKey="Fernandes G">GR Fernandes</name>
</author>
<author><name sortKey="Tap, J" uniqKey="Tap J">J Tap</name>
</author>
<author><name sortKey="Bruls, T" uniqKey="Bruls T">T Bruls</name>
</author>
<author><name sortKey="Batto, Jmm" uniqKey="Batto J">JMM Batto</name>
</author>
<author><name sortKey="Bertalan, M" uniqKey="Bertalan M">M Bertalan</name>
</author>
<author><name sortKey="Borruel, N" uniqKey="Borruel N">N Borruel</name>
</author>
<author><name sortKey="Casellas, F" uniqKey="Casellas F">F Casellas</name>
</author>
<author><name sortKey="Fernandez, L" uniqKey="Fernandez L">L Fernandez</name>
</author>
<author><name sortKey="Gautier, L" uniqKey="Gautier L">L Gautier</name>
</author>
<author><name sortKey="Hansen, T" uniqKey="Hansen T">T Hansen</name>
</author>
<author><name sortKey="Hattori, M" uniqKey="Hattori M">M Hattori</name>
</author>
<author><name sortKey="Hayashi, T" uniqKey="Hayashi T">T Hayashi</name>
</author>
<author><name sortKey="Kleerebezem, M" uniqKey="Kleerebezem M">M Kleerebezem</name>
</author>
<author><name sortKey="Kurokawa, K" uniqKey="Kurokawa K">K Kurokawa</name>
</author>
<author><name sortKey="Leclerc, M" uniqKey="Leclerc M">M Leclerc</name>
</author>
<author><name sortKey="Levenez, F" uniqKey="Levenez F">F Levenez</name>
</author>
<author><name sortKey="Manichanh, C" uniqKey="Manichanh C">C Manichanh</name>
</author>
<author><name sortKey="Nielsen, Hb" uniqKey="Nielsen H">HB Nielsen</name>
</author>
<author><name sortKey="Nielsen, T" uniqKey="Nielsen T">T Nielsen</name>
</author>
<author><name sortKey="Pons, N" uniqKey="Pons N">N Pons</name>
</author>
<author><name sortKey="Poulain, J" uniqKey="Poulain J">J Poulain</name>
</author>
<author><name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author><name sortKey="Sicheritz Ponten, T" uniqKey="Sicheritz Ponten T">T Sicheritz-Ponten</name>
</author>
<author><name sortKey="Tims, S" uniqKey="Tims S">S Tims</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Consortium, Thmp" uniqKey="Consortium T">THMP Consortium</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schloss, Pd" uniqKey="Schloss P">PD Schloss</name>
</author>
<author><name sortKey="Handelsman, J" uniqKey="Handelsman J">J Handelsman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author><name sortKey="Gibbons, T" uniqKey="Gibbons T">T Gibbons</name>
</author>
<author><name sortKey="Ghodsi, M" uniqKey="Ghodsi M">M Ghodsi</name>
</author>
<author><name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Segata, N" uniqKey="Segata N">N Segata</name>
</author>
<author><name sortKey="Waldron, L" uniqKey="Waldron L">L Waldron</name>
</author>
<author><name sortKey="Ballarini, A" uniqKey="Ballarini A">A Ballarini</name>
</author>
<author><name sortKey="Narasimhan, V" uniqKey="Narasimhan V">V Narasimhan</name>
</author>
<author><name sortKey="Jousson, O" uniqKey="Jousson O">O Jousson</name>
</author>
<author><name sortKey="Huttenhower, C" uniqKey="Huttenhower C">C Huttenhower</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mcdonald, D" uniqKey="Mcdonald D">D McDonald</name>
</author>
<author><name sortKey="Price, Mn" uniqKey="Price M">MN Price</name>
</author>
<author><name sortKey="Goodrich, J" uniqKey="Goodrich J">J Goodrich</name>
</author>
<author><name sortKey="Nawrocki, Ep" uniqKey="Nawrocki E">EP Nawrocki</name>
</author>
<author><name sortKey="Desantis, Tz" uniqKey="Desantis T">TZ DeSantis</name>
</author>
<author><name sortKey="Probst, A" uniqKey="Probst A">A Probst</name>
</author>
<author><name sortKey="Andersen, Gl" uniqKey="Andersen G">GL Andersen</name>
</author>
<author><name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author><name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author><name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author><name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
<author><name sortKey="Botstein, D" uniqKey="Botstein D">D Botstein</name>
</author>
<author><name sortKey="Butler, H" uniqKey="Butler H">H Butler</name>
</author>
<author><name sortKey="Cherry, Jm" uniqKey="Cherry J">JM Cherry</name>
</author>
<author><name sortKey="Davis, Ap" uniqKey="Davis A">AP Davis</name>
</author>
<author><name sortKey="Dolinski, K" uniqKey="Dolinski K">K Dolinski</name>
</author>
<author><name sortKey="Dwight, Ss" uniqKey="Dwight S">SS Dwight</name>
</author>
<author><name sortKey="Eppig, Jt" uniqKey="Eppig J">JT Eppig</name>
</author>
<author><name sortKey="Harris, Ma" uniqKey="Harris M">MA Harris</name>
</author>
<author><name sortKey="Hill, Dp" uniqKey="Hill D">DP Hill</name>
</author>
<author><name sortKey="Issel Tarver, L" uniqKey="Issel Tarver L">L Issel-Tarver</name>
</author>
<author><name sortKey="Kasarskis, A" uniqKey="Kasarskis A">A Kasarskis</name>
</author>
<author><name sortKey="Lewis, S" uniqKey="Lewis S">S Lewis</name>
</author>
<author><name sortKey="Matese, Jc" uniqKey="Matese J">JC Matese</name>
</author>
<author><name sortKey="Richardson, Je" uniqKey="Richardson J">JE Richardson</name>
</author>
<author><name sortKey="Ringwald, M" uniqKey="Ringwald M">M Ringwald</name>
</author>
<author><name sortKey="Rubin, Gm" uniqKey="Rubin G">GM Rubin</name>
</author>
<author><name sortKey="Sherlock, G" uniqKey="Sherlock G">G Sherlock</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author><name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author><name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author><name sortKey="Schein, Je" uniqKey="Schein J">JE Schein</name>
</author>
<author><name sortKey="Jones, Sjm" uniqKey="Jones S">SJM Jones</name>
</author>
<author><name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author><name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author><name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
<author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author><name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author><name sortKey="Ruscheweyh, Hj" uniqKey="Ruscheweyh H">HJ Ruscheweyh</name>
</author>
<author><name sortKey="Weber, N" uniqKey="Weber N">N Weber</name>
</author>
<author><name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author><name sortKey="Paarmann, D" uniqKey="Paarmann D">D Paarmann</name>
</author>
<author><name sortKey="D Souza, M" uniqKey="D Souza M">M D'Souza</name>
</author>
<author><name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
<author><name sortKey="Glass, Em" uniqKey="Glass E">EM Glass</name>
</author>
<author><name sortKey="Kubal, M" uniqKey="Kubal M">M Kubal</name>
</author>
<author><name sortKey="Paczian, T" uniqKey="Paczian T">T Paczian</name>
</author>
<author><name sortKey="Rodriguez, A" uniqKey="Rodriguez A">A Rodriguez</name>
</author>
<author><name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
<author><name sortKey="Wilke, A" uniqKey="Wilke A">A Wilke</name>
</author>
<author><name sortKey="Wilkening, J" uniqKey="Wilkening J">J Wilkening</name>
</author>
<author><name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dixon, P" uniqKey="Dixon P">P Dixon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Caporaso, Jg" uniqKey="Caporaso J">JG Caporaso</name>
</author>
<author><name sortKey="Kuczynski, J" uniqKey="Kuczynski J">J Kuczynski</name>
</author>
<author><name sortKey="Stombaugh, J" uniqKey="Stombaugh J">J Stombaugh</name>
</author>
<author><name sortKey="Bittinger, K" uniqKey="Bittinger K">K Bittinger</name>
</author>
<author><name sortKey="Bushman, Fd" uniqKey="Bushman F">FD Bushman</name>
</author>
<author><name sortKey="Costello, Ek" uniqKey="Costello E">EK Costello</name>
</author>
<author><name sortKey="Fierer, N" uniqKey="Fierer N">N Fierer</name>
</author>
<author><name sortKey="Pena, Ag" uniqKey="Pena A">AG Pena</name>
</author>
<author><name sortKey="Goodrich, Jk" uniqKey="Goodrich J">JK Goodrich</name>
</author>
<author><name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
<author><name sortKey="Huttley, Ga" uniqKey="Huttley G">GA Huttley</name>
</author>
<author><name sortKey="Kelley, St" uniqKey="Kelley S">ST Kelley</name>
</author>
<author><name sortKey="Knights, D" uniqKey="Knights D">D Knights</name>
</author>
<author><name sortKey="Koenig, Je" uniqKey="Koenig J">JE Koenig</name>
</author>
<author><name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author><name sortKey="Lozupone, Ca" uniqKey="Lozupone C">CA Lozupone</name>
</author>
<author><name sortKey="Mcdonald, D" uniqKey="Mcdonald D">D McDonald</name>
</author>
<author><name sortKey="Muegge, Bd" uniqKey="Muegge B">BD Muegge</name>
</author>
<author><name sortKey="Pirrung, M" uniqKey="Pirrung M">M Pirrung</name>
</author>
<author><name sortKey="Reeder, J" uniqKey="Reeder J">J Reeder</name>
</author>
<author><name sortKey="Sevinsky, Jr" uniqKey="Sevinsky J">JR Sevinsky</name>
</author>
<author><name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author><name sortKey="Walters, Wa" uniqKey="Walters W">WA Walters</name>
</author>
<author><name sortKey="Widmann, J" uniqKey="Widmann J">J Widmann</name>
</author>
<author><name sortKey="Yatsunenko, T" uniqKey="Yatsunenko T">T Yatsunenko</name>
</author>
<author><name sortKey="Zaneveld, J" uniqKey="Zaneveld J">J Zaneveld</name>
</author>
<author><name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author><name sortKey="Diaz, Nn" uniqKey="Diaz N">NN Diaz</name>
</author>
<author><name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author><name sortKey="Kelley, S" uniqKey="Kelley S">S Kelley</name>
</author>
<author><name sortKey="Nattkemper, Tw" uniqKey="Nattkemper T">TW Nattkemper</name>
</author>
<author><name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author><name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author><name sortKey="Stoye, J" uniqKey="Stoye J">J Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Brady, A" uniqKey="Brady A">A Brady</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Namiki, T" uniqKey="Namiki T">T Namiki</name>
</author>
<author><name sortKey="Hachiya, T" uniqKey="Hachiya T">T Hachiya</name>
</author>
<author><name sortKey="Tanaka, H" uniqKey="Tanaka H">H Tanaka</name>
</author>
<author><name sortKey="Sakakibara, Y" uniqKey="Sakakibara Y">Y Sakakibara</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author><name sortKey="Leung, Hcm" uniqKey="Leung H">HCM Leung</name>
</author>
<author><name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author><name sortKey="Chin, Fyl" uniqKey="Chin F">FYL Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Laserson, J" uniqKey="Laserson J">J Laserson</name>
</author>
<author><name sortKey="Jojic, V" uniqKey="Jojic V">V Jojic</name>
</author>
<author><name sortKey="Koller, D" uniqKey="Koller D">D Koller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, Gd" uniqKey="Wu G">GD Wu</name>
</author>
<author><name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author><name sortKey="Hoffmann, C" uniqKey="Hoffmann C">C Hoffmann</name>
</author>
<author><name sortKey="Bittinger, K" uniqKey="Bittinger K">K Bittinger</name>
</author>
<author><name sortKey="Chen, Yyy" uniqKey="Chen Y">YYY Chen</name>
</author>
<author><name sortKey="Keilbaugh, Sa" uniqKey="Keilbaugh S">SA Keilbaugh</name>
</author>
<author><name sortKey="Bewtra, M" uniqKey="Bewtra M">M Bewtra</name>
</author>
<author><name sortKey="Knights, D" uniqKey="Knights D">D Knights</name>
</author>
<author><name sortKey="Walters, Wa" uniqKey="Walters W">WA Walters</name>
</author>
<author><name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author><name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author><name sortKey="Gilroy, E" uniqKey="Gilroy E">E Gilroy</name>
</author>
<author><name sortKey="Gupta, K" uniqKey="Gupta K">K Gupta</name>
</author>
<author><name sortKey="Baldassano, R" uniqKey="Baldassano R">R Baldassano</name>
</author>
<author><name sortKey="Nessel, L" uniqKey="Nessel L">L Nessel</name>
</author>
<author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author><name sortKey="Bushman, Fd" uniqKey="Bushman F">FD Bushman</name>
</author>
<author><name sortKey="Lewis, Jd" uniqKey="Lewis J">JD Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kurtz, S" uniqKey="Kurtz S">S Kurtz</name>
</author>
<author><name sortKey="Phillippy, A" uniqKey="Phillippy A">A Phillippy</name>
</author>
<author><name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author><name sortKey="Smoot, M" uniqKey="Smoot M">M Smoot</name>
</author>
<author><name sortKey="Shumway, M" uniqKey="Shumway M">M Shumway</name>
</author>
<author><name sortKey="Antonescu, C" uniqKey="Antonescu C">C Antonescu</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author><name sortKey="Linderman, Md" uniqKey="Linderman M">MD Linderman</name>
</author>
<author><name sortKey="Sorenson, J" uniqKey="Sorenson J">J Sorenson</name>
</author>
<author><name sortKey="Lee, L" uniqKey="Lee L">L Lee</name>
</author>
<author><name sortKey="Nolan, Gp" uniqKey="Nolan G">GP Nolan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Barabasi, Al" uniqKey="Barabasi A">AL Barabasi</name>
</author>
<author><name sortKey="Oltvai, Zn" uniqKey="Oltvai Z">ZN Oltvai</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
<author><name sortKey="Boguski, Ms" uniqKey="Boguski M">MS Boguski</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
<author><name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kulikova, T" uniqKey="Kulikova T">T Kulikova</name>
</author>
<author><name sortKey="Aldebert, P" uniqKey="Aldebert P">P Aldebert</name>
</author>
<author><name sortKey="Althorpe, N" uniqKey="Althorpe N">N Althorpe</name>
</author>
<author><name sortKey="Baker, W" uniqKey="Baker W">W Baker</name>
</author>
<author><name sortKey="Bates, K" uniqKey="Bates K">K Bates</name>
</author>
<author><name sortKey="Browne, P" uniqKey="Browne P">P Browne</name>
</author>
<author><name sortKey="Van Den Broek, A" uniqKey="Van Den Broek A">A van den Broek</name>
</author>
<author><name sortKey="Cochrane, G" uniqKey="Cochrane G">G Cochrane</name>
</author>
<author><name sortKey="Duggan, K" uniqKey="Duggan K">K Duggan</name>
</author>
<author><name sortKey="Eberhardt, R" uniqKey="Eberhardt R">R Eberhardt</name>
</author>
<author><name sortKey="Faruque, N" uniqKey="Faruque N">N Faruque</name>
</author>
<author><name sortKey="Garcia Pastor, M" uniqKey="Garcia Pastor M">M Garcia-Pastor</name>
</author>
<author><name sortKey="Harte, N" uniqKey="Harte N">N Harte</name>
</author>
<author><name sortKey="Kanz, C" uniqKey="Kanz C">C Kanz</name>
</author>
<author><name sortKey="Leinonen, R" uniqKey="Leinonen R">R Leinonen</name>
</author>
<author><name sortKey="Lin, Q" uniqKey="Lin Q">Q Lin</name>
</author>
<author><name sortKey="Lombard, V" uniqKey="Lombard V">V Lombard</name>
</author>
<author><name sortKey="Lopez, R" uniqKey="Lopez R">R Lopez</name>
</author>
<author><name sortKey="Mancuso, R" uniqKey="Mancuso R">R Mancuso</name>
</author>
<author><name sortKey="Mchale, M" uniqKey="Mchale M">M McHale</name>
</author>
<author><name sortKey="Nardone, F" uniqKey="Nardone F">F Nardone</name>
</author>
<author><name sortKey="Silventoinen, V" uniqKey="Silventoinen V">V Silventoinen</name>
</author>
<author><name sortKey="Stoehr, P" uniqKey="Stoehr P">P Stoehr</name>
</author>
<author><name sortKey="Stoesser, G" uniqKey="Stoesser G">G Stoesser</name>
</author>
<author><name sortKey="Ann, M" uniqKey="Ann M">M Ann</name>
</author>
<author><name sortKey="Tzouvara, K" uniqKey="Tzouvara K">K Tzouvara</name>
</author>
<author><name sortKey="Vaughan, R" uniqKey="Vaughan R">R Vaughan</name>
</author>
<author><name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author><name sortKey="Zhu, W" uniqKey="Zhu W">W Zhu</name>
</author>
<author><name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Camon, E" uniqKey="Camon E">E Camon</name>
</author>
<author><name sortKey="Magrane, M" uniqKey="Magrane M">M Magrane</name>
</author>
<author><name sortKey="Barrell, D" uniqKey="Barrell D">D Barrell</name>
</author>
<author><name sortKey="Lee, V" uniqKey="Lee V">V Lee</name>
</author>
<author><name sortKey="Dimmer, E" uniqKey="Dimmer E">E Dimmer</name>
</author>
<author><name sortKey="Maslen, J" uniqKey="Maslen J">J Maslen</name>
</author>
<author><name sortKey="Binns, D" uniqKey="Binns D">D Binns</name>
</author>
<author><name sortKey="Harte, N" uniqKey="Harte N">N Harte</name>
</author>
<author><name sortKey="Lopez, R" uniqKey="Lopez R">R Lopez</name>
</author>
<author><name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gabriel, E" uniqKey="Gabriel E">E Gabriel</name>
</author>
<author><name sortKey="Fagg, G" uniqKey="Fagg G">G Fagg</name>
</author>
<author><name sortKey="Bosilca, G" uniqKey="Bosilca G">G Bosilca</name>
</author>
<author><name sortKey="Angskun, T" uniqKey="Angskun T">T Angskun</name>
</author>
<author><name sortKey="Dongarra, J" uniqKey="Dongarra J">J Dongarra</name>
</author>
<author><name sortKey="Squyres, J" uniqKey="Squyres J">J Squyres</name>
</author>
<author><name sortKey="Sahay, V" uniqKey="Sahay V">V Sahay</name>
</author>
<author><name sortKey="Kambadur, P" uniqKey="Kambadur P">P Kambadur</name>
</author>
<author><name sortKey="Barrett, B" uniqKey="Barrett B">B Barrett</name>
</author>
<author><name sortKey="Lumsdaine, A" uniqKey="Lumsdaine A">A Lumsdaine</name>
</author>
<author><name sortKey="Castain, R" uniqKey="Castain R">R Castain</name>
</author>
<author><name sortKey="Daniel, D" uniqKey="Daniel D">D Daniel</name>
</author>
<author><name sortKey="Graham, R" uniqKey="Graham R">R Graham</name>
</author>
<author><name sortKey="Woodall, T" uniqKey="Woodall T">T Woodall</name>
</author>
<author><name sortKey="Gabriel, E" uniqKey="Gabriel E">E Gabriel</name>
</author>
<author><name sortKey="Fagg, Ge" uniqKey="Fagg G">GE Fagg</name>
</author>
<author><name sortKey="Bosilca, G" uniqKey="Bosilca G">G Bosilca</name>
</author>
<author><name sortKey="Angskun, T" uniqKey="Angskun T">T Angskun</name>
</author>
<author><name sortKey="Dongarra, Jj" uniqKey="Dongarra J">JJ Dongarra</name>
</author>
<author><name sortKey="Squyres, Jm" uniqKey="Squyres J">JM Squyres</name>
</author>
<author><name sortKey="Sahay, V" uniqKey="Sahay V">V Sahay</name>
</author>
<author><name sortKey="Kambadur, P" uniqKey="Kambadur P">P Kambadur</name>
</author>
<author><name sortKey="Barrett, B" uniqKey="Barrett B">B Barrett</name>
</author>
<author><name sortKey="Lumsdaine, A" uniqKey="Lumsdaine A">A Lumsdaine</name>
</author>
<author><name sortKey="Castain, Rh" uniqKey="Castain R">RH Castain</name>
</author>
<author><name sortKey="Daniel, Dj" uniqKey="Daniel D">DJ Daniel</name>
</author>
<author><name sortKey="Graham, Rl" uniqKey="Graham R">RL Graham</name>
</author>
<author><name sortKey="Woodall, Ts" uniqKey="Woodall T">TS Woodall</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gropp, W" uniqKey="Gropp W">W Gropp</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kale, Lv" uniqKey="Kale L">LV Kale</name>
</author>
<author><name sortKey="Krishnan, S" uniqKey="Krishnan S">S Krishnan</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Genome Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">Genome Biol</journal-id>
<journal-title-group><journal-title>Genome Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1465-6906</issn>
<issn pub-type="epub">1465-6914</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">23259615</article-id>
<article-id pub-id-type="pmc">4056372</article-id>
<article-id pub-id-type="publisher-id">gb-2012-13-12-r122</article-id>
<article-id pub-id-type="doi">10.1186/gb-2012-13-12-r122</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Method</subject>
</subj-group>
</article-categories>
<title-group><article-title>Ray Meta: scalable <italic>de novo </italic>
metagenome assembly and profiling</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" corresp="yes" id="A1"><name><surname>Boisvert</surname>
<given-names>Sébastien</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>sebastien.boisvert.3@ulaval.ca</email>
</contrib>
<contrib contrib-type="author" id="A2"><name><surname>Raymond</surname>
<given-names>Frédéric</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>frederic.raymond@crchul.ulaval.ca</email>
</contrib>
<contrib contrib-type="author" id="A3"><name><surname>Godzaridis</surname>
<given-names>Élénie</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>elenie.godzaridis.1@ulaval.ca</email>
</contrib>
<contrib contrib-type="author" id="A4"><name><surname>Laviolette</surname>
<given-names>François</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>francois.laviolette@ift.ulaval.ca</email>
</contrib>
<contrib contrib-type="author" id="A5"><name><surname>Corbeil</surname>
<given-names>Jacques</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I4">4</xref>
<email>jacques.corbeil@crchul.ulaval.ca</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Infectious Diseases Research Center, CHUQ Research Center, 2705, boul. Laurier, Québec (Québec), G1V 4G2, Canada</aff>
<aff id="I2"><label>2</label>
Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</aff>
<aff id="I3"><label>3</label>
Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Laval University, 1065, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</aff>
<aff id="I4"><label>4</label>
Department of Molecular Medicine, Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada</aff>
<pub-date pub-type="ppub"><year>2012</year>
</pub-date>
<pub-date pub-type="epub"><day>22</day>
<month>12</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<issue>12</issue>
<fpage>R122</fpage>
<lpage>R122</lpage>
<history><date date-type="received"><day>1</day>
<month>8</month>
<year>2012</year>
</date>
<date date-type="rev-recd"><day>19</day>
<month>11</month>
<year>2012</year>
</date>
<date date-type="accepted"><day>22</day>
<month>12</month>
<year>2012</year>
</date>
</history>
<permissions><copyright-statement>Copyright © 2012 Boisvert et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Boisvert et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://genomebiology.com/2012/13/12/R122"></self-uri>
<abstract><p>Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for <italic>de novo </italic>
assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at <ext-link ext-link-type="uri" xlink:href="http://denovoassembler.sf.net">http://denovoassembler.sf.net</ext-link>
.</p>
</abstract>
<kwd-group><kwd>metagenomics</kwd>
<kwd>message passing</kwd>
<kwd>scalability</kwd>
<kwd><italic>de novo </italic>
assembly</kwd>
<kwd>profiling</kwd>
<kwd>next-generation sequencing</kwd>
<kwd>parallel</kwd>
<kwd>distributed</kwd>
</kwd-group>
</article-meta>
</front>
<body><sec><title>Background</title>
<p>While voluminous datasets from high-throughput sequencing experiments have allowed new biological questions to emerge [<xref ref-type="bibr" rid="B1">1</xref>
,<xref ref-type="bibr" rid="B2">2</xref>
], the technology's speed and scalability are not yet matched by available analysis techniques and the gap between them has been steadily growing [<xref ref-type="bibr" rid="B3">3</xref>
,<xref ref-type="bibr" rid="B4">4</xref>
]. The de Bruijn graph is a structure for storing DNA words - or k-mers - that occur in sequence datasets [<xref ref-type="bibr" rid="B5">5</xref>
,<xref ref-type="bibr" rid="B6">6</xref>
]. Recent work showed that adding colors to a de Bruijn graph can allow variants to be called even in the absence of a complete genome reference [<xref ref-type="bibr" rid="B7">7</xref>
].</p>
<p>The field of metagenomics is concerned with the analysis of communities by sampling the DNA of all species in a given microbial community. The assembly of metagenomes poses greater and more complex challenges than single-genome assembly as the relative abundances of the species in a microbiome are not uniform [<xref ref-type="bibr" rid="B8">8</xref>
]. A compounding factor is the genetic diversity represented by polymorphisms and homologies between strains, which increases the difficulty of the problem for assemblers [<xref ref-type="bibr" rid="B8">8</xref>
]. Moreover, the underlying diversity of the sample increases its complexity and adds to the difficulties of assembly. Last but not least, DNA repeats can produce misassemblies [<xref ref-type="bibr" rid="B9">9</xref>
] in the absence of fine-tuned, accurate computational tools [<xref ref-type="bibr" rid="B10">10</xref>
].</p>
<p>The microbial diversity in microbiomes contains the promise of finding new genes with novel and interesting biological functions [<xref ref-type="bibr" rid="B11">11</xref>
]. While the throughput in metagenomics is increasing fast, bottlenecks in the analyses are becoming more apparent [<xref ref-type="bibr" rid="B12">12</xref>
], indicating that only equally parallel - and perhaps highly distributed - analysis systems can help bridge the scalability gap. Parallel sequencing requires parallel processing for bioprospecting and for making sense of otherwise largely unknown sequences.</p>
<p>Environmental microbiomes have been the subject of several large-scale investigations. Viral genome assemblies have been obtained from samples taken from hot springs [<xref ref-type="bibr" rid="B13">13</xref>
]. Metabolic profiling of microbial communities from Antarctica [<xref ref-type="bibr" rid="B14">14</xref>
] and the Arctic [<xref ref-type="bibr" rid="B15">15</xref>
] provided novel insights into the ecology of these communities. Furthermore, a new Archaea lineage was discovered in a hypersaline environment by means of metagenomic assembly [<xref ref-type="bibr" rid="B16">16</xref>
]. The metabolic capabilities of terrestrial and marine microbial communities have been compared [<xref ref-type="bibr" rid="B17">17</xref>
]. The structure of communities in the environment has been reconstructed [<xref ref-type="bibr" rid="B18">18</xref>
]. All these studies show that environmental microbiomes are reservoirs of genetic novelty [<xref ref-type="bibr" rid="B19">19</xref>
], which bioprospecting aims at discovering.</p>
<p>Through metagenomic analysis, the interplay between host and commensal microbial metabolic activity can be studied, promising to shed light on its role in maintaining human health. Furthermore, precisely profiling the human microbial and viral flora at different taxonomic levels as well as functional profiling may hint at improved new therapeutic options [<xref ref-type="bibr" rid="B20">20</xref>
]. To that end, the human distal gut microbiome of two healthy adults was analyzed by DNA sequencing [<xref ref-type="bibr" rid="B21">21</xref>
], and subsequently the human gut microbiome of 124 European individuals was analyzed by DNA sequencing from fecal samples by the MetaHIT consortium [<xref ref-type="bibr" rid="B22">22</xref>
]. Another study proposed that there are three stable, location-independent, gut microbiome enterotypes [<xref ref-type="bibr" rid="B23">23</xref>
]. Finally, the structure, function and diversity of the healthy human microbiome were investigated by the Human Microbiome Project Consortium [<xref ref-type="bibr" rid="B24">24</xref>
].</p>
<p>With 16S rRNA gene sequencing, species representation can be extracted by taxonomic profiling [<xref ref-type="bibr" rid="B25">25</xref>
]. However, using more than one marker gene produces better taxonomic profiles [<xref ref-type="bibr" rid="B26">26</xref>
,<xref ref-type="bibr" rid="B27">27</xref>
]. Furthermore, a taxonomy based on phylogenetic analyses helps in the process of taxonomic profiling [<xref ref-type="bibr" rid="B28">28</xref>
]. While taxonomic profiles are informative, functional profiling is also required to understand the biology of a system. To that end, gene ontology [<xref ref-type="bibr" rid="B29">29</xref>
] can assign normalized functions to data.</p>
<p>Although not designed for metagenomes, distributed software for single genomes, such as ABySS [<xref ref-type="bibr" rid="B30">30</xref>
] and Ray [<xref ref-type="bibr" rid="B31">31</xref>
], illustrates how leveraging high-performance and parallel computing could greatly speed up the analysis of the large amounts of data generated by metagenome projects. Notably, sophisticated parallel tools are easily deployed on cloud computing infrastructures [<xref ref-type="bibr" rid="B32">32</xref>
] or on national computing infrastructures through their use of a cross-platform, scalable method called the message-passing interface.</p>
<p>Taxonomic profiling methods utilize alignments [<xref ref-type="bibr" rid="B26">26</xref>
,<xref ref-type="bibr" rid="B27">27</xref>
,<xref ref-type="bibr" rid="B33">33</xref>
-<xref ref-type="bibr" rid="B36">36</xref>
] or hidden Markov models [<xref ref-type="bibr" rid="B37">37</xref>
] or both [<xref ref-type="bibr" rid="B38">38</xref>
]. Few methods are available for metagenome <italic>de novo </italic>
assembly (MetaVelvet [<xref ref-type="bibr" rid="B39">39</xref>
], Meta-IDBA [<xref ref-type="bibr" rid="B40">40</xref>
] and Genovo [<xref ref-type="bibr" rid="B41">41</xref>
]), none couples taxonomic and ontology profiling with <italic>de novo </italic>
assembly, and none is distributed to provide scalability. Furthermore, none of the existing methods for <italic>de novo </italic>
metagenome assembly distributes memory utilization over more than one compute machine. This additional difficulty plagues current metagenome assembly approaches.</p>
<p>The field of metagenomics urgently needs distributed and scalable processing methods to handle sample sizes efficiently and to meet the assembly and profiling challenges that they pose. Herein we show that Ray Meta, a distributed processing application, is suited for metagenomics. We present results obtained by <italic>de novo </italic>
metagenome assembly with coupled profiling. With Ray Meta, we show that the method scales for two metagenomes simulated to incorporate sequencing errors: a 100-genome metagenome assembled from 400 × 10<sup>6 </sup>
101-nucleotide reads and a 1,000-genome metagenome assembled from 3 × 10<sup>9 </sup>
100-nucleotide reads. Ray Communities utilizes bacterial genomes to color the assembled de Bruijn graph. The Greengenes taxonomy [<xref ref-type="bibr" rid="B28">28</xref>
] was utilized to obtain the profiles from colored k-mers. Other taxonomies, such as the NCBI taxonomy, can be substituted readily. We also present results obtained by <italic>de novo </italic>
metagenome assembly and taxonomic and functional profiling of 124 gut microbiomes. We compared Ray Meta to MetaVelvet and validated Ray Communities with MetaPhlAn taxonomic profiles.</p>
</sec>
<sec sec-type="results"><title>Results</title>
<sec><title>Scalability</title>
<p>In order to assess the scalability of Ray Meta, we simulated two large datasets. Although a simulation does not capture all genetic variations (and associated complexity) occurring in natural microbial populations, it is a way to validate the correctness of assemblies produced by Ray Meta and the abundances predicted by Ray Communities. The first dataset contained 400 × 10<sup>6 </sup>
reads, with 1% as human contamination. The remaining reads were distributed across 100 bacterial genomes selected randomly from GenBank. The read length was 101 nucleotides, the substitution error rate was 0.25% and reads were paired. Finally, the proportion of bacterial genomes followed a power law (with exponent -0.5) to mimic what is found in nature (see the section on Materials and methods). The number of reads for this 100-genome metagenome roughly corresponds to the number of reads generated by one lane of an Illumina HiSeq 2000 flow cell (Illumina, Inc.). Table S1 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
 lists the number of reads for each bacterial genome and for the human genome. This dataset was assembled by Ray Meta using 128 processor cores in 13 hours, 26 minutes, with an average memory usage of 2 GB per core. The resulting assembly contained 22,162 contigs with at least 100 nucleotides and had an N50 of 152,891 nucleotides. The sum of contig lengths was 345,945,478 nucleotides. This is 93% of the sum of bacterial genome lengths, which was 371,623,377 nucleotides. Therefore, on average there were 3,459,454 assembled nucleotides and 221 contigs per bacterial genome, assuming that the bacterial genomes were roughly of the same size and same complexity and that the coverage depth was not sufficient to assemble the incorporated human contamination. Using the known reference sequences, we assessed the quality of the assembly with MUMmer. There were 11,220 contigs with at least 500 nucleotides. Among these, 152 had misassemblies (1.35%). Any contig that did not align as one single maximum unique match with a breadth of coverage of at least 98.0% was marked as misassembled. The number of mismatches was 1,108 while the number of insertions or deletions was 597.</p>
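<p>As an illustration of the summary statistics reported above, the following minimal Python sketch computes an N50 value and the misassembly rate. It is not part of Ray Meta: the function name n50 and the toy contig lengths are assumptions made for this example, and only the counts 152 and 11,220 are taken from the text.</p>
<preformat>
```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of the assembled bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly, not data from the paper.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)

# Misassembly rate (in percent) from the counts stated in the text:
# 152 misassembled contigs among 11,220 contigs of at least 500 nucleotides.
misassembly_rate = 100 * 152 / 11220
```
</preformat>
<p>Sorting the contigs from longest to shortest and stopping once half of the assembled bases are covered is the usual definition of N50; other variants (for example, NG50, which uses the genome size instead of the assembly size) would require a different total.</p>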
<p>To further investigate the scalability of our approach for <italic>de novo </italic>
metagenome assembly, we simulated a second metagenome. This one contained 1,000 bacterial genomes randomly selected from GenBank as well as 1% of human sequence contamination. The proportion of the 1,000 bacterial genomes was distributed according to a power law (with exponent -0.3) and the number of reads was 3 × 10<sup>9 </sup>
(Table S2 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
). This number of reads is currently generated by one Illumina HiSeq 2000 flow cell (Illumina, Inc.). This second dataset, which is larger, was assembled <italic>de novo </italic>
by Ray Meta in 15 hours, 46 minutes using 1,024 processor cores with an average memory usage of 1.5 GB per core. It contained 974,249 contigs with at least 100 nucleotides; N50 was 76,095 and the sum of the contig lengths was 2,894,058,833, or 80% of the sum of bacterial genome lengths (3,578,300,288 nucleotides). Assuming a uniform distribution of assembled bases and contigs and that human sequence coverage depth was not sufficient for its <italic>de novo </italic>
assembly, there were, on average, 974 contigs and 2,894,058 nucleotides per bacterial genome. To validate whether or not the produced contigs were of good quality, we compared them to the known references. There were 196,809 contigs with at least 500 nucleotides. Of these, 2,638 were misassembled (1.34%) according to a very stringent test. There were 59,856 mismatches and 13,122 insertions or deletions.</p>
<p>Next, we sought to quantify the breadth of assembly for the bacterial genomes in the 1,000-genome dataset. In other words, the assembled percentage was calculated for each genome present in the 1,000-genome metagenome. Many of these bacterial genomes had a breadth of coverage (in the <italic>de novo </italic>
assembly) greater than 95% (Figure <xref ref-type="fig" rid="F1">1</xref>
).</p>
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p><bold>Assembled proportions of bacterial genomes for a simulated metagenome with sequencing errors</bold>
. 3 × 10<sup>9 </sup>
100-nucleotide reads were simulated with sequencing errors (0.25%) from a simulated metagenome containing 1,000 bacterial genomes with proportions following a power law. Having 1,000 genomes with power law proportions makes it impossible to classify sequences by their coverage. This large metagenomic dataset was assembled using distributed de Bruijn graphs and profiled with colored de Bruijn graphs. Highly similar but distinct genomes are likely to be hard to assemble. This figure shows the proportion of each genome that was assembled <italic>de novo </italic>
within the metagenome. Of the bacterial genomes, 88.2% were assembled with a breadth of coverage of at least 80.0%.</p>
</caption>
<graphic xlink:href="gb-2012-13-12-r122-1"></graphic>
</fig>
</sec>
<sec><title>Estimating bacterial proportions</title>
<p>Another problem that can be solved with de Bruijn graphs is estimating the genome nucleotide proportion within a metagenome. Using Ray Communities, the <italic>de novo </italic>
assembled de Bruijn graphs of the 100-genome and 1,000-genome datasets were colored using all sequenced bacterial genomes (Table S4 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
) in order to identify contigs and to estimate bacterial proportions in the datasets. Ray Communities estimates proportions by demultiplexing k-mer coverage depth in the distributed de Bruijn graph (see the section on Demultiplexing signals from similar bacterial strains in Materials and methods). Because coloring occurs after <italic>de novo </italic>
assembly has completed, the reference sequences are not needed for assembling metagenomes.</p>
<p>For the 100-genome dataset, only two bacterial genome proportions were not estimated correctly. The first was due to a duplicate in GenBank and the second to two almost identical genomes (Figure <xref ref-type="fig" rid="F2">2A</xref>
). When two identical genomes are provided as a basis to color the de Bruijn graph, no k-mer is uniquely colored for either of these two genomes, and no identifying k-mers can be found through demultiplexing. This can be solved by using a taxonomy, which allows reference genomes to be similar or identical.</p>
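<p>The effect of duplicated references on uniquely colored k-mers can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Ray Communities implementation: the function names, the toy sequences and the choice of k = 4 are hypothetical, and reverse-complement k-mers and coverage depth are ignored.</p>
<preformat>
```python
def kmers(sequence, k):
    """Return the set of all k-mers (substrings of length k) in sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def uniquely_colored(genomes, k):
    """Map each genome name to the k-mers that no other genome contains."""
    colors = {}  # k-mer -> set of genome names (its colors)
    for name, sequence in genomes.items():
        for kmer in kmers(sequence, k):
            colors.setdefault(kmer, set()).add(name)
    unique = {name: set() for name in genomes}
    for kmer, names in colors.items():
        if len(names) == 1:
            (owner,) = names
            unique[owner].add(kmer)
    return unique


# Hypothetical toy sequences; "dup_a" and "dup_b" are identical references.
toy_genomes = {
    "dup_a": "ACGTACGGA",
    "dup_b": "ACGTACGGA",
    "other": "TTGCATTGC",
}
result = uniquely_colored(toy_genomes, k=4)
```
</preformat>
<p>In this toy input, every k-mer of the two identical references carries both colors, so neither reference keeps a uniquely colored k-mer, whereas the third sequence keeps all of its own.</p>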
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p><bold>Estimated bacterial genome proportions</bold>
. For the two simulated metagenomes (100 and 1,000 bacterial genomes, respectively), colored de Bruijn graphs were utilized to estimate the nucleotide proportion of each bacterial genome in its containing metagenome. Genome proportions in metagenomes followed a power law. Black lines show the expected nucleotide proportion for bacterial genomes while blue points represent proportions measured by colored de Bruijn graphs. <bold>(A) </bold>
For the 100-genome metagenome, only two bacterial genomes were not correctly measured (2.0%), namely <italic>Methanococcus maripaludis </italic>
X1 and <italic>Serratia </italic>
AS9. <italic>Methanococcus maripaludis </italic>
X1 was not detected because it was duplicated in the dataset as <italic>Methanococcus maripaludis </italic>
XI, thus providing zero uniquely colored k-mers. <italic>Serratia </italic>
AS9 was not detected because it shares almost all its k-mers with <italic>Serratia </italic>
AS12. <bold>(B) </bold>
For the 1,000-genome metagenome, 4 bacterial genomes were overestimated (0.4%) while 20 were underestimated (2.0%). These errors were due to highly similar bacterial genomes, hence they did not provide uniquely colored k-mers. This problem can be alleviated either by using a curated set of reference genomes or by using a taxonomy. The remaining 976 bacterial genomes had a measured proportion near the expected value.</p>
</caption>
<graphic xlink:href="gb-2012-13-12-r122-2"></graphic>
</fig>
<p>In the 1,000-genome dataset, four bacterial genome proportions were overestimated and 20 were underestimated (Figure <xref ref-type="fig" rid="F2">2B</xref>
). In the 100-genome and 1,000-genome datasets, the proportion of bacterial genomes with incorrect estimates was 2.0% and 2.4%, respectively. In both datasets, the incorrect estimates were caused by duplicated, identical or highly similar genomes. The use of a taxonomy alleviates this problem.</p>
<p>The results with the 100-genome and 1,000-genome datasets show that our method can recover bacterial genome proportions when the genome sequences are known. In real microbiome systems, there is a sizable proportion of unknown bacterial species. For this reason, it is important to devise a system that can also accommodate unknown species by using a taxonomy, which allows the classification to occur at higher levels - such as phylum or genus instead of species.</p>
</sec>
<sec><title>Metagenome <italic>de novo</italic> assembly of real datasets</title>
<p>Here, we present results for 124 fecal samples from a previous study [<xref ref-type="bibr" rid="B22">22</xref>
]. Of the 124 samples, 85 were from Denmark (all annotated as being healthy) and 39 were from Spain (14 were healthy, 21 had ulcerative colitis and 4 had Crohn's disease). Each metagenome was assembled independently (Table S3 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
) and the resulting distributed de Bruijn graphs were colored to obtain taxonomic and gene ontology profiles (see Materials and methods and Table S4 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
<p>These samples contained paired 75-nucleotide and/or 44-nucleotide reads obtained with Illumina Genome Analyzer sequencers. In about 5 hours, 122 samples were assembled (and profiled) using 32 processor cores and the two remaining samples, namely MH0012 and MH0014, were assembled (and profiled) with 48 and 40 processor cores, respectively (Table S3 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
). These runtime figures include <italic>de novo </italic>
assembly, graph coloring, signal demultiplexing and taxonomic and gene ontology profiling, which are all tightly coupled in the process. In the next section, taxonomic profiles are presented for these 124 gut microbiome samples.</p>
</sec>
<sec><title>Taxonomic profiling</title>
<p>In metagenomic projects, the bacterial genomes that occur in the sample may be unknown at the species level. However, it is possible to profile these samples using a taxonomy. The key concept is to classify colored k-mers in a taxonomy tree: when several taxons share a k-mer, the k-mer is moved up the tree until it can be assigned to the nearest common ancestor of those taxons. For example, if a k-mer cannot be classified at the species level, it can be classified at the genus level, and so on. Furthermore, taxonomic profiling does not suffer from the sequence similarity issues seen when estimating genome proportions in samples because k-mers can be classified in higher taxons when necessary.</p>
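<p>The classification rule described above can be illustrated with a minimal sketch. It assumes a toy taxonomy stored as a child-to-parent table; the taxon names and the function names are hypothetical and are not taken from Ray Communities or Greengenes. A k-mer carried by a single taxon stays at that taxon, whereas a k-mer shared by several taxons is assigned to their nearest common ancestor.</p>
<preformat>
```python
# Hypothetical toy taxonomy: child -> parent (names are illustrative only).
PARENT = {
    "species_A1": "genus_A",
    "species_A2": "genus_A",
    "species_B1": "genus_B",
    "genus_A": "family_X",
    "genus_B": "family_X",
    "family_X": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def classify_kmer(taxa):
    """Return the nearest common ancestor of all taxa that carry the k-mer."""
    taxa = list(taxa)
    if not taxa:
        return None
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common.intersection_update(lineage(taxon))
    # Walking one lineage from the bottom up, the first shared node
    # is the deepest ancestor common to every taxon.
    for node in lineage(taxa[0]):
        if node in common:
            return node
    return None
```
</preformat>
<p>In this sketch, a k-mer seen only in species_A1 is classified at the species rank, while a k-mer seen in species_A1 and species_B1 is classified at family_X.</p>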
<p>Accordingly, k-mers shared by several bacterial species cannot be assigned to one of them accurately. For this reason, the Greengenes taxonomy [<xref ref-type="bibr" rid="B28">28</xref>
] (version 2011_11) was utilized to classify each colored k-mer in a single taxon with its taxonomic rank being one of the following: kingdom, phylum, class, order, family, genus or species. For each sample, abundances were computed at each taxonomic rank. At the moment, the most recent and accurate taxonomy for profiling taxons in a metagenome is Greengenes [<xref ref-type="bibr" rid="B28">28</xref>
]. We profiled taxons in the 124 gut microbiome samples using this taxonomy. We also incorporated the human genome into this taxonomy to profile the human abundance in the process. At the phylum level, the two most abundant taxons were Firmicutes and Bacteroidetes (Figure <xref ref-type="fig" rid="F3">3A</xref>
). The profile of the phylum Chordata indicated that two samples contained significantly more human sequences than the average (Figure <xref ref-type="fig" rid="F3">3A</xref>
). The most abundant genera in the 124 samples were <italic>Bacteroides </italic>
and <italic>Prevotella </italic>
(Figure <xref ref-type="fig" rid="F3">3B</xref>
). The taxon <italic>Bacteroides </italic>
is reported more than once because several taxons had this name with a different ancestry in the Greengenes taxonomy. The genera <italic>Prevotella </italic>
and <italic>Butyrivibrio </italic>
had numerous samples with higher counts, indicating that the data are bi-modal (Figure <xref ref-type="fig" rid="F3">3B</xref>
). The genus <italic>Homo </italic>
had two samples with significantly more abundance (Figure <xref ref-type="fig" rid="F3">3B</xref>
).</p>
<fig id="F3" position="float"><label>Figure 3</label>
<caption><p><bold>Fast and efficient taxonomic profiling with distributed colored de Bruijn graphs</bold>
. From a previous study, 124 metagenomic samples containing short paired reads were assembled <italic>de novo </italic>
and profiled for taxons. The graph coloring occurred once the de Bruijn graph was assembled <italic>de novo</italic>
. <bold>(A) </bold>
The taxonomic profiles are shown for the phylum level. The two most abundant phyla were Firmicutes and Bacteroidetes. This is in agreement with the literature [<xref ref-type="bibr" rid="B22">22</xref>
]. The abundance of human sequences was also measured. The phylum Chordata had two outlier samples. This indicates that two of the samples had more human sequences than the average, which may bias results. <bold>(B) </bold>
At the genus level, the most abundant taxon was <italic>Bacteroides</italic>
. This taxon occurred more than once because it was present at different locations within the Greengenes taxonomic tree. Also abundant is the genus <italic>Prevotella</italic>
. Furthermore, the latter had numerous samples with higher counts, which may help in non-parametric clustering. Two samples had a higher abundance of human sequences, as indicated by the abundance of the genus <italic>Homo</italic>
.</p>
</caption>
<graphic xlink:href="gb-2012-13-12-r122-3"></graphic>
</fig>
</sec>
<sec><title>Grouping abundance profiles</title>
<p>It has been proposed that the composition of the human gut microbiome of an individual can be classified as one of three enterotypes [<xref ref-type="bibr" rid="B23">23</xref>
]. We profiled genera for each of the 124 gut microbiome samples to reproduce these three enterotypes. The 124 samples (85 from Denmark and 39 from Spain) were analyzed using the two most important principal components (Figure <xref ref-type="fig" rid="F4">4</xref>
; see Materials and methods). Two clear clusters are visible, one enriched for the genus <italic>Bacteroides </italic>
and one for the genus <italic>Prevotella</italic>
. A continuum between two enterotypes has also been reported recently [<xref ref-type="bibr" rid="B42">42</xref>
].</p>
<fig id="F4" position="float"><label>Figure 4</label>
<caption><p><bold>Principal component analysis shows two clusters</bold>
. Principal component analysis (see Materials and methods) with abundances at the genus level yielded two distinct clusters. Abundances were obtained with colored de Bruijn graphs. One was enriched in the genus <italic>Bacteroides </italic>
while the other was enriched in the genus <italic>Prevotella</italic>
. Principal component 1 was linearly correlated with the genus <italic>Prevotella </italic>
while principal component 2 was linearly correlated with the genus <italic>Bacteroides</italic>
. This analysis suggests that there is a continuum between the two abundant genera <italic>Bacteroides </italic>
and <italic>Prevotella</italic>
. This interpretation differs from the original publication in which three human gut enterotypes were reported [<xref ref-type="bibr" rid="B23">23</xref>
]. More recently, it has been proposed that there are only two enterotypes and individuals are distributed in a continuum between the two [<xref ref-type="bibr" rid="B42">42</xref>
].</p>
</caption>
<graphic xlink:href="gb-2012-13-12-r122-4"></graphic>
</fig>
</sec>
<sec><title>Profiling of ontology terms</title>
<p>Gene ontology is a hierarchical classification of normalized terms in three independent domains: biological process, cellular component and molecular function. Some biological datasets are annotated with gene ontology. Here, we used gene ontology to profile the 124 metagenome samples based on a distributed colored de Bruijn graph (see Materials and methods). First, abundances for biological process terms were obtained (Figure <xref ref-type="fig" rid="F5">5A</xref>
). The two most abundant terms were metabolic process and transport. The terms oxidation-reduction process and DNA recombination had numerous sample outliers, which indicates that these samples had different biological complexity for these terms (Figure <xref ref-type="fig" rid="F5">5A</xref>
). Next, we sought to profile cellular component terms in the samples. The most abundant term was membrane, followed by cytoplasm, integral to membrane and plasma membrane. This redundancy is due to the hierarchical structure of gene ontology (Figure <xref ref-type="fig" rid="F5">5B</xref>
). Finally, we measured the abundance for molecular function terms. The most abundant was ATP binding, which had no outliers. The term DNA binding was also abundant. However, the latter had outlier samples (Figure <xref ref-type="fig" rid="F5">5C</xref>
).</p>
<fig id="F5" position="float"><label>Figure 5</label>
<caption><p><bold>Ontology profiling with colored de Bruijn graphs</bold>
. Gene ontology profiles were obtained by coloring of the graph resulting from <italic>de novo </italic>
assembly. Gene ontology has three domains: biological process, cellular component and molecular function. For each domain, only the 15 most abundant terms are displayed. <bold>(A) </bold>
Ontology terms in the biological process domain were profiled. Some of these have several outlier samples, namely oxidation-reduction process and DNA recombination. <bold>(B) </bold>
Ontology profiling for cellular component terms is shown. The most abundant is the membrane term. <bold>(C) </bold>
The profile for molecular function terms is shown. Binding functions are the most abundant with ATP binding, nucleotide binding and DNA binding in the top three. Next is catalytic activity, which is a general term. More specific catalytic activities are listed.</p>
</caption>
<graphic xlink:href="gb-2012-13-12-r122-5"></graphic>
</fig>
</sec>
<sec><title>Comparison of assemblies</title>
<p>Three samples from the MetaHIT Consortium [<xref ref-type="bibr" rid="B22">22</xref>
] - MH0006 (ERS006497), MH0012 (ERS006494) and MH0047 (ERS006592) - and three samples from the Human Microbiome Project Consortium [<xref ref-type="bibr" rid="B24">24</xref>
] - SRS011098, SRS017227 and SRS018661 - were assembled with MetaVelvet [<xref ref-type="bibr" rid="B39">39</xref>
] and Ray Meta to draw a comparison. Assembly metrics are displayed in Table <xref ref-type="table" rid="T1">1</xref>
. The average length is higher for MetaVelvet for samples ERS006494 and ERS006592. For the other samples, the average length is higher for Ray Meta. The N50 length is higher for Ray Meta for all samples. For all samples but ERS006497, the total length is higher for Ray Meta. Although we assembled the 124 samples from [<xref ref-type="bibr" rid="B22">22</xref>
] and 313 samples (out of 764) from the Human Microbiome Project [<xref ref-type="bibr" rid="B24">24</xref>
] with Ray Meta on supercomputers composed of nodes with little memory (24 GB), we only assembled a few samples with MetaVelvet because a single MetaVelvet assembly requires exclusive access to a single computer with a large amount of available memory (at least 128 GB). For these six samples, Ray Meta produced a higher N50 length and a longer longest scaffold in every case, and more bases in five of them. The shared content of assemblies produced by MetaVelvet and Ray Meta is shown in Table <xref ref-type="table" rid="T1">1</xref>
. A majority of sequences assembled by MetaVelvet and Ray Meta are shared. As metagenomic experiments will undoubtedly become more complex, Ray Meta will gain a distinct advantage owing to its distributed implementation.</p>
<table-wrap id="T1" position="float"><label>Table 1</label>
<caption><p>Comparison of assemblies produced by MetaVelvet and Ray Meta</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th></th>
<th align="left">MetaVelvet</th>
<th align="left">Ray Meta</th>
<th align="left">Shared</th>
</tr>
</thead>
<tbody><tr><td align="left">ERS006494</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">372,147,956</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">50,136</td>
<td align="left">56,363</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">150,904,880</td>
<td align="left">156,075,852</td>
<td align="left">130,979,321</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">3,009</td>
<td align="left">2,769</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">6,141</td>
<td align="left">12,117</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">146,549</td>
<td align="left">570,359</td>
<td></td>
</tr>
<tr><td align="left">ERS006497</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">322,444,920</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">61,093</td>
<td align="left">52,194</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">113,403,805</td>
<td align="left">111,187,163</td>
<td align="left">94,649,612</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">1,856</td>
<td align="left">2,130</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">2,778</td>
<td align="left">5,430</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">115,684</td>
<td align="left">430,963</td>
<td></td>
</tr>
<tr><td align="left">Running time (h:min)</td>
<td align="left">4:34</td>
<td align="left">10:06</td>
<td></td>
</tr>
<tr><td align="left">ERS006592</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">53,869,960</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">4,358</td>
<td align="left">9,387</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">19,501,348</td>
<td align="left">24,687,275</td>
<td align="left">18,061,386</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">4,474</td>
<td align="left">2,629</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">8,819</td>
<td align="left">10,277</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">87,983</td>
<td align="left">137,473</td>
<td></td>
</tr>
<tr><td align="left">Running time (h:min)</td>
<td align="left">0:41</td>
<td align="left">4:28</td>
<td></td>
</tr>
<tr><td align="left">SRS011098</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">202,487,723</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">30,458</td>
<td align="left">36,130</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">60,574,679</td>
<td align="left">83,736,387</td>
<td align="left">51,938,031</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">1,988</td>
<td align="left">2,317</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">3,117</td>
<td align="left">4,961</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">192,898</td>
<td align="left">222,213</td>
<td></td>
</tr>
<tr><td align="left">Running time (h:min)</td>
<td align="left">8:34</td>
<td align="left">6:38</td>
<td></td>
</tr>
<tr><td align="left">SRS017227</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">139,002,751</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">106,957</td>
<td align="left">89,953</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">171,200,737</td>
<td align="left">186,958,660</td>
<td align="left">126,068,148</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">1,600</td>
<td align="left">2,078</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">2,168</td>
<td align="left">3,771</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">102,749</td>
<td align="left">224,709</td>
<td></td>
</tr>
<tr><td align="left">Running time (h:min)</td>
<td align="left">9:00</td>
<td align="left">7:10</td>
<td></td>
</tr>
<tr><td align="left">SRS018661</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr><td align="left">Reads</td>
<td align="left" colspan="2">288,475,194</td>
<td></td>
</tr>
<tr><td align="left">Scaffolds<sup>a</sup>
</td>
<td align="left">30,709</td>
<td align="left">18,541</td>
<td></td>
</tr>
<tr><td align="left">Total length (nt)</td>
<td align="left">35,281,226</td>
<td align="left">36,891,130</td>
<td align="left">21,659,465</td>
</tr>
<tr><td align="left">Average length (nt)</td>
<td align="left">1,148</td>
<td align="left">1,989</td>
<td></td>
</tr>
<tr><td align="left">N50 length (nt)</td>
<td align="left">1,223</td>
<td align="left">3,794</td>
<td></td>
</tr>
<tr><td align="left">Longest length (nt)</td>
<td align="left">111,404</td>
<td align="left">377,149</td>
<td></td>
</tr>
<tr><td align="left">Running time (h:min)</td>
<td align="left">1:24</td>
<td align="left">4:42</td>
<td></td>
</tr>
</tbody>
</table>
<table-wrap-foot><p><sup>a</sup>
Only scaffolds with a length of at least 500 nt were considered. nt, nucleotide.</p>
</table-wrap-foot>
</table-wrap>
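<p>The metrics reported in Table 1 can be recomputed from a list of scaffold lengths. The sketch below is an illustration, not the script used for the paper: the function name is hypothetical, the 500-nt cutoff follows the table footnote, and N50 is assumed to follow its usual definition (the length of the scaffold at which the cumulative length, taken from longest to shortest, first reaches half of the total). The average is rounded down, which is consistent with the table (for example, 150,904,880 nt over 50,136 scaffolds gives 3,009).</p>
<preformat>
```python
def assembly_metrics(lengths, min_length=500):
    """Summarize scaffold lengths (in nt) in the style of Table 1."""
    # Keep only scaffolds at or above the cutoff, longest first.
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return None
    total = sum(kept)
    running = 0
    n50 = kept[-1]
    for n in kept:
        running += n
        # Stop at the first scaffold where the cumulative length
        # reaches half of the total assembled length.
        if 2 * running >= total:
            n50 = n
            break
    return {
        "scaffolds": len(kept),
        "total": total,
        "average": total // len(kept),
        "n50": n50,
        "longest": kept[0],
    }
```
</preformat>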
</sec>
<sec><title>Validation of taxonomic profiling</title>
<p>We compared Ray Communities to MetaPhlAn in order to validate our methodology. Taxonomic profiles for 313 samples (Additional file <xref ref-type="supplementary-material" rid="S2">2</xref>
) from the Human Microbiome Project [<xref ref-type="bibr" rid="B24">24</xref>
] were generated with Ray Communities and compared to those of MetaPhlAn [<xref ref-type="bibr" rid="B27">27</xref>
]. The correlations are shown in Table <xref ref-type="table" rid="T2">2</xref>
 for various body sites. Correlations are high - for instance the correlations for buccal mucosa (46 samples) were 0.99, 0.98, 0.97, 0.98, 0.95 and 0.91 for the ranks phylum, class, order, family, genus and species, respectively. These results indicate that Ray Communities has an accuracy similar to that of MetaPhlAn [<xref ref-type="bibr" rid="B27">27</xref>
], which was utilized by the Human Microbiome Project Consortium [<xref ref-type="bibr" rid="B24">24</xref>
]. The correlation at the genus rank for the site anterior nares was poor (0.59) because MetaPhlAn classified a high number of reads in the genus <italic>Propionibacterium </italic>
thus yielding a very high abundance while the number of k-mer observations classified this way by Ray Communities was more moderate. For the body site called stool, the correlation at the family rank was weak (0.62) because MetaPhlAn utilizes the NCBI taxonomy whereas Ray Communities utilizes the Greengenes taxonomy, which has been shown to be more accurate [<xref ref-type="bibr" rid="B28">28</xref>
]. Overall, these results indicate that Ray Communities yields accurate taxonomic abundances using a colored de Bruijn graph.</p>
<table-wrap id="T2" position="float"><label>Table 2</label>
<caption><p>Correlation of taxonomic abundances produced by MetaPhlAn and Ray Communities</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Body site</th>
<th align="left">Samples</th>
<th align="left">Phylum</th>
<th align="left">Class</th>
<th align="left">Order</th>
<th align="left">Family</th>
<th align="left">Genus</th>
<th align="left">Species</th>
</tr>
</thead>
<tbody><tr><td align="left">Anterior nares</td>
<td align="left">45</td>
<td align="left">0.91</td>
<td align="left">0.92</td>
<td align="left">0.94</td>
<td align="left">0.94</td>
<td align="left">0.59</td>
<td align="left">0.59</td>
</tr>
<tr><td align="left">Attached keratinized gingival</td>
<td align="left">3</td>
<td align="left">0.99</td>
<td align="left">0.94</td>
<td align="left">0.92</td>
<td align="left">0.94</td>
<td align="left">0.84</td>
<td align="left">0.71</td>
</tr>
<tr><td align="left">Buccal mucosa</td>
<td align="left">46</td>
<td align="left">0.99</td>
<td align="left">0.98</td>
<td align="left">0.97</td>
<td align="left">0.98</td>
<td align="left">0.95</td>
<td align="left">0.91</td>
</tr>
<tr><td align="left">Left retroauricular crease</td>
<td align="left">3</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.72</td>
<td align="left">0.83</td>
</tr>
<tr><td align="left">Mid vagina</td>
<td align="left">1</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.90</td>
</tr>
<tr><td align="left">Palatine tonsils</td>
<td align="left">4</td>
<td align="left">0.90</td>
<td align="left">0.80</td>
<td align="left">0.79</td>
<td align="left">0.83</td>
<td align="left">0.84</td>
<td align="left">0.97</td>
</tr>
<tr><td align="left">Posterior fornix</td>
<td align="left">23</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.97</td>
<td align="left">0.94</td>
</tr>
<tr><td align="left">Right retroauricular crease</td>
<td align="left">6</td>
<td align="left">0.94</td>
<td align="left">0.92</td>
<td align="left">0.93</td>
<td align="left">0.94</td>
<td align="left">0.83</td>
<td align="left">0.91</td>
</tr>
<tr><td align="left">Saliva</td>
<td align="left">3</td>
<td align="left">0.97</td>
<td align="left">0.87</td>
<td align="left">0.88</td>
<td align="left">0.96</td>
<td align="left">0.89</td>
<td align="left">0.95</td>
</tr>
<tr><td align="left">Stool</td>
<td align="left">61</td>
<td align="left">0.80</td>
<td align="left">0.81</td>
<td align="left">0.81</td>
<td align="left">0.62</td>
<td align="left">0.92</td>
<td align="left">0.84</td>
</tr>
<tr><td align="left">Subgingival plaque</td>
<td align="left">5</td>
<td align="left">0.86</td>
<td align="left">0.75</td>
<td align="left">0.76</td>
<td align="left">0.74</td>
<td align="left">0.81</td>
<td align="left">0.93</td>
</tr>
<tr><td align="left">Supragingival plaque</td>
<td align="left">53</td>
<td align="left">0.94</td>
<td align="left">0.93</td>
<td align="left">0.92</td>
<td align="left">0.88</td>
<td align="left">0.89</td>
<td align="left">0.93</td>
</tr>
<tr><td align="left">Throat</td>
<td align="left">6</td>
<td align="left">0.95</td>
<td align="left">0.86</td>
<td align="left">0.87</td>
<td align="left">0.92</td>
<td align="left">0.92</td>
<td align="left">0.80</td>
</tr>
<tr><td align="left">Tongue dorsum</td>
<td align="left">53</td>
<td align="left">0.93</td>
<td align="left">0.80</td>
<td align="left">0.79</td>
<td align="left">0.84</td>
<td align="left">0.85</td>
<td align="left">0.88</td>
</tr>
<tr><td align="left">Vaginal introitus</td>
<td align="left">1</td>
<td align="left">1.00</td>
<td align="left">1.00</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.97</td>
</tr>
<tr><td align="left">Total</td>
<td align="left">313</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<table-wrap-foot><p>Pearson's correlation was utilized to compare taxonomic abundance for 313 samples from various body sites [<xref ref-type="bibr" rid="B24">24</xref>
].</p>
</table-wrap-foot>
</table-wrap>
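<p>For reference, Pearson's correlation between two abundance vectors can be computed as in the following sketch. It assumes that the two vectors list the abundances of the same taxons in the same order, one vector per tool; how abundances were paired and pooled per body site in the paper is not specified here, so this pairing is an assumption.</p>
<preformat>
```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the
    # normalization factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```
</preformat>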
</sec>
</sec>
<sec sec-type="discussion"><title>Discussion</title>
<sec><title>Message passing</title>
<p>Ray Meta is a method for scalable distributed <italic>de novo </italic>
metagenome assembly whereas MetaVelvet runs only on a single computer. Therefore, fetching data with MetaVelvet is fast because only memory accesses occur. On the other hand, Ray Meta runs on many computers. Although this is a benefit at first sight, using many computers requires messages to be sent back and forth in order to fetch data. We used 8 nodes totaling 64 processor cores (8 processor cores per node) for Human Microbiome Project samples and the observed point-to-point latency (within our application, not the hardware latency) was around 37 microseconds, roughly 370 times the 100 nanoseconds required for a main memory access. However, by minimizing messages, Ray Meta runs in an acceptable time and has a scalability unmatched by MetaVelvet while providing superior assemblies (Table <xref ref-type="table" rid="T1">1</xref>
).</p>
</sec>
<sec><title>From Ray to Ray Meta</title>
<p>For single genomes, peak coverage is required by Ray in the k-mer coverage distribution [<xref ref-type="bibr" rid="B31">31</xref>
]. This is not the case for Ray Meta. Moreover, in Ray for single genomes, read markers are selected using the peak coverage and minimum coverage. This process is local to each read path in Ray Meta. This is in theory less precise because there are fewer coverage values, but in practice it works well as shown in this work. In Ray for single genomes, the unique k-mer coverage for a seed path (similar to a unitig) is simply the peak k-mer coverage for the whole graph whereas in Ray Meta the coverage values are sampled from the seed path only.</p>
</sec>
<sec><title>Algorithms for metagenome assembly</title>
<p>Notwithstanding the non-scalability of the other <italic>de novo </italic>
metagenome assemblers (MetaVelvet [<xref ref-type="bibr" rid="B39">39</xref>
], Meta-IDBA [<xref ref-type="bibr" rid="B40">40</xref>
] and Genovo [<xref ref-type="bibr" rid="B41">41</xref>
]), there are major differences in the algorithms these software tools implement, which are unrelated to scalability.</p>
<p>Genovo is an assembler for 454 reads. It uses a generative probabilistic model and applies a series of hill-climbing steps iteratively until convergence [<xref ref-type="bibr" rid="B41">41</xref>
]. For Genovo, the largest dataset processed had 311,000 reads. Herein, the largest dataset had 3,000,000,000 reads. MetaVelvet and Meta-IDBA both partition the de Bruijn subgraph using k-mer coverage peaks in the k-mer coverage distribution and/or connected components. This process does not work well in theory when there is no peak in the coverage distributions. MetaVelvet and Meta-IDBA both simplify the de Bruijn graph iteratively - this approach, termed equivalent transformations, was introduced by Pevzner and collaborators [<xref ref-type="bibr" rid="B43">43</xref>
]. One of the many advantages of using equivalent transformations is that the assembled sequences grow in length and their number decreases as the algorithm makes its way toward the final equivalent transformation. Equivalent transformations are hard to port to a distributed paradigm because the approach requires a mutable graph.</p>
<p>Ray Meta does not modify the de Bruijn subgraph in order to generate the assembly. We showed that applying a heuristics-guided graph traversal yields excellent assemblies. Furthermore, working with k-mers and their relationships directly is more amenable to distributed computing because unlike k-mers, contigs are neither regular nor small and are hard to load balance on numerous processes.</p>
</sec>
<sec><title>Taxonomic profiling with k-mers</title>
<p>For taxonomic profiling, we have shown that Ray Communities is accurate when compared to MetaPhlAn (Table <xref ref-type="table" rid="T2">2</xref>
). Our approach consists in building a de Bruijn graph from the raw sequencing reads, assembling it <italic>de novo</italic>
, and then coloring it with thousands of bacterial genomes in order to obtain an accurate profile of the sequenced metagenome. By using whole genomes instead of a few selected marker genes, such as the 16S rRNA gene, some biases are removed (such as the copy number of a gene). Furthermore, amplifications in a whole-genome sequencing protocol are not targeted toward any particular marker genes, which may remove further biases. A limitation of the method presented here is that using k-mers alone to compare sequences is highly stringent. On the other hand, aligner-based approaches can accommodate an identity as low as 70% between sequences because sequence reads are usually mapped to reference bacterial genomes. At the crux of our method is the use of uniquely colored k-mers for signal demultiplexing (see Materials and methods). Sequencing errors produce erroneous k-mers. One of the advantages of using a de Bruijn graph is that erroneous k-mers have a small probability of being considered in the assembly [<xref ref-type="bibr" rid="B31">31</xref>
], hence sequencing errors do not contribute to taxonomic profiling for assembled sequences. However, alignment-based approaches will likely have a higher sensitivity than k-mer-based approaches because they are more tolerant to mismatches. Yet, the present work showed that metagenome profiling is efficiently done with k-mer counting, through the use of a colored de Bruijn graph [<xref ref-type="bibr" rid="B7">7</xref>
], and that it is also sensitive (Figure <xref ref-type="fig" rid="F2">2</xref>
) and produces results similar to those of MetaPhlAn (Table <xref ref-type="table" rid="T2">2</xref>
). With this approach, conserved DNA regions captured the biological abundance of bacteria in a sample. A k-mer length of 31 was used to give a high stringency in the coloring process. The low error rate of the sequencing technology enabled the capture of error-free k-mers for most of the genomic regions, meaning that it was unlikely that a given k-mer occurred in the sequence reads, in a known genome, but not in the actual sample.</p>
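<p>The role of uniquely colored k-mers can be illustrated with a minimal single-process sketch. It assumes that reference genomes are plain strings, that sample k-mers come with a coverage count, and that only the forward strand is considered; the function names are hypothetical, and the actual distributed implementation and demultiplexing procedure are those described in Materials and methods, not this code. Only k-mers whose color set contains exactly one genome contribute to that genome's count.</p>
<preformat>
```python
def kmers(sequence, k=31):
    """Yield every k-mer of a sequence (forward strand only, for brevity)."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def color_kmers(references, k=31):
    """Map each reference k-mer to the set of genomes (colors) containing it."""
    colors = {}
    for name, sequence in references.items():
        for kmer in kmers(sequence, k):
            colors.setdefault(kmer, set()).add(name)
    return colors


def unique_color_counts(sample_kmers, colors):
    """Sum the coverage of sample k-mers that carry exactly one color."""
    counts = {}
    for kmer, coverage in sample_kmers.items():
        owners = colors.get(kmer, ())
        if len(owners) == 1:
            (name,) = owners
            counts[name] = counts.get(name, 0) + coverage
    return counts
```
</preformat>
<p>K-mers shared by several genomes, and sample k-mers absent from every reference, are ignored by this counting step.</p>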
</sec>
<sec><title>Validation of assemblies</title>
<p>Using MUMmer [<xref ref-type="bibr" rid="B44">44</xref>
], we validated the quality of assemblies produced by Ray Meta. The quality test used was very stringent because any contig not aligning as one single maximum unique match with a breadth of coverage of at least 98% was marked as misassembled. In Table <xref ref-type="table" rid="T1">1</xref>
, the number of shared k-mers between assemblies produced by MetaVelvet and Ray Meta is shown. Although the overlap is significant, the k-mers unique to MetaVelvet or Ray Meta may be due to nucleotide mismatches. Moreover, improvements in sequencing technologies will provide longer reads with higher coverage depths. These advances will further improve <italic>de novo </italic>
assemblies.</p>
</sec>
</sec>
<sec sec-type="conclusions"><title>Conclusions</title>
<p>Scalability is a requirement for analyzing large metagenome datasets. We described a new method to assemble (Ray Meta) and profile (Ray Communities) a metagenome in a distributed fashion to provide unmatched scalability. It computes a metagenome <italic>de novo </italic>
assembly in parallel with a de Bruijn graph. The method also yields taxonomic profiles by coloring the graph with known references and by looking for uniquely colored k-mers to identify taxons at low taxonomic ranks or by using the lowest common ancestor otherwise. Ray Meta surpassed MetaVelvet [<xref ref-type="bibr" rid="B39">39</xref>
] for <italic>de novo </italic>
assemblies and Ray Communities compared favorably to MetaPhlAn [<xref ref-type="bibr" rid="B27">27</xref>
] for taxonomic profiling.</p>
<p>While taxonomic and functional profiling remains a useful approach to obtain a big picture of a particular sample, only <italic>de novo </italic>
metagenome assembly can truly enable discovery of otherwise unknown genes or other important DNA sequences hidden in the data.</p>
</sec>
<sec sec-type="materials|methods"><title>Materials and methods</title>
<p>Thorough documentation and associated scripts to reproduce our studies are available in Additional file <xref ref-type="supplementary-material" rid="S3">3</xref>
 on the publisher website or on <ext-link ext-link-type="uri" xlink:href="https://github.com/sebhtml/Paper-Replication-2012">https://github.com/sebhtml/Paper-Replication-2012</ext-link>.
</p>
<sec><title>Memory model</title>
<p>Ray Meta uses the Message Passing Interface (MPI). As such, a 1,024-core job has 1,024 processes running on many computers. In the experiments, each node had 8 processor cores and 24 GB of memory, or 3 GB per core. With the message-passing paradigm, each process has its own virtual memory, which is protected from any other process. Because the data are distributed uniformly using a distributed hash table, memory usage for a single process is very low. For the 1,024-core job, the maximum memory usage of a process was 1.5 GB on average.</p>
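<p>The idea of spreading k-mers uniformly over processes with a hash can be sketched as follows. This is an illustration under stated assumptions: the hash function (CRC32 from the Python standard library) and the function names are hypothetical choices and are not the ones used by Ray Meta. Each k-mer is owned by exactly one process, so every process stores only its share of the graph.</p>
<preformat>
```python
import zlib


def owner_rank(kmer, num_ranks):
    """Pick the process (rank) that stores a k-mer, using a stable hash."""
    # CRC32 is an assumption made for this sketch, not Ray Meta's hash.
    return zlib.crc32(kmer.encode("ascii")) % num_ranks


def partition(kmers, num_ranks):
    """Group k-mers by owning rank, as each rank would hold them locally."""
    shards = [[] for _ in range(num_ranks)]
    for kmer in kmers:
        shards[owner_rank(kmer, num_ranks)].append(kmer)
    return shards
```
</preformat>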
</sec>
<sec><title>Assemblies</title>
<p>Metagenome assemblies with profiling were computed with Ray v2.0.0 (Additional file <xref ref-type="supplementary-material" rid="S4">4</xref>
) on Colosse, a Compute Canada resource. Ray is open source software - the license is the GNU General Public License, version 3 (GPLv3) - and is freely available from <ext-link ext-link-type="uri" xlink:href="http://denovoassembler.sourceforge.net/">http://denovoassembler.sourceforge.net/</ext-link>
 or <ext-link ext-link-type="uri" xlink:href="http://github.com/sebhtml/ray">http://github.com/sebhtml/ray</ext-link>
. Ray can be deployed on public compute infrastructure or in the cloud (see [<xref ref-type="bibr" rid="B45">45</xref>
] for a review).</p>
<p>The algorithms implemented in the software Ray were heavily modified for metagenome <italic>de novo </italic>
assembly and these changes were called Ray Meta. Namely, the coverage distribution for k-mers in the de Bruijn graph is not utilized to infer the average coverage depth for unique genomic regions. Instead, this value is derived from local coverage distributions during the parallel assembly process. Therefore, unlike MetaVelvet [<xref ref-type="bibr" rid="B39">39</xref>
], Ray Meta does not attempt to calculate or use any global k-mer coverage depth distribution.</p>
</sec>
<sec><title>Simulated metagenomes with a power law</title>
<p>Two metagenomes (100 and 1,000 genomes, respectively) were simulated with abundances following a power law (Tables S1 and S2 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
). Power laws are commonly found in biological systems [<xref ref-type="bibr" rid="B46">46</xref>
]. Simulated sequencing errors were randomly distributed, the error rate was set at 0.25% and the average insert length was 400 nt. The second simulated metagenome (1,000 genomes) was assembled on 128 8-core computers (1,024 processor cores) interconnected with a Mellanox ConnectX QDR InfiniBand fabric (Mellanox, Inc.). For the 1,000-genome dataset, messages were routed with a de Bruijn graph of degree 32 and diameter 2 (32<sup>2</sup> = 1,024 vertices, matching the number of processor cores) to reduce the latency.</p>
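<p>The routing topology can be sketched as follows. In a de Bruijn graph of degree 32 and diameter 2, each of the 1,024 vertices can be written as a two-digit word in base 32, and a vertex (a, b) has an edge to every vertex (b, c). Any message therefore reaches its destination in at most two hops. The digit convention and the function name route are assumptions made for this example, not a description of the routing code in Ray.</p>

```python
DEGREE = 32    # out-degree of each vertex (alphabet size)
DIAMETER = 2   # word length, so DEGREE ** DIAMETER = 1,024 vertices


def route(source: int, destination: int) -> list:
    """Return the ranks visited after `source` on the way to `destination`."""
    if source == destination:
        return []
    src_low = source % DEGREE          # last digit of the source word
    dst_high = destination // DEGREE   # first digit of the destination word
    if src_low == dst_high:
        # (a, b) -> (b, d) is a direct edge.
        return [destination]
    # Otherwise relay through (b, c), which has an edge to (c, d).
    relay = src_low * DEGREE + dst_high
    return [relay, destination]
```

<p>Each rank then keeps connections to 32 neighbours rather than to all 1,023 other ranks, at the cost of at most one relay per message.</p>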
</sec>
<sec><title>Validation of assemblies</title>
<p>Assembled contigs were aligned onto reference genomes using the MUMmer bioinformatics software suite [<xref ref-type="bibr" rid="B44">44</xref>
]. More precisely, deltas were generated with nucmer. Using show-coords, any contig not aligning as one single maximum with at least 98% breadth of coverage was marked as misassembled. Contigs aligning in two parts, at the beginning and at the end of a reference, were not counted as misassembled owing to the circular nature of bacterial genomes. Finally, small insertions/deletions and mismatches were obtained with show-snps.</p>
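<p>The classification rule above can be sketched as a small function. This is one possible reading of the rule, under stated assumptions: each alignment is a hypothetical tuple (contig start, contig end, reference start, reference end) with 1-based inclusive coordinates on the forward strand, alignments of a contig do not overlap, and a contig split across the origin of a circular reference is accepted when its two parts together reach the 98% threshold. Parsing of show-coords output is not shown.</p>

```python
def is_misassembled(contig_length, alignments, reference_length, min_breadth=0.98):
    """Flag a contig as misassembled from its alignments to one reference."""

    def covered(parts):
        # Fraction of the contig covered by the aligned intervals.
        return sum(end - start + 1 for start, end, _, _ in parts) / contig_length

    if len(alignments) == 1:
        return not (covered(alignments) >= min_breadth)
    if len(alignments) == 2:
        first, second = sorted(alignments, key=lambda a: a[0])
        # Circular case: the first part ends at the last reference position
        # and the second part starts at the first reference position.
        wraps = first[3] == reference_length and second[2] == 1
        return not (wraps and covered(alignments) >= min_breadth)
    # No alignment, or more than two parts.
    return True
```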
</sec>
<sec><title>Colored and distributed de Bruijn graphs</title>
<p>The vertices of a de Bruijn graph are distributed across processes called ranks. Here, graph coloring means labeling the vertices of a graph. A different color is added to the graph for each reference sequence. Each k-mer in any reference sequence is colored with the reference sequence color if it occurs in the distributed de Bruijn graph. Therefore, any k-mer in the graph has zero, one or more colors. First, a k-mer with no colors indicates that the k-mer does not exist in the databases provided. Second, a k-mer with one color means that this k-mer is specific to one and only one reference genome in the databases provided while at least two colors indicates that the k-mer is not specific to one single reference sequence. These reference sequences are assigned to leaves in a taxonomic tree. Reference sequences can be grouped in independent name spaces. Genome assembly is independent of graph coloring.</p>
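<p>A minimal, single-process sketch of this coloring step is given below. It assumes that the graph is represented by a plain set of k-mer strings and that reverse complements are ignored; the function names are introduced for this example only. In Ray the vertices are distributed across ranks, which is not modeled here.</p>

```python
def kmers(sequence: str, k: int):
    """Yield every k-mer of a sequence, from left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def color_graph(graph_kmers: set, references: dict, k: int) -> dict:
    """Return {k-mer: set of reference names} for the k-mers of the graph."""
    colors = {kmer: set() for kmer in graph_kmers}
    for name, sequence in references.items():
        for kmer in kmers(sequence, k):
            # Only k-mers that already occur in the graph receive a color.
            if kmer in colors:
                colors[kmer].add(name)
    return colors
```

<p>After coloring, a k-mer with an empty set is absent from the databases, a k-mer with one color is specific to a single reference, and a k-mer with several colors is shared.</p>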
</sec>
<sec><title>Demultiplexing signals from similar bacterial strains</title>
<p>Biological abundances were estimated using the product of the number of k-mers matched in the distributed de Bruijn graph by the mode coverage of k-mers that were uniquely colored. This number is called the number of k-mer observations. The total number of k-mer observations is the sum of coverage depth values of all colored k-mers. A proportion is calculated by dividing the number of k-mer observations by the total.</p>
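<p>The calculation described above can be sketched as follows, assuming a hypothetical dictionary colors that maps each k-mer of the graph to the set of references coloring it and a dictionary coverage that maps each k-mer to its coverage depth. Ties between equally frequent coverage values are resolved arbitrarily in this sketch.</p>

```python
from collections import Counter


def kmer_observations(colors: dict, coverage: dict, reference: str) -> int:
    """Number of matched k-mers times the mode coverage of uniquely colored ones."""
    matched = [kmer for kmer, names in colors.items() if reference in names]
    unique_depths = [coverage[kmer] for kmer in matched if len(colors[kmer]) == 1]
    if not unique_depths:
        return 0
    mode_depth = Counter(unique_depths).most_common(1)[0][0]
    return len(matched) * mode_depth


def proportion(colors: dict, coverage: dict, reference: str) -> float:
    """k-mer observations of one reference divided by the total over colored k-mers."""
    total = sum(coverage[kmer] for kmer, names in colors.items() if names)
    if total == 0:
        return 0.0
    return kmer_observations(colors, coverage, reference) / total
```

<p>Using the mode of uniquely colored k-mers means that k-mers shared with a closely related strain do not inflate the estimate for either strain.</p>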
</sec>
<sec><title>Taxonomic profiling</title>
<p>All bacterial genomes available in GenBank [<xref ref-type="bibr" rid="B47">47</xref>
] were utilized for coloring the distributed de Bruijn graphs (Table S4 in Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
). Each k-mer was assigned to a taxon in the taxonomic tree. When a k-mer had more than one taxon color, its coverage depth was assigned to the nearest common ancestor of those taxa.</p>
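<p>The assignment to the nearest common ancestor can be sketched as follows. The taxonomic tree is assumed to be given as a hypothetical dictionary parent that maps each taxon to its parent (the root maps to None); the function names are introduced for this example only.</p>

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def nearest_common_ancestor(taxa, parent):
    """Deepest taxon that lies on the lineage of every taxon in `taxa`."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        shared.intersection_update(lineage(taxon, parent))
    # Walking up from any taxon, the first shared node is the deepest one.
    for node in lineage(taxa[0], parent):
        if node in shared:
            return node
    return None


def assign_depths(kmer_taxa, coverage, parent):
    """Add the coverage depth of each colored k-mer to its target taxon."""
    totals = {}
    for kmer, taxa in kmer_taxa.items():
        if not taxa:
            continue  # uncolored k-mers are not profiled
        target = nearest_common_ancestor(taxa, parent)
        totals[target] = totals.get(target, 0) + coverage[kmer]
    return totals
```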
</sec>
<sec><title>Gene ontology profiling</title>
<p>The de Bruijn graph was colored with coding sequences from the EMBL nucleotide sequence database [<xref ref-type="bibr" rid="B48">48</xref>
] (EMBL_CDS), which are mapped to gene ontology terms by transitivity using the UniProt mapping to gene ontology [<xref ref-type="bibr" rid="B49">49</xref>
]. For each ontology term, coverage depths of colored k-mers were added to obtain the total number of k-mer observations.</p>
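<p>The mapping by transitivity and the summation can be sketched as follows. The three input dictionaries (k-mer to coding sequences, coding sequence to UniProt entry, UniProt entry to ontology terms) are hypothetical stand-ins for the database files, and counting a k-mer once per term, even when several of its coding sequences map to that term, is an assumption of this sketch.</p>

```python
def ontology_observations(kmer_cds, cds_to_uniprot, uniprot_to_go, coverage):
    """Sum coverage depths of colored k-mers for each gene ontology term."""
    totals = {}
    for kmer, cds_ids in kmer_cds.items():
        terms = set()
        for cds in cds_ids:
            # Transitivity: coding sequence -> UniProt entry -> ontology terms.
            protein = cds_to_uniprot.get(cds)
            terms.update(uniprot_to_go.get(protein, ()))
        for term in terms:
            totals[term] = totals.get(term, 0) + coverage[kmer]
    return totals
```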
</sec>
<sec><title>Principal component analysis</title>
<p>Principal component analysis was used to group taxonomic profiles to produce enterotypes. Data were prepared in a matrix using the genera as rows and the samples as columns. Singular values and left and right singular vectors of this matrix were obtained using singular value decomposition implemented in R. The right singular vectors were sorted by singular values. The sorted right singular vectors were used as the new basis for the re-representation of the genus proportions. The first two dimensions were plotted.</p>
</sec>
<sec><title>Software implementation</title>
<p>Ray Meta is distributed software that runs on connected computers by transmitting messages over a network using the message-passing interface (MPI) and is implemented in C++. The MPI standard is implemented in libraries such as Open-MPI [<xref ref-type="bibr" rid="B50">50</xref>
] and MPICH2 [<xref ref-type="bibr" rid="B51">51</xref>
]. On each processor core, tasks are divided into smaller ones and given to a pool of 32,768 workers (thread pool), which are similar to chares in CHARM++ [<xref ref-type="bibr" rid="B52">52</xref>
]. Each of these sends messages to a virtual communicator. The latter implements a message aggregation strategy in which messages are automatically multiplexed and demultiplexed. The k-mers are stored in a distributed sparse hash table which utilizes open addressing (double hashing) for collisions. Incremental resizing is utilized in this hash table when the occupancy exceeds 90% to grow tables locally. Smart pointers are utilized in this table to perform real-time memory compaction. The software is implemented on top of RayPlatform, a development framework used to ease the creation of massively distributed high-performance computing applications.</p>
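<p>The collision strategy named above, open addressing with double hashing, can be sketched as follows. This is a simplified illustration and not the Ray implementation: the table here grows all at once rather than incrementally, the sparse storage and memory compaction are not modeled, and the class name and the use of Python's built-in hash are assumptions made for this example.</p>

```python
class DoubleHashTable:
    """Open-addressing hash table with double hashing (illustrative only)."""

    def __init__(self, capacity=8, max_load=0.9):
        self.capacity = capacity      # kept a power of two
        self.max_load = max_load      # grow when occupancy would exceed this
        self.size = 0
        self.slots = [None] * capacity

    def _probe(self, key):
        h1 = hash(key) % self.capacity
        # An odd step visits every slot when the capacity is a power of two.
        h2 = (hash((key, "step")) % self.capacity) | 1
        for i in range(self.capacity):
            yield (h1 + i * h2) % self.capacity

    def insert(self, key, value):
        if (self.size + 1) > self.max_load * self.capacity:
            self._grow()
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                if slot is None:
                    self.size += 1
                self.slots[index] = (key, value)
                return
        raise RuntimeError("no free slot found")

    def get(self, key, default=None):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return default
            if slot[0] == key:
                return slot[1]
        return default

    def _grow(self):
        # Simplification: rehash everything at once into a table twice as large.
        old = [slot for slot in self.slots if slot is not None]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.size = 0
        for key, value in old:
            self.insert(key, value)
```

<p>Incremental resizing, as used in Ray, spreads the rehashing in the last method over many subsequent operations so that no single insertion stalls the process.</p>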
</sec>
<sec><title>Comparison with MetaVelvet</title>
<p>Software versions used were: MetaVelvet 1.2.01, Velvet 1.2.07 and Ray 2.0.0 (with Ray Meta). MetaVelvet was run on one processor core. Ray Meta was run on 64 processor cores for Human Microbiome Project samples (SRS011098, SRS017227 and SRS018661) and on 48, 32 and 32 processor cores for MetaHIT samples (ERS006494, ERS006497 and ERS006592), respectively. There were eight processor cores per node. The running time for MetaVelvet is the sum of running times for velveth, velvetg and meta-velvetg. For MetaVelvet, sequence files were filtered to remove any sequence with more than 10 <italic>N</italic>
symbols. The resulting files were shuffled to create files with interleaved sequences. The insert size was manually provided to MetaVelvet and the k-mer length was set to 51 as suggested in its documentation. Peak coverages were determined automatically by MetaVelvet. Ray Meta was run with a k-mer length of 31. No other parameters were required for Ray Meta and sequence files were provided without modification to Ray Meta. The overlaps of assemblies produced by MetaVelvet and by Ray Meta were evaluated with Ray using the graph coloring features. No mismatches were allowed in k-mers. Overlaps were computed for scaffolds with at least 500 nucleotides.</p>
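<p>The preprocessing applied for MetaVelvet can be sketched as follows. Reads are represented as plain sequence strings, file parsing is omitted, and dropping a whole pair when either mate fails the filter is an assumption of this sketch, made so that the interleaved output stays paired; the scripts actually used are those in Additional file 3.</p>

```python
def keep_read(sequence: str, max_n: int = 10) -> bool:
    """Keep a read unless it contains more than `max_n` N symbols."""
    return max_n >= sequence.upper().count("N")


def interleave(left_reads, right_reads, max_n: int = 10):
    """Yield mate 1 then mate 2 for every pair in which both reads pass the filter."""
    for left, right in zip(left_reads, right_reads):
        if keep_read(left, max_n) and keep_read(right, max_n):
            yield left
            yield right
```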
</sec>
<sec><title>Comparison with MetaPhlAn</title>
<p>Taxonomic profiles calculated with MetaPhlAn [<xref ref-type="bibr" rid="B27">27</xref>
] for samples from the Human Microbiome Project were obtained [<xref ref-type="bibr" rid="B24">24</xref>
]. Taxonomic profiles were produced by Ray Communities for 313 samples (Additional file <xref ref-type="supplementary-material" rid="S2">2</xref>
). Pearson's correlation was calculated for each body site by combining taxon proportions for both methods for each taxonomic rank.</p>
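<p>The comparison can be sketched as follows for one body site and one taxonomic rank. Each profile is assumed to be a hypothetical dictionary mapping taxon names to proportions; taxa missing from one profile are given a proportion of zero, which is an assumption of this sketch. The function is undefined when either profile is constant.</p>

```python
from math import sqrt


def paired_proportions(profile_a: dict, profile_b: dict):
    """Align two taxonomic profiles on the union of their taxa."""
    taxa = sorted(set(profile_a) | set(profile_b))
    return ([profile_a.get(t, 0.0) for t in taxa],
            [profile_b.get(t, 0.0) for t in taxa])


def pearson(x, y) -> float:
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)
```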
</sec>
</sec>
<sec><title>Abbreviations</title>
<p>MPI: message-passing interface; nt: nucleotide.</p>
</sec>
<sec><title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec><title>Authors' contributions</title>
<p>SB drafted the manuscript, implemented methods, gathered public data and performed simulations and analyses. SB, JC and FR analyzed results. SB, FL and JC designed <italic>de novo </italic>
assembly algorithms. SB and FR designed graph coloring strategies. EG and SB devised parallel distributed software designs. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material"><title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1"><caption><title>Additional file 1</title>
<p><bold>Tables S1, S2, S3 and S4</bold>
. Table S1: Composition of the simulated 100-genome metagenome. Table S2: Composition of the simulated 1,000-genome metagenome. Table S3: Overlay data on metagenome assembly of 124 gut microbiome samples. Table S4: List of genomes used for coloring de Bruijn graphs.</p>
</caption>
<media xlink:href="gb-2012-13-12-r122-S1.PDF"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2"><caption><title>Additional file 2</title>
<p><bold>List of 313 samples from the Human Microbiome Project</bold>
.</p>
</caption>
<media xlink:href="gb-2012-13-12-r122-S2.TXT"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3"><caption><title>Additional file 3</title>
<p><bold>Documentation and scripts to reproduce all experiments</bold>
.</p>
</caption>
<media xlink:href="gb-2012-13-12-r122-S3.BZ2"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4"><caption><title>Additional file 4</title>
<p><bold>Software source code for Ray Meta and Ray Communities</bold>
.</p>
</caption>
<media xlink:href="gb-2012-13-12-r122-S4.BZ2"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back><sec><title>Acknowledgements</title>
<p>Computations were performed on the Colosse supercomputer at Université Laval and the Guillimin supercomputer at McGill University (resource allocation project: nne-790-ab), under the auspices of Calcul Québec and Compute Canada. The operations on Guillimin and Colosse are funded by the Canada Foundation for Innovation (CFI), the Natural Sciences and Engineering Research Council of Canada (NSERC), NanoQuébec and the Fonds Québécois de Recherche sur la Nature et les Technologies (FQRNT). Tests were also carried out on the Mammouth-parallèle II supercomputer at Université de Sherbrooke (Réseau Québécois de calcul de haute performance, RQCHP).</p>
<p>JC is the Canada Research Chair in Medical Genomics. SB is the recipient of a Frederick Banting and Charles Best Canada Graduate Scholarship Doctoral Award (200910GSD-226209-172830) from the Canadian Institutes of Health Research (CIHR). FR and JC acknowledge the support of the Consortium Québécois sur la découverte du médicament (CQDM) and of Mitacs through the Mitacs-Accelerate program. This research was supported in part by the Fonds de recherche du Québec - Nature et technologies (grant 2013-PR-166708 to FL and JC) and by the Discovery Grants Program (Individual, Team and Subatomic Physics Project) from the Natural Sciences and Engineering Research Council of Canada (grant 262067 to FL).</p>
</sec>
<ref-list><ref id="B1"><mixed-citation publication-type="journal"><name><surname>Wold</surname>
<given-names>B</given-names>
</name>
<name><surname>Myers</surname>
<given-names>RM</given-names>
</name>
<article-title>Sequence census methods for functional genomics.</article-title>
<source>Nature Methods</source>
<year>2008</year>
<volume>5</volume>
<fpage>19</fpage>
<lpage>21</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth1157</pub-id>
<pub-id pub-id-type="pmid">18165803</pub-id>
</mixed-citation>
</ref>
<ref id="B2"><mixed-citation publication-type="journal"><name><surname>Brenner</surname>
<given-names>S</given-names>
</name>
<article-title>Sequences and consequences.</article-title>
<source>Philosophical Transactions of the Royal Society B: Biological Sciences</source>
<year>2010</year>
<volume>365</volume>
<fpage>207</fpage>
<lpage>212</lpage>
<pub-id pub-id-type="doi">10.1098/rstb.2009.0221</pub-id>
</mixed-citation>
</ref>
<ref id="B3"><mixed-citation publication-type="journal"><name><surname>McPherson</surname>
<given-names>JD</given-names>
</name>
<article-title>Next-generation gap.</article-title>
<source>Nature Methods</source>
<year>2009</year>
<volume>6</volume>
<fpage>S2</fpage>
<lpage>S5</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.f.268</pub-id>
<pub-id pub-id-type="pmid">19844227</pub-id>
</mixed-citation>
</ref>
<ref id="B4"><mixed-citation publication-type="journal"><name><surname>Mardis</surname>
<given-names>E</given-names>
</name>
<article-title>The $1,000 genome, the $100,000 analysis?</article-title>
<source>Genome Medicine</source>
<year>2010</year>
<volume>2</volume>
<fpage>84</fpage>
<pub-id pub-id-type="doi">10.1186/gm205</pub-id>
<pub-id pub-id-type="pmid">21114804</pub-id>
</mixed-citation>
</ref>
<ref id="B5"><mixed-citation publication-type="journal"><name><surname>Compeau</surname>
<given-names>PEC</given-names>
</name>
<name><surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
<name><surname>Tesler</surname>
<given-names>G</given-names>
</name>
<article-title>How to apply de Bruijn graphs to genome assembly.</article-title>
<source>Nature Biotechnology</source>
<year>2011</year>
<volume>29</volume>
<fpage>987</fpage>
<lpage>991</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2023</pub-id>
<pub-id pub-id-type="pmid">22068540</pub-id>
</mixed-citation>
</ref>
<ref id="B6"><mixed-citation publication-type="journal"><name><surname>Flicek</surname>
<given-names>P</given-names>
</name>
<name><surname>Birney</surname>
<given-names>E</given-names>
</name>
<article-title>Sense from sequence reads: methods for alignment and assembly.</article-title>
<source>Nature Methods</source>
<year>2009</year>
<volume>6</volume>
<fpage>S6</fpage>
<lpage>S12</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1376</pub-id>
<pub-id pub-id-type="pmid">19844229</pub-id>
</mixed-citation>
</ref>
<ref id="B7"><mixed-citation publication-type="journal"><name><surname>Iqbal</surname>
<given-names>Z</given-names>
</name>
<name><surname>Caccamo</surname>
<given-names>M</given-names>
</name>
<name><surname>Turner</surname>
<given-names>I</given-names>
</name>
<name><surname>Flicek</surname>
<given-names>P</given-names>
</name>
<name><surname>McVean</surname>
<given-names>G</given-names>
</name>
<article-title><italic>De novo </italic>
assembly and genotyping of variants using colored de Bruijn graphs.</article-title>
<source>Nature Genetics</source>
<year>2012</year>
<volume>44</volume>
<fpage>226</fpage>
<lpage>232</lpage>
<pub-id pub-id-type="doi">10.1038/ng.1028</pub-id>
<pub-id pub-id-type="pmid">22231483</pub-id>
</mixed-citation>
</ref>
<ref id="B8"><mixed-citation publication-type="journal"><name><surname>Miller</surname>
<given-names>JR</given-names>
</name>
<name><surname>Koren</surname>
<given-names>S</given-names>
</name>
<name><surname>Sutton</surname>
<given-names>G</given-names>
</name>
<article-title>Assembly algorithms for next-generation sequencing data.</article-title>
<source>Genomics</source>
<year>2010</year>
<volume>95</volume>
<fpage>315</fpage>
<lpage>327</lpage>
<pub-id pub-id-type="doi">10.1016/j.ygeno.2010.03.001</pub-id>
<pub-id pub-id-type="pmid">20211242</pub-id>
</mixed-citation>
</ref>
<ref id="B9"><mixed-citation publication-type="journal"><name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Beware of mis-assembled genomes.</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>4320</fpage>
<lpage>4321</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti769</pub-id>
<pub-id pub-id-type="pmid">16332717</pub-id>
</mixed-citation>
</ref>
<ref id="B10"><mixed-citation publication-type="journal"><name><surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Repetitive DNA and next-generation sequencing: computational challenges and solutions.</article-title>
<source>Nature Reviews Genetics</source>
<year>2011</year>
<volume>13</volume>
<fpage>36</fpage>
<lpage>46</lpage>
<pub-id pub-id-type="pmid">22124482</pub-id>
</mixed-citation>
</ref>
<ref id="B11"><mixed-citation publication-type="journal"><name><surname>Lorenz</surname>
<given-names>P</given-names>
</name>
<name><surname>Eck</surname>
<given-names>J</given-names>
</name>
<article-title>Metagenomics and industrial applications.</article-title>
<source>Nature Reviews Microbiology</source>
<year>2005</year>
<volume>3</volume>
<fpage>510</fpage>
<lpage>516</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro1161</pub-id>
<pub-id pub-id-type="pmid">15931168</pub-id>
</mixed-citation>
</ref>
<ref id="B12"><mixed-citation publication-type="journal"><name><surname>Scholz</surname>
<given-names>MB</given-names>
</name>
<name><surname>Lo</surname>
<given-names>CC</given-names>
</name>
<name><surname>Chain</surname>
<given-names>PSG</given-names>
</name>
<article-title>Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis.</article-title>
<source>Current Opinion in Biotechnology</source>
<year>2012</year>
<volume>23</volume>
<fpage>9</fpage>
<lpage>15</lpage>
<pub-id pub-id-type="doi">10.1016/j.copbio.2011.11.013</pub-id>
<pub-id pub-id-type="pmid">22154470</pub-id>
</mixed-citation>
</ref>
<ref id="B13"><mixed-citation publication-type="journal"><name><surname>Schoenfeld</surname>
<given-names>T</given-names>
</name>
<name><surname>Patterson</surname>
<given-names>M</given-names>
</name>
<name><surname>Richardson</surname>
<given-names>PM</given-names>
</name>
<name><surname>Wommack</surname>
<given-names>KE</given-names>
</name>
<name><surname>Young</surname>
<given-names>M</given-names>
</name>
<name><surname>Mead</surname>
<given-names>D</given-names>
</name>
<article-title>Assembly of viral metagenomes from Yellowstone Hot Springs.</article-title>
<source>Applied and Environmental Microbiology</source>
<year>2008</year>
<volume>74</volume>
<fpage>4164</fpage>
<lpage>4174</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.02598-07</pub-id>
<pub-id pub-id-type="pmid">18441115</pub-id>
</mixed-citation>
</ref>
<ref id="B14"><mixed-citation publication-type="journal"><name><surname>Varin</surname>
<given-names>T</given-names>
</name>
<name><surname>Lovejoy</surname>
<given-names>C</given-names>
</name>
<name><surname>Jungblut</surname>
<given-names>AD</given-names>
</name>
<name><surname>Vincent</surname>
<given-names>WF</given-names>
</name>
<name><surname>Corbeil</surname>
<given-names>J</given-names>
</name>
<article-title>Metagenomic analysis of stress genes in microbial mat communities from Antarctica and the high Arctic.</article-title>
<source>Applied and Environmental Microbiology</source>
<year>2012</year>
<volume>78</volume>
<fpage>549</fpage>
<lpage>559</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.06354-11</pub-id>
<pub-id pub-id-type="pmid">22081564</pub-id>
</mixed-citation>
</ref>
<ref id="B15"><mixed-citation publication-type="journal"><name><surname>Varin</surname>
<given-names>T</given-names>
</name>
<name><surname>Lovejoy</surname>
<given-names>C</given-names>
</name>
<name><surname>Jungblut</surname>
<given-names>AD</given-names>
</name>
<name><surname>Vincent</surname>
<given-names>WF</given-names>
</name>
<name><surname>Corbeil</surname>
<given-names>J</given-names>
</name>
<article-title>Metagenomic profiling of Arctic microbial mat communities as nutrient scavenging and recycling systems.</article-title>
<source>Limnology and Oceanography</source>
<year>2010</year>
<volume>55</volume>
<fpage>1901</fpage>
<lpage>1911</lpage>
<pub-id pub-id-type="doi">10.4319/lo.2010.55.5.1901</pub-id>
</mixed-citation>
</ref>
<ref id="B16"><mixed-citation publication-type="journal"><name><surname>Narasingarao</surname>
<given-names>P</given-names>
</name>
<name><surname>Podell</surname>
<given-names>S</given-names>
</name>
<name><surname>Ugalde</surname>
<given-names>JA</given-names>
</name>
<name><surname>Brochier-Armanet</surname>
<given-names>C</given-names>
</name>
<name><surname>Emerson</surname>
<given-names>JB</given-names>
</name>
<name><surname>Brocks</surname>
<given-names>JJ</given-names>
</name>
<name><surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name><surname>Banfield</surname>
<given-names>JF</given-names>
</name>
<name><surname>Allen</surname>
<given-names>EE</given-names>
</name>
<article-title><italic>De novo </italic>
metagenomic assembly reveals abundant novel major lineage of Archaea in hypersaline microbial communities.</article-title>
<source>The ISME Journal</source>
<year>2011</year>
<volume>13</volume>
<fpage>81</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="pmid">21716304</pub-id>
</mixed-citation>
</ref>
<ref id="B17"><mixed-citation publication-type="journal"><name><surname>Tringe</surname>
<given-names>SG</given-names>
</name>
<name><surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name><surname>Kobayashi</surname>
<given-names>A</given-names>
</name>
<name><surname>Salamov</surname>
<given-names>AA</given-names>
</name>
<name><surname>Chen</surname>
<given-names>K</given-names>
</name>
<name><surname>Chang</surname>
<given-names>HW</given-names>
</name>
<name><surname>Podar</surname>
<given-names>M</given-names>
</name>
<name><surname>Short</surname>
<given-names>JM</given-names>
</name>
<name><surname>Mathur</surname>
<given-names>EJ</given-names>
</name>
<name><surname>Detter</surname>
<given-names>JC</given-names>
</name>
<name><surname>Bork</surname>
<given-names>P</given-names>
</name>
<name><surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name><surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<article-title>Comparative metagenomics of microbial communities.</article-title>
<source>Science</source>
<year>2005</year>
<volume>308</volume>
<fpage>554</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="doi">10.1126/science.1107851</pub-id>
<pub-id pub-id-type="pmid">15845853</pub-id>
</mixed-citation>
</ref>
<ref id="B18"><mixed-citation publication-type="journal"><name><surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name><surname>Chapman</surname>
<given-names>J</given-names>
</name>
<name><surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name><surname>Allen</surname>
<given-names>EE</given-names>
</name>
<name><surname>Ram</surname>
<given-names>RJ</given-names>
</name>
<name><surname>Richardson</surname>
<given-names>PM</given-names>
</name>
<name><surname>Solovyev</surname>
<given-names>VV</given-names>
</name>
<name><surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name><surname>Rokhsar</surname>
<given-names>DS</given-names>
</name>
<name><surname>Banfield</surname>
<given-names>JF</given-names>
</name>
<article-title>Community structure and metabolism through reconstruction of microbial genomes from the environment.</article-title>
<source>Nature</source>
<year>2004</year>
<volume>428</volume>
<fpage>37</fpage>
<lpage>43</lpage>
<pub-id pub-id-type="doi">10.1038/nature02340</pub-id>
<pub-id pub-id-type="pmid">14961025</pub-id>
</mixed-citation>
</ref>
<ref id="B19"><mixed-citation publication-type="journal"><name><surname>Naviaux</surname>
<given-names>RK</given-names>
</name>
<name><surname>Good</surname>
<given-names>B</given-names>
</name>
<name><surname>McPherson</surname>
<given-names>JD</given-names>
</name>
<name><surname>Steffen</surname>
<given-names>DL</given-names>
</name>
<name><surname>Markusic</surname>
<given-names>D</given-names>
</name>
<name><surname>Ransom</surname>
<given-names>B</given-names>
</name>
<name><surname>Corbeil</surname>
<given-names>J</given-names>
</name>
<article-title>Sand DNA - a genetic library of life at the water's edge.</article-title>
<source>Marine Ecology Progress Series</source>
<year>2005</year>
<volume>13</volume>
<fpage>9</fpage>
<lpage>22</lpage>
</mixed-citation>
</ref>
<ref id="B20"><mixed-citation publication-type="journal"><name><surname>Cho</surname>
<given-names>I</given-names>
</name>
<name><surname>Blaser</surname>
<given-names>MJ</given-names>
</name>
<article-title>The human microbiome: at the interface of health and disease.</article-title>
<source>Nature Reviews Genetics</source>
<year>2012</year>
<volume>13</volume>
<fpage>260</fpage>
<lpage>270</lpage>
<pub-id pub-id-type="pmid">22411464</pub-id>
</mixed-citation>
</ref>
<ref id="B21"><mixed-citation publication-type="journal"><name><surname>Gill</surname>
<given-names>SR</given-names>
</name>
<name><surname>Pop</surname>
<given-names>M</given-names>
</name>
<name><surname>Deboy</surname>
<given-names>RT</given-names>
</name>
<name><surname>Eckburg</surname>
<given-names>PB</given-names>
</name>
<name><surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Samuel</surname>
<given-names>BS</given-names>
</name>
<name><surname>Gordon</surname>
<given-names>JI</given-names>
</name>
<name><surname>Relman</surname>
<given-names>DA</given-names>
</name>
<name><surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
<name><surname>Nelson</surname>
<given-names>KE</given-names>
</name>
<article-title>Metagenomic analysis of the human distal gut microbiome.</article-title>
<source>Science</source>
<year>2006</year>
<volume>312</volume>
<fpage>1355</fpage>
<lpage>1359</lpage>
<pub-id pub-id-type="doi">10.1126/science.1124234</pub-id>
<pub-id pub-id-type="pmid">16741115</pub-id>
</mixed-citation>
</ref>
<ref id="B22"><mixed-citation publication-type="journal"><name><surname>Qin</surname>
<given-names>J</given-names>
</name>
<name><surname>Li</surname>
<given-names>R</given-names>
</name>
<name><surname>Raes</surname>
<given-names>J</given-names>
</name>
<name><surname>Arumugam</surname>
<given-names>M</given-names>
</name>
<name><surname>Burgdorf</surname>
<given-names>KS</given-names>
</name>
<name><surname>Manichanh</surname>
<given-names>C</given-names>
</name>
<name><surname>Nielsen</surname>
<given-names>T</given-names>
</name>
<name><surname>Pons</surname>
<given-names>N</given-names>
</name>
<name><surname>Levenez</surname>
<given-names>F</given-names>
</name>
<name><surname>Yamada</surname>
<given-names>T</given-names>
</name>
<name><surname>Mende</surname>
<given-names>DR</given-names>
</name>
<name><surname>Li</surname>
<given-names>J</given-names>
</name>
<name><surname>Xu</surname>
<given-names>J</given-names>
</name>
<name><surname>Li</surname>
<given-names>S</given-names>
</name>
<name><surname>Li</surname>
<given-names>D</given-names>
</name>
<name><surname>Cao</surname>
<given-names>J</given-names>
</name>
<name><surname>Wang</surname>
<given-names>B</given-names>
</name>
<name><surname>Liang</surname>
<given-names>H</given-names>
</name>
<name><surname>Zheng</surname>
<given-names>H</given-names>
</name>
<name><surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name><surname>Tap</surname>
<given-names>J</given-names>
</name>
<name><surname>Lepage</surname>
<given-names>P</given-names>
</name>
<name><surname>Bertalan</surname>
<given-names>M</given-names>
</name>
<name><surname>Batto</surname>
<given-names>JM</given-names>
</name>
<name><surname>Hansen</surname>
<given-names>T</given-names>
</name>
<name><surname>Le Paslier</surname>
<given-names>D</given-names>
</name>
<name><surname>Linneberg</surname>
<given-names>A</given-names>
</name>
<name><surname>Nielsen</surname>
<given-names>HB</given-names>
</name>
<name><surname>Pelletier</surname>
<given-names>E</given-names>
</name>
<name><surname>Renault</surname>
<given-names>P</given-names>
</name>
<etal></etal>
<article-title>A human gut microbial gene catalogue established by metagenomic sequencing.</article-title>
<source>Nature</source>
<year>2010</year>
<volume>464</volume>
<fpage>59</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1038/nature08821</pub-id>
<pub-id pub-id-type="pmid">20203603</pub-id>
</mixed-citation>
</ref>
<ref id="B23"><mixed-citation publication-type="journal"><name><surname>Arumugam</surname>
<given-names>M</given-names>
</name>
<name><surname>Raes</surname>
<given-names>J</given-names>
</name>
<name><surname>Pelletier</surname>
<given-names>E</given-names>
</name>
<name><surname>Le Paslier</surname>
<given-names>D</given-names>
</name>
<name><surname>Yamada</surname>
<given-names>T</given-names>
</name>
<name><surname>Mende</surname>
<given-names>DR</given-names>
</name>
<name><surname>Fernandes</surname>
<given-names>GR</given-names>
</name>
<name><surname>Tap</surname>
<given-names>J</given-names>
</name>
<name><surname>Bruls</surname>
<given-names>T</given-names>
</name>
<name><surname>Batto</surname>
<given-names>JMM</given-names>
</name>
<name><surname>Bertalan</surname>
<given-names>M</given-names>
</name>
<name><surname>Borruel</surname>
<given-names>N</given-names>
</name>
<name><surname>Casellas</surname>
<given-names>F</given-names>
</name>
<name><surname>Fernandez</surname>
<given-names>L</given-names>
</name>
<name><surname>Gautier</surname>
<given-names>L</given-names>
</name>
<name><surname>Hansen</surname>
<given-names>T</given-names>
</name>
<name><surname>Hattori</surname>
<given-names>M</given-names>
</name>
<name><surname>Hayashi</surname>
<given-names>T</given-names>
</name>
<name><surname>Kleerebezem</surname>
<given-names>M</given-names>
</name>
<name><surname>Kurokawa</surname>
<given-names>K</given-names>
</name>
<name><surname>Leclerc</surname>
<given-names>M</given-names>
</name>
<name><surname>Levenez</surname>
<given-names>F</given-names>
</name>
<name><surname>Manichanh</surname>
<given-names>C</given-names>
</name>
<name><surname>Nielsen</surname>
<given-names>HB</given-names>
</name>
<name><surname>Nielsen</surname>
<given-names>T</given-names>
</name>
<name><surname>Pons</surname>
<given-names>N</given-names>
</name>
<name><surname>Poulain</surname>
<given-names>J</given-names>
</name>
<name><surname>Qin</surname>
<given-names>J</given-names>
</name>
<name><surname>Sicheritz-Ponten</surname>
<given-names>T</given-names>
</name>
<name><surname>Tims</surname>
<given-names>S</given-names>
</name>
<etal></etal>
<article-title>Enterotypes of the human gut microbiome.</article-title>
<source>Nature</source>
<year>2011</year>
<volume>473</volume>
<fpage>174</fpage>
<lpage>180</lpage>
<pub-id pub-id-type="doi">10.1038/nature09944</pub-id>
<pub-id pub-id-type="pmid">21508958</pub-id>
</mixed-citation>
</ref>
<ref id="B24"><mixed-citation publication-type="journal"><collab>The Human Microbiome Project Consortium</collab>
<article-title>Structure, function and diversity of the healthy human microbiome.</article-title>
<source>Nature</source>
<year>2012</year>
<volume>486</volume>
<fpage>207</fpage>
<lpage>214</lpage>
<pub-id pub-id-type="doi">10.1038/nature11234</pub-id>
<pub-id pub-id-type="pmid">22699609</pub-id>
</mixed-citation>
</ref>
<ref id="B25"><mixed-citation publication-type="journal"><name><surname>Schloss</surname>
<given-names>PD</given-names>
</name>
<name><surname>Handelsman</surname>
<given-names>J</given-names>
</name>
<article-title>Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness.</article-title>
<source>Applied and Environmental Microbiology</source>
<year>2005</year>
<volume>71</volume>
<fpage>1501</fpage>
<lpage>1506</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.71.3.1501-1506.2005</pub-id>
<pub-id pub-id-type="pmid">15746353</pub-id>
</mixed-citation>
</ref>
<ref id="B26"><mixed-citation publication-type="book"><name><surname>Liu</surname>
<given-names>B</given-names>
</name>
<name><surname>Gibbons</surname>
<given-names>T</given-names>
</name>
<name><surname>Ghodsi</surname>
<given-names>M</given-names>
</name>
<name><surname>Pop</surname>
<given-names>M</given-names>
</name>
<article-title>MetaPhyler: taxonomic profiling for metagenomic sequences.</article-title>
<source>2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>
<year>2010</year>
<publisher-name>IEEE</publisher-name>
<fpage>95</fpage>
<lpage>100</lpage>
</mixed-citation>
</ref>
<ref id="B27"><mixed-citation publication-type="journal"><name><surname>Segata</surname>
<given-names>N</given-names>
</name>
<name><surname>Waldron</surname>
<given-names>L</given-names>
</name>
<name><surname>Ballarini</surname>
<given-names>A</given-names>
</name>
<name><surname>Narasimhan</surname>
<given-names>V</given-names>
</name>
<name><surname>Jousson</surname>
<given-names>O</given-names>
</name>
<name><surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
<article-title>Metagenomic microbial community profiling using unique clade-specific marker genes.</article-title>
<source>Nature Methods</source>
<year>2012</year>
<volume>9</volume>
<fpage>811</fpage>
<lpage>814</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.2066</pub-id>
<pub-id pub-id-type="pmid">22688413</pub-id>
</mixed-citation>
</ref>
<ref id="B28"><mixed-citation publication-type="journal"><name><surname>McDonald</surname>
<given-names>D</given-names>
</name>
<name><surname>Price</surname>
<given-names>MN</given-names>
</name>
<name><surname>Goodrich</surname>
<given-names>J</given-names>
</name>
<name><surname>Nawrocki</surname>
<given-names>EP</given-names>
</name>
<name><surname>DeSantis</surname>
<given-names>TZ</given-names>
</name>
<name><surname>Probst</surname>
<given-names>A</given-names>
</name>
<name><surname>Andersen</surname>
<given-names>GL</given-names>
</name>
<name><surname>Knight</surname>
<given-names>R</given-names>
</name>
<name><surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<article-title>An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea.</article-title>
<source>The ISME Journal</source>
<year>2011</year>
<volume>6</volume>
<fpage>610</fpage>
<lpage>618</lpage>
<pub-id pub-id-type="pmid">22134646</pub-id>
</mixed-citation>
</ref>
<ref id="B29"><mixed-citation publication-type="journal"><name><surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name><surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name><surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name><surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name><surname>Butler</surname>
<given-names>H</given-names>
</name>
<name><surname>Cherry</surname>
<given-names>JM</given-names>
</name>
<name><surname>Davis</surname>
<given-names>AP</given-names>
</name>
<name><surname>Dolinski</surname>
<given-names>K</given-names>
</name>
<name><surname>Dwight</surname>
<given-names>SS</given-names>
</name>
<name><surname>Eppig</surname>
<given-names>JT</given-names>
</name>
<name><surname>Harris</surname>
<given-names>MA</given-names>
</name>
<name><surname>Hill</surname>
<given-names>DP</given-names>
</name>
<name><surname>Issel-Tarver</surname>
<given-names>L</given-names>
</name>
<name><surname>Kasarskis</surname>
<given-names>A</given-names>
</name>
<name><surname>Lewis</surname>
<given-names>S</given-names>
</name>
<name><surname>Matese</surname>
<given-names>JC</given-names>
</name>
<name><surname>Richardson</surname>
<given-names>JE</given-names>
</name>
<name><surname>Ringwald</surname>
<given-names>M</given-names>
</name>
<name><surname>Rubin</surname>
<given-names>GM</given-names>
</name>
<name><surname>Sherlock</surname>
<given-names>G</given-names>
</name>
<article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</article-title>
<source>Nature Genetics</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="doi">10.1038/75556</pub-id>
<pub-id pub-id-type="pmid">10802651</pub-id>
</mixed-citation>
</ref>
<ref id="B30"><mixed-citation publication-type="journal"><name><surname>Simpson</surname>
<given-names>JT</given-names>
</name>
<name><surname>Wong</surname>
<given-names>K</given-names>
</name>
<name><surname>Jackman</surname>
<given-names>SD</given-names>
</name>
<name><surname>Schein</surname>
<given-names>JE</given-names>
</name>
<name><surname>Jones</surname>
<given-names>SJM</given-names>
</name>
<name><surname>Birol</surname>
<given-names>I</given-names>
</name>
<article-title>ABySS: a parallel assembler for short read sequence data.</article-title>
<source>Genome Research</source>
<year>2009</year>
<volume>19</volume>
<fpage>1117</fpage>
<lpage>1123</lpage>
<pub-id pub-id-type="doi">10.1101/gr.089532.108</pub-id>
<pub-id pub-id-type="pmid">19251739</pub-id>
</mixed-citation>
</ref>
<ref id="B31"><mixed-citation publication-type="journal"><name><surname>Boisvert</surname>
<given-names>S</given-names>
</name>
<name><surname>Laviolette</surname>
<given-names>F</given-names>
</name>
<name><surname>Corbeil</surname>
<given-names>J</given-names>
</name>
<article-title>Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.</article-title>
<source>Journal of Computational Biology</source>
<year>2010</year>
<volume>17</volume>
<fpage>1519</fpage>
<lpage>1533</lpage>
<pub-id pub-id-type="doi">10.1089/cmb.2009.0238</pub-id>
<pub-id pub-id-type="pmid">20958248</pub-id>
</mixed-citation>
</ref>
<ref id="B32"><mixed-citation publication-type="journal"><name><surname>Schatz</surname>
<given-names>MC</given-names>
</name>
<name><surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Cloud computing and the DNA data race.</article-title>
<source>Nature Biotechnology</source>
<year>2010</year>
<volume>28</volume>
<fpage>691</fpage>
<lpage>693</lpage>
<pub-id pub-id-type="doi">10.1038/nbt0710-691</pub-id>
<pub-id pub-id-type="pmid">20622843</pub-id>
</mixed-citation>
</ref>
<ref id="B33"><mixed-citation publication-type="journal"><name><surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name><surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name><surname>Ruscheweyh</surname>
<given-names>HJ</given-names>
</name>
<name><surname>Weber</surname>
<given-names>N</given-names>
</name>
<name><surname>Schuster</surname>
<given-names>SC</given-names>
</name>
<article-title>Integrative analysis of environmental sequences using MEGAN4.</article-title>
<source>Genome Research</source>
<year>2011</year>
<volume>21</volume>
<fpage>1552</fpage>
<lpage>1560</lpage>
<pub-id pub-id-type="doi">10.1101/gr.120618.111</pub-id>
<pub-id pub-id-type="pmid">21690186</pub-id>
</mixed-citation>
</ref>
<ref id="B34"><mixed-citation publication-type="journal"><name><surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name><surname>Paarmann</surname>
<given-names>D</given-names>
</name>
<name><surname>D'Souza</surname>
<given-names>M</given-names>
</name>
<name><surname>Olson</surname>
<given-names>R</given-names>
</name>
<name><surname>Glass</surname>
<given-names>EM</given-names>
</name>
<name><surname>Kubal</surname>
<given-names>M</given-names>
</name>
<name><surname>Paczian</surname>
<given-names>T</given-names>
</name>
<name><surname>Rodriguez</surname>
<given-names>A</given-names>
</name>
<name><surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name><surname>Wilke</surname>
<given-names>A</given-names>
</name>
<name><surname>Wilkening</surname>
<given-names>J</given-names>
</name>
<name><surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<article-title>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>386</fpage>
<lpage>388</lpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-386</pub-id>
<pub-id pub-id-type="pmid">18803844</pub-id>
</mixed-citation>
</ref>
<ref id="B35"><mixed-citation publication-type="journal"><name><surname>Dixon</surname>
<given-names>P</given-names>
</name>
<article-title>VEGAN, a package of R functions for community ecology.</article-title>
<source>Journal of Vegetation Science</source>
<year>2003</year>
<volume>14</volume>
<fpage>927</fpage>
<lpage>930</lpage>
<pub-id pub-id-type="doi">10.1111/j.1654-1103.2003.tb02228.x</pub-id>
</mixed-citation>
</ref>
<ref id="B36"><mixed-citation publication-type="journal"><name><surname>Caporaso</surname>
<given-names>JG</given-names>
</name>
<name><surname>Kuczynski</surname>
<given-names>J</given-names>
</name>
<name><surname>Stombaugh</surname>
<given-names>J</given-names>
</name>
<name><surname>Bittinger</surname>
<given-names>K</given-names>
</name>
<name><surname>Bushman</surname>
<given-names>FD</given-names>
</name>
<name><surname>Costello</surname>
<given-names>EK</given-names>
</name>
<name><surname>Fierer</surname>
<given-names>N</given-names>
</name>
<name><surname>Pena</surname>
<given-names>AG</given-names>
</name>
<name><surname>Goodrich</surname>
<given-names>JK</given-names>
</name>
<name><surname>Gordon</surname>
<given-names>JI</given-names>
</name>
<name><surname>Huttley</surname>
<given-names>GA</given-names>
</name>
<name><surname>Kelley</surname>
<given-names>ST</given-names>
</name>
<name><surname>Knights</surname>
<given-names>D</given-names>
</name>
<name><surname>Koenig</surname>
<given-names>JE</given-names>
</name>
<name><surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name><surname>Lozupone</surname>
<given-names>CA</given-names>
</name>
<name><surname>McDonald</surname>
<given-names>D</given-names>
</name>
<name><surname>Muegge</surname>
<given-names>BD</given-names>
</name>
<name><surname>Pirrung</surname>
<given-names>M</given-names>
</name>
<name><surname>Reeder</surname>
<given-names>J</given-names>
</name>
<name><surname>Sevinsky</surname>
<given-names>JR</given-names>
</name>
<name><surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Walters</surname>
<given-names>WA</given-names>
</name>
<name><surname>Widmann</surname>
<given-names>J</given-names>
</name>
<name><surname>Yatsunenko</surname>
<given-names>T</given-names>
</name>
<name><surname>Zaneveld</surname>
<given-names>J</given-names>
</name>
<name><surname>Knight</surname>
<given-names>R</given-names>
</name>
<article-title>QIIME allows analysis of high-throughput community sequencing data.</article-title>
<source>Nature Methods</source>
<year>2010</year>
<volume>7</volume>
<fpage>335</fpage>
<lpage>336</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.f.303</pub-id>
<pub-id pub-id-type="pmid">20383131</pub-id>
</mixed-citation>
</ref>
<ref id="B37"><mixed-citation publication-type="journal"><name><surname>Krause</surname>
<given-names>L</given-names>
</name>
<name><surname>Diaz</surname>
<given-names>NN</given-names>
</name>
<name><surname>Goesmann</surname>
<given-names>A</given-names>
</name>
<name><surname>Kelley</surname>
<given-names>S</given-names>
</name>
<name><surname>Nattkemper</surname>
<given-names>TW</given-names>
</name>
<name><surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name><surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name><surname>Stoye</surname>
<given-names>J</given-names>
</name>
<article-title>Phylogenetic classification of short environmental DNA fragments.</article-title>
<source>Nucleic Acids Research</source>
<year>2008</year>
<volume>36</volume>
<fpage>2230</fpage>
<lpage>2239</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn038</pub-id>
<pub-id pub-id-type="pmid">18285365</pub-id>
</mixed-citation>
</ref>
<ref id="B38"><mixed-citation publication-type="journal"><name><surname>Brady</surname>
<given-names>A</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models.</article-title>
<source>Nature Methods</source>
<year>2009</year>
<volume>6</volume>
<fpage>673</fpage>
<lpage>676</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1358</pub-id>
<pub-id pub-id-type="pmid">19648916</pub-id>
</mixed-citation>
</ref>
<ref id="B39"><mixed-citation publication-type="journal"><name><surname>Namiki</surname>
<given-names>T</given-names>
</name>
<name><surname>Hachiya</surname>
<given-names>T</given-names>
</name>
<name><surname>Tanaka</surname>
<given-names>H</given-names>
</name>
<name><surname>Sakakibara</surname>
<given-names>Y</given-names>
</name>
<article-title>MetaVelvet: an extension of Velvet assembler to <italic>de novo</italic> metagenome assembly from short sequence reads.</article-title>
<source>Nucleic Acids Research</source>
<year>2012</year>
<volume>40</volume>
<fpage>e155</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gks678</pub-id>
<pub-id pub-id-type="pmid">22821567</pub-id>
</mixed-citation>
</ref>
<ref id="B40"><mixed-citation publication-type="journal"><name><surname>Peng</surname>
<given-names>Y</given-names>
</name>
<name><surname>Leung</surname>
<given-names>HCM</given-names>
</name>
<name><surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name><surname>Chin</surname>
<given-names>FYL</given-names>
</name>
<article-title>Meta-IDBA: a <italic>de novo</italic> assembler for metagenomic data.</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>i94</fpage>
<lpage>i101</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr216</pub-id>
<pub-id pub-id-type="pmid">21685107</pub-id>
</mixed-citation>
</ref>
<ref id="B41"><mixed-citation publication-type="journal"><name><surname>Laserson</surname>
<given-names>J</given-names>
</name>
<name><surname>Jojic</surname>
<given-names>V</given-names>
</name>
<name><surname>Koller</surname>
<given-names>D</given-names>
</name>
<article-title>Genovo: de novo assembly for metagenomes.</article-title>
<source>Journal of Computational Biology</source>
<year>2011</year>
<volume>18</volume>
<fpage>429</fpage>
<lpage>443</lpage>
<pub-id pub-id-type="doi">10.1089/cmb.2010.0244</pub-id>
<pub-id pub-id-type="pmid">21385045</pub-id>
</mixed-citation>
</ref>
<ref id="B42"><mixed-citation publication-type="journal"><name><surname>Wu</surname>
<given-names>GD</given-names>
</name>
<name><surname>Chen</surname>
<given-names>J</given-names>
</name>
<name><surname>Hoffmann</surname>
<given-names>C</given-names>
</name>
<name><surname>Bittinger</surname>
<given-names>K</given-names>
</name>
<name><surname>Chen</surname>
<given-names>YYY</given-names>
</name>
<name><surname>Keilbaugh</surname>
<given-names>SA</given-names>
</name>
<name><surname>Bewtra</surname>
<given-names>M</given-names>
</name>
<name><surname>Knights</surname>
<given-names>D</given-names>
</name>
<name><surname>Walters</surname>
<given-names>WA</given-names>
</name>
<name><surname>Knight</surname>
<given-names>R</given-names>
</name>
<name><surname>Sinha</surname>
<given-names>R</given-names>
</name>
<name><surname>Gilroy</surname>
<given-names>E</given-names>
</name>
<name><surname>Gupta</surname>
<given-names>K</given-names>
</name>
<name><surname>Baldassano</surname>
<given-names>R</given-names>
</name>
<name><surname>Nessel</surname>
<given-names>L</given-names>
</name>
<name><surname>Li</surname>
<given-names>H</given-names>
</name>
<name><surname>Bushman</surname>
<given-names>FD</given-names>
</name>
<name><surname>Lewis</surname>
<given-names>JD</given-names>
</name>
<article-title>Linking long-term dietary patterns with gut microbial enterotypes.</article-title>
<source>Science (New York, NY)</source>
<year>2011</year>
<volume>334</volume>
<fpage>105</fpage>
<lpage>108</lpage>
<pub-id pub-id-type="doi">10.1126/science.1208344</pub-id>
</mixed-citation>
</ref>
<ref id="B43"><mixed-citation publication-type="journal"><name><surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
<name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Waterman</surname>
<given-names>MS</given-names>
</name>
<article-title>An Eulerian path approach to DNA fragment assembly.</article-title>
<source>Proceedings of the National Academy of Sciences</source>
<year>2001</year>
<volume>98</volume>
<fpage>9748</fpage>
<lpage>9753</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.171285098</pub-id>
</mixed-citation>
</ref>
<ref id="B44"><mixed-citation publication-type="journal"><name><surname>Kurtz</surname>
<given-names>S</given-names>
</name>
<name><surname>Phillippy</surname>
<given-names>A</given-names>
</name>
<name><surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name><surname>Smoot</surname>
<given-names>M</given-names>
</name>
<name><surname>Shumway</surname>
<given-names>M</given-names>
</name>
<name><surname>Antonescu</surname>
<given-names>C</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Versatile and open software for comparing large genomes.</article-title>
<source>Genome Biology</source>
<year>2004</year>
<volume>5</volume>
<fpage>R12</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2004-5-2-r12</pub-id>
<pub-id pub-id-type="pmid">14759262</pub-id>
</mixed-citation>
</ref>
<ref id="B45"><mixed-citation publication-type="journal"><name><surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name><surname>Linderman</surname>
<given-names>MD</given-names>
</name>
<name><surname>Sorenson</surname>
<given-names>J</given-names>
</name>
<name><surname>Lee</surname>
<given-names>L</given-names>
</name>
<name><surname>Nolan</surname>
<given-names>GP</given-names>
</name>
<article-title>Computational solutions to large-scale data management and analysis.</article-title>
<source>Nature Reviews Genetics</source>
<year>2010</year>
<volume>11</volume>
<fpage>647</fpage>
<lpage>657</lpage>
<pub-id pub-id-type="pmid">20717155</pub-id>
</mixed-citation>
</ref>
<ref id="B46"><mixed-citation publication-type="journal"><name><surname>Barabasi</surname>
<given-names>AL</given-names>
</name>
<name><surname>Oltvai</surname>
<given-names>ZN</given-names>
</name>
<article-title>Network biology: understanding the cell's functional organization.</article-title>
<source>Nature Reviews Genetics</source>
<year>2004</year>
<volume>5</volume>
<fpage>101</fpage>
<lpage>113</lpage>
<pub-id pub-id-type="doi">10.1038/nrg1272</pub-id>
<pub-id pub-id-type="pmid">14735121</pub-id>
</mixed-citation>
</ref>
<ref id="B47"><mixed-citation publication-type="journal"><name><surname>Benson</surname>
<given-names>DA</given-names>
</name>
<name><surname>Boguski</surname>
<given-names>MS</given-names>
</name>
<name><surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
<name><surname>Ostell</surname>
<given-names>J</given-names>
</name>
<article-title>GenBank.</article-title>
<source>Nucleic Acids Research</source>
<year>1997</year>
<volume>25</volume>
<fpage>1</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1093/nar/25.1.1</pub-id>
<pub-id pub-id-type="pmid">9016491</pub-id>
</mixed-citation>
</ref>
<ref id="B48"><mixed-citation publication-type="journal"><name><surname>Kulikova</surname>
<given-names>T</given-names>
</name>
<name><surname>Aldebert</surname>
<given-names>P</given-names>
</name>
<name><surname>Althorpe</surname>
<given-names>N</given-names>
</name>
<name><surname>Baker</surname>
<given-names>W</given-names>
</name>
<name><surname>Bates</surname>
<given-names>K</given-names>
</name>
<name><surname>Browne</surname>
<given-names>P</given-names>
</name>
<name><surname>van den Broek</surname>
<given-names>A</given-names>
</name>
<name><surname>Cochrane</surname>
<given-names>G</given-names>
</name>
<name><surname>Duggan</surname>
<given-names>K</given-names>
</name>
<name><surname>Eberhardt</surname>
<given-names>R</given-names>
</name>
<name><surname>Faruque</surname>
<given-names>N</given-names>
</name>
<name><surname>Garcia-Pastor</surname>
<given-names>M</given-names>
</name>
<name><surname>Harte</surname>
<given-names>N</given-names>
</name>
<name><surname>Kanz</surname>
<given-names>C</given-names>
</name>
<name><surname>Leinonen</surname>
<given-names>R</given-names>
</name>
<name><surname>Lin</surname>
<given-names>Q</given-names>
</name>
<name><surname>Lombard</surname>
<given-names>V</given-names>
</name>
<name><surname>Lopez</surname>
<given-names>R</given-names>
</name>
<name><surname>Mancuso</surname>
<given-names>R</given-names>
</name>
<name><surname>McHale</surname>
<given-names>M</given-names>
</name>
<name><surname>Nardone</surname>
<given-names>F</given-names>
</name>
<name><surname>Silventoinen</surname>
<given-names>V</given-names>
</name>
<name><surname>Stoehr</surname>
<given-names>P</given-names>
</name>
<name><surname>Stoesser</surname>
<given-names>G</given-names>
</name>
<name><surname>Tuli</surname>
<given-names>MA</given-names>
</name>
<name><surname>Tzouvara</surname>
<given-names>K</given-names>
</name>
<name><surname>Vaughan</surname>
<given-names>R</given-names>
</name>
<name><surname>Wu</surname>
<given-names>D</given-names>
</name>
<name><surname>Zhu</surname>
<given-names>W</given-names>
</name>
<name><surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<article-title>The EMBL nucleotide sequence database.</article-title>
<source>Nucleic Acids Research</source>
<year>2004</year>
<volume>32</volume>
<fpage>D27</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh120</pub-id>
<pub-id pub-id-type="pmid">14681351</pub-id>
</mixed-citation>
</ref>
<ref id="B49"><mixed-citation publication-type="journal"><name><surname>Camon</surname>
<given-names>E</given-names>
</name>
<name><surname>Magrane</surname>
<given-names>M</given-names>
</name>
<name><surname>Barrell</surname>
<given-names>D</given-names>
</name>
<name><surname>Lee</surname>
<given-names>V</given-names>
</name>
<name><surname>Dimmer</surname>
<given-names>E</given-names>
</name>
<name><surname>Maslen</surname>
<given-names>J</given-names>
</name>
<name><surname>Binns</surname>
<given-names>D</given-names>
</name>
<name><surname>Harte</surname>
<given-names>N</given-names>
</name>
<name><surname>Lopez</surname>
<given-names>R</given-names>
</name>
<name><surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<article-title>The gene ontology annotation (GOA) database: sharing knowledge in Uniprot with gene ontology.</article-title>
<source>Nucleic Acids Research</source>
<year>2004</year>
<volume>32</volume>
<fpage>D262</fpage>
<lpage>266</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh021</pub-id>
<pub-id pub-id-type="pmid">14681408</pub-id>
</mixed-citation>
</ref>
<ref id="B50"><mixed-citation publication-type="book"><name><surname>Gabriel</surname>
<given-names>E</given-names>
</name>
<name><surname>Fagg</surname>
<given-names>GE</given-names>
</name>
<name><surname>Bosilca</surname>
<given-names>G</given-names>
</name>
<name><surname>Angskun</surname>
<given-names>T</given-names>
</name>
<name><surname>Dongarra</surname>
<given-names>JJ</given-names>
</name>
<name><surname>Squyres</surname>
<given-names>JM</given-names>
</name>
<name><surname>Sahay</surname>
<given-names>V</given-names>
</name>
<name><surname>Kambadur</surname>
<given-names>P</given-names>
</name>
<name><surname>Barrett</surname>
<given-names>B</given-names>
</name>
<name><surname>Lumsdaine</surname>
<given-names>A</given-names>
</name>
<name><surname>Castain</surname>
<given-names>RH</given-names>
</name>
<name><surname>Daniel</surname>
<given-names>DJ</given-names>
</name>
<name><surname>Graham</surname>
<given-names>RL</given-names>
</name>
<name><surname>Woodall</surname>
<given-names>TS</given-names>
</name>
<person-group person-group-type="editor">Kranzlmüller D, Kacsuk P, Dongarra J</person-group>
<article-title>Open MPI: goals, concept, and design of a next generation MPI implementation.</article-title>
<source>Recent Advances in Parallel Virtual Machine and Message Passing Interface, Volume 3241 of Lecture Notes in Computer Science</source>
<volume>3241</volume>
<publisher-name>Springer Berlin/Heidelberg</publisher-name>
<fpage>353</fpage>
<lpage>377</lpage>
</mixed-citation>
</ref>
<ref id="B51"><mixed-citation publication-type="book"><name><surname>Gropp</surname>
<given-names>W</given-names>
</name>
<person-group person-group-type="editor">Kranzlmüller D, Volkert J, Kacsuk P, Dongarra J</person-group>
<article-title>MPICH2: A new start for MPI implementations.</article-title>
<source>Recent Advances in Parallel Virtual Machine and Message Passing Interface, Volume 2474 of Lecture Notes in Computer Science</source>
<year>2002</year>
<publisher-name>Springer Berlin/Heidelberg</publisher-name>
<fpage>37</fpage>
<lpage>42</lpage>
</mixed-citation>
</ref>
<ref id="B52"><mixed-citation publication-type="book"><name><surname>Kale</surname>
<given-names>LV</given-names>
</name>
<name><surname>Krishnan</surname>
<given-names>S</given-names>
</name>
<article-title>CHARM++: a portable concurrent object oriented system based on C++.</article-title>
<source>Proceedings of the 8th Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA '93, New York, NY, USA</source>
<year>1993</year>
<publisher-name>ACM</publisher-name>
<fpage>91</fpage>
<lpage>108</lpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000965  | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000965  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021
