Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A Primer on Metagenomics

Internal identifier: 000714 (Pmc/Corpus)


Authors: John C. Wooley; Adam Godzik; Iddo Friedberg


RBID: PMC:2829047

Abstract

Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.


DOI: 10.1371/journal.pcbi.1000667
PubMed: 20195499
PubMed Central: 2829047


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Primer on Metagenomics</title>
<author>
<name sortKey="Wooley, John C" sort="Wooley, John C" uniqKey="Wooley J" first="John C." last="Wooley">John C. Wooley</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Godzik, Adam" sort="Godzik, Adam" uniqKey="Godzik A" first="Adam" last="Godzik">Adam Godzik</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Friedberg, Iddo" sort="Friedberg, Iddo" uniqKey="Friedberg I" first="Iddo" last="Friedberg">Iddo Friedberg</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Microbiology, Miami University, Oxford, Ohio, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20195499</idno>
<idno type="pmc">2829047</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829047</idno>
<idno type="RBID">PMC:2829047</idno>
<idno type="doi">10.1371/journal.pcbi.1000667</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000714</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A Primer on Metagenomics</title>
<author>
<name sortKey="Wooley, John C" sort="Wooley, John C" uniqKey="Wooley J" first="John C." last="Wooley">John C. Wooley</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Godzik, Adam" sort="Godzik, Adam" uniqKey="Godzik A" first="Adam" last="Godzik">Adam Godzik</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Friedberg, Iddo" sort="Friedberg, Iddo" uniqKey="Friedberg I" first="Iddo" last="Friedberg">Iddo Friedberg</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Microbiology, Miami University, Oxford, Ohio, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Computational Biology</title>
<idno type="ISSN">1553-734X</idno>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitman, Wb" uniqKey="Whitman W">WB Whitman</name>
</author>
<author>
<name sortKey="Coleman, Dc" uniqKey="Coleman D">DC Coleman</name>
</author>
<author>
<name sortKey="Wiebe, Wj" uniqKey="Wiebe W">WJ Wiebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Savage, Dc" uniqKey="Savage D">DC Savage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berg, R" uniqKey="Berg R">R Berg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collins, Fs" uniqKey="Collins F">FS Collins</name>
</author>
<author>
<name sortKey="Mckusick, Va" uniqKey="Mckusick V">VA McKusick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaput, J" uniqKey="Kaput J">J Kaput</name>
</author>
<author>
<name sortKey="Cotton, Rghg" uniqKey="Cotton R">RGHG Cotton</name>
</author>
<author>
<name sortKey="Hardman, L" uniqKey="Hardman L">L Hardman</name>
</author>
<author>
<name sortKey="Watson, M" uniqKey="Watson M">M Watson</name>
</author>
<author>
<name sortKey="Al Aqeel, Aii" uniqKey="Al Aqeel A">AII Al Aqeel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Hara, Am" uniqKey="O Hara A">AM O'Hara</name>
</author>
<author>
<name sortKey="Shanahan, F" uniqKey="Shanahan F">F Shanahan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fiers, W" uniqKey="Fiers W">W Fiers</name>
</author>
<author>
<name sortKey="Contreras, R" uniqKey="Contreras R">R Contreras</name>
</author>
<author>
<name sortKey="Duerinck, F" uniqKey="Duerinck F">F Duerinck</name>
</author>
<author>
<name sortKey="Haegeman, G" uniqKey="Haegeman G">G Haegeman</name>
</author>
<author>
<name sortKey="Iserentant, D" uniqKey="Iserentant D">D Iserentant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanger, F" uniqKey="Sanger F">F Sanger</name>
</author>
<author>
<name sortKey="Coulson, Ar" uniqKey="Coulson A">AR Coulson</name>
</author>
<author>
<name sortKey="Friedmann, T" uniqKey="Friedmann T">T Friedmann</name>
</author>
<author>
<name sortKey="Air, Gm" uniqKey="Air G">GM Air</name>
</author>
<author>
<name sortKey="Barrell, Bg" uniqKey="Barrell B">BG Barrell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fleischmann, Rd" uniqKey="Fleischmann R">RD Fleischmann</name>
</author>
<author>
<name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author>
<name sortKey="White, O" uniqKey="White O">O White</name>
</author>
<author>
<name sortKey="Clayton, Ra" uniqKey="Clayton R">RA Clayton</name>
</author>
<author>
<name sortKey="Kirkness, Ef" uniqKey="Kirkness E">EF Kirkness</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amann, Ri" uniqKey="Amann R">RI Amann</name>
</author>
<author>
<name sortKey="Ludwig, W" uniqKey="Ludwig W">W Ludwig</name>
</author>
<author>
<name sortKey="Schleifer, Kh" uniqKey="Schleifer K">KH Schleifer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pace, Nr" uniqKey="Pace N">NR Pace</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rappe, Ms" uniqKey="Rappe M">MS Rappé</name>
</author>
<author>
<name sortKey="Giovannoni, Sj" uniqKey="Giovannoni S">SJ Giovannoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Handelsman, J" uniqKey="Handelsman J">J Handelsman</name>
</author>
<author>
<name sortKey="Rondon, Mr" uniqKey="Rondon M">MR Rondon</name>
</author>
<author>
<name sortKey="Brady, Sf" uniqKey="Brady S">SF Brady</name>
</author>
<author>
<name sortKey="Clardy, J" uniqKey="Clardy J">J Clardy</name>
</author>
<author>
<name sortKey="Goodman, Rm" uniqKey="Goodman R">RM Goodman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rondon, Mr" uniqKey="Rondon M">MR Rondon</name>
</author>
<author>
<name sortKey="August, Pr" uniqKey="August P">PR August</name>
</author>
<author>
<name sortKey="Bettermann, Ad" uniqKey="Bettermann A">AD Bettermann</name>
</author>
<author>
<name sortKey="Brady, Sf" uniqKey="Brady S">SF Brady</name>
</author>
<author>
<name sortKey="Grossman, Th" uniqKey="Grossman T">TH Grossman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Field, D" uniqKey="Field D">D Field</name>
</author>
<author>
<name sortKey="Garrity, G" uniqKey="Garrity G">G Garrity</name>
</author>
<author>
<name sortKey="Gray, T" uniqKey="Gray T">T Gray</name>
</author>
<author>
<name sortKey="Morrison, N" uniqKey="Morrison N">N Morrison</name>
</author>
<author>
<name sortKey="Selengut, J" uniqKey="Selengut J">J Selengut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kottmann, R" uniqKey="Kottmann R">R Kottmann</name>
</author>
<author>
<name sortKey="Gray, T" uniqKey="Gray T">T Gray</name>
</author>
<author>
<name sortKey="Murphy, S" uniqKey="Murphy S">S Murphy</name>
</author>
<author>
<name sortKey="Kagan, L" uniqKey="Kagan L">L Kagan</name>
</author>
<author>
<name sortKey="Kravitz, S" uniqKey="Kravitz S">S Kravitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brazma, A" uniqKey="Brazma A">A Brazma</name>
</author>
<author>
<name sortKey="Hingamp, P" uniqKey="Hingamp P">P Hingamp</name>
</author>
<author>
<name sortKey="Quackenbush, J" uniqKey="Quackenbush J">J Quackenbush</name>
</author>
<author>
<name sortKey="Sherlock, G" uniqKey="Sherlock G">G Sherlock</name>
</author>
<author>
<name sortKey="Spellman, P" uniqKey="Spellman P">P Spellman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Westbrook, Jd" uniqKey="Westbrook J">JD Westbrook</name>
</author>
<author>
<name sortKey="Fitzgerald, Pm" uniqKey="Fitzgerald P">PM Fitzgerald</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanger, F" uniqKey="Sanger F">F Sanger</name>
</author>
<author>
<name sortKey="Coulson, Ar" uniqKey="Coulson A">AR Coulson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanger, F" uniqKey="Sanger F">F Sanger</name>
</author>
<author>
<name sortKey="Nicklen, S" uniqKey="Nicklen S">S Nicklen</name>
</author>
<author>
<name sortKey="Coulson, Ar" uniqKey="Coulson A">AR Coulson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sorek, R" uniqKey="Sorek R">R Sorek</name>
</author>
<author>
<name sortKey="Zhu, Y" uniqKey="Zhu Y">Y Zhu</name>
</author>
<author>
<name sortKey="Creevey, Cj" uniqKey="Creevey C">CJ Creevey</name>
</author>
<author>
<name sortKey="Francino, Mp" uniqKey="Francino M">MP Francino</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pedros Alio, C" uniqKey="Pedros Alio C">C Pedros-Alio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sogin, Ml" uniqKey="Sogin M">ML Sogin</name>
</author>
<author>
<name sortKey="Morrison, Hg" uniqKey="Morrison H">HG Morrison</name>
</author>
<author>
<name sortKey="Huber, Ja" uniqKey="Huber J">JA Huber</name>
</author>
<author>
<name sortKey="Mark Welch, D" uniqKey="Mark Welch D">D Mark Welch</name>
</author>
<author>
<name sortKey="Huse, Sm" uniqKey="Huse S">SM Huse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamp, Tj" uniqKey="Hamp T">TJ Hamp</name>
</author>
<author>
<name sortKey="Jones, Wj" uniqKey="Jones W">WJ Jones</name>
</author>
<author>
<name sortKey="Fodor, Aa" uniqKey="Fodor A">AA Fodor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Neufeld, Jd" uniqKey="Neufeld J">JD Neufeld</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Mohn, Ww" uniqKey="Mohn W">WW Mohn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitra, Rd" uniqKey="Mitra R">RD Mitra</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Porreca, Gj" uniqKey="Porreca G">GJ Porreca</name>
</author>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nyren, P" uniqKey="Nyren P">P Nyrén</name>
</author>
<author>
<name sortKey="Pettersson, B" uniqKey="Pettersson B">B Pettersson</name>
</author>
<author>
<name sortKey="Uhlen, M" uniqKey="Uhlen M">M Uhlén</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ronaghi, M" uniqKey="Ronaghi M">M Ronaghi</name>
</author>
<author>
<name sortKey="Uhlen, M" uniqKey="Uhlen M">M Uhlén</name>
</author>
<author>
<name sortKey="Nyren, P" uniqKey="Nyren P">P Nyrén</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Margulies, M" uniqKey="Margulies M">M Margulies</name>
</author>
<author>
<name sortKey="Egholm, M" uniqKey="Egholm M">M Egholm</name>
</author>
<author>
<name sortKey="Altman, We" uniqKey="Altman W">WE Altman</name>
</author>
<author>
<name sortKey="Attiya, S" uniqKey="Attiya S">S Attiya</name>
</author>
<author>
<name sortKey="Bader, Js" uniqKey="Bader J">JS Bader</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
<author>
<name sortKey="Jones, Sjm" uniqKey="Jones S">SJM Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
<author>
<name sortKey="Ji, H" uniqKey="Ji H">H Ji</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harismendy, O" uniqKey="Harismendy O">O Harismendy</name>
</author>
<author>
<name sortKey="Ng, P" uniqKey="Ng P">P Ng</name>
</author>
<author>
<name sortKey="Strausberg, R" uniqKey="Strausberg R">R Strausberg</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Stockwell, T" uniqKey="Stockwell T">T Stockwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcpherson, Jd" uniqKey="Mcpherson J">JD McPherson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clarke, J" uniqKey="Clarke J">J Clarke</name>
</author>
<author>
<name sortKey="Wu, Hc" uniqKey="Wu H">HC Wu</name>
</author>
<author>
<name sortKey="Jayasinghe, L" uniqKey="Jayasinghe L">L Jayasinghe</name>
</author>
<author>
<name sortKey="Patel, A" uniqKey="Patel A">A Patel</name>
</author>
<author>
<name sortKey="Reid, S" uniqKey="Reid S">S Reid</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eid, J" uniqKey="Eid J">J Eid</name>
</author>
<author>
<name sortKey="Fehr, A" uniqKey="Fehr A">A Fehr</name>
</author>
<author>
<name sortKey="Gray, J" uniqKey="Gray J">J Gray</name>
</author>
<author>
<name sortKey="Luong, K" uniqKey="Luong K">K Luong</name>
</author>
<author>
<name sortKey="Lyle, J" uniqKey="Lyle J">J Lyle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Branton, D" uniqKey="Branton D">D Branton</name>
</author>
<author>
<name sortKey="Deamer, Dw" uniqKey="Deamer D">DW Deamer</name>
</author>
<author>
<name sortKey="Marziali, A" uniqKey="Marziali A">A Marziali</name>
</author>
<author>
<name sortKey="Bayley, H" uniqKey="Bayley H">H Bayley</name>
</author>
<author>
<name sortKey="Benner, Sa" uniqKey="Benner S">SA Benner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Torsvik, V" uniqKey="Torsvik V">V Torsvik</name>
</author>
<author>
<name sortKey="Goksoyr, J" uniqKey="Goksoyr J">J Goksoyr</name>
</author>
<author>
<name sortKey="Daae, Fl" uniqKey="Daae F">FL Daae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Youssef, Nh" uniqKey="Youssef N">NH Youssef</name>
</author>
<author>
<name sortKey="Elshahed, Ms" uniqKey="Elshahed M">MS Elshahed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fierer, N" uniqKey="Fierer N">N Fierer</name>
</author>
<author>
<name sortKey="Jackson, Rb" uniqKey="Jackson R">RB Jackson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Countway, Pd" uniqKey="Countway P">PD Countway</name>
</author>
<author>
<name sortKey="Gast, Rj" uniqKey="Gast R">RJ Gast</name>
</author>
<author>
<name sortKey="Pratik Sava, I" uniqKey="Pratik Sava I">I Pratik Sava</name>
</author>
<author>
<name sortKey="Caron, Da" uniqKey="Caron D">DA Caron</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
<author>
<name sortKey="Lercher, Mj" uniqKey="Lercher M">MJ Lercher</name>
</author>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Richter, Dc" uniqKey="Richter D">DC Richter</name>
</author>
<author>
<name sortKey="Ott, F" uniqKey="Ott F">F Ott</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Schmid, R" uniqKey="Schmid R">R Schmid</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="Stanley, K" uniqKey="Stanley K">K Stanley</name>
</author>
<author>
<name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aparicio, S" uniqKey="Aparicio S">S Aparicio</name>
</author>
<author>
<name sortKey="Chapman, J" uniqKey="Chapman J">J Chapman</name>
</author>
<author>
<name sortKey="Stupka, E" uniqKey="Stupka E">E Stupka</name>
</author>
<author>
<name sortKey="Putnam, N" uniqKey="Putnam N">N Putnam</name>
</author>
<author>
<name sortKey="Ming Chia, J" uniqKey="Ming Chia J">JM Chia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author>
<name sortKey="Dew, Im" uniqKey="Dew I">IM Dew</name>
</author>
<author>
<name sortKey="Fasulo, Dp" uniqKey="Fasulo D">DP Fasulo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mavromatis, K" uniqKey="Mavromatis K">K Mavromatis</name>
</author>
<author>
<name sortKey="Ivanova, N" uniqKey="Ivanova N">N Ivanova</name>
</author>
<author>
<name sortKey="Barry, K" uniqKey="Barry K">K Barry</name>
</author>
<author>
<name sortKey="Shapiro, H" uniqKey="Shapiro H">H Shapiro</name>
</author>
<author>
<name sortKey="Goltsman, E" uniqKey="Goltsman E">E Goltsman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, M" uniqKey="Chaisson M">M Chaisson</name>
</author>
<author>
<name sortKey="Pevzner, P" uniqKey="Pevzner P">P Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sundquist, A" uniqKey="Sundquist A">A Sundquist</name>
</author>
<author>
<name sortKey="Ronaghi, M" uniqKey="Ronaghi M">M Ronaghi</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Pevzner, P" uniqKey="Pevzner P">P Pevzner</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warren, Rl" uniqKey="Warren R">RL Warren</name>
</author>
<author>
<name sortKey="Sutton, Gg" uniqKey="Sutton G">GG Sutton</name>
</author>
<author>
<name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author>
<name sortKey="Holt, Ra" uniqKey="Holt R">RA Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ye, Y" uniqKey="Ye Y">Y Ye</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yooseph, S" uniqKey="Yooseph S">S Yooseph</name>
</author>
<author>
<name sortKey="Sutton, G" uniqKey="Sutton G">G Sutton</name>
</author>
<author>
<name sortKey="Rusch, Db" uniqKey="Rusch D">DB Rusch</name>
</author>
<author>
<name sortKey="Halpern, Al" uniqKey="Halpern A">AL Halpern</name>
</author>
<author>
<name sortKey="Williamson, Sj" uniqKey="Williamson S">SJ Williamson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author>
<name sortKey="Sch Ffer, Aa" uniqKey="Sch Ffer A">AA Schäffer</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Azad, Rk" uniqKey="Azad R">RK Azad</name>
</author>
<author>
<name sortKey="Borodovsky, M" uniqKey="Borodovsky M">M Borodovsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yooseph, S" uniqKey="Yooseph S">S Yooseph</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Sutton, G" uniqKey="Sutton G">G Sutton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoff, Kj" uniqKey="Hoff K">KJ Hoff</name>
</author>
<author>
<name sortKey="Tech, M" uniqKey="Tech M">M Tech</name>
</author>
<author>
<name sortKey="Lingner, T" uniqKey="Lingner T">T Lingner</name>
</author>
<author>
<name sortKey="Daniel, R" uniqKey="Daniel R">R Daniel</name>
</author>
<author>
<name sortKey="Morgenstern, B" uniqKey="Morgenstern B">B Morgenstern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schouls, Lm" uniqKey="Schouls L">LM Schouls</name>
</author>
<author>
<name sortKey="Schot, Cs" uniqKey="Schot C">CS Schot</name>
</author>
<author>
<name sortKey="Jacobs, Ja" uniqKey="Jacobs J">JA Jacobs</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desantis, Tz" uniqKey="Desantis T">TZ DeSantis</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Larsen, N" uniqKey="Larsen N">N Larsen</name>
</author>
<author>
<name sortKey="Rojas, M" uniqKey="Rojas M">M Rojas</name>
</author>
<author>
<name sortKey="Brodie, El" uniqKey="Brodie E">EL Brodie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Case, Rj" uniqKey="Case R">RJ Case</name>
</author>
<author>
<name sortKey="Boucher, Y" uniqKey="Boucher Y">Y Boucher</name>
</author>
<author>
<name sortKey="Dahllof, I" uniqKey="Dahllof I">I Dahllof</name>
</author>
<author>
<name sortKey="Holmstrom, C" uniqKey="Holmstrom C">C Holmstrom</name>
</author>
<author>
<name sortKey="Doolittle, Fw" uniqKey="Doolittle F">FW Doolittle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klappenbach, Ja" uniqKey="Klappenbach J">JA Klappenbach</name>
</author>
<author>
<name sortKey="Saxman, Pr" uniqKey="Saxman P">PR Saxman</name>
</author>
<author>
<name sortKey="Cole, Jr" uniqKey="Cole J">JR Cole</name>
</author>
<author>
<name sortKey="Schmidt, Tm" uniqKey="Schmidt T">TM Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walsh, Da" uniqKey="Walsh D">DA Walsh</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
<author>
<name sortKey="Kamekura, M" uniqKey="Kamekura M">M Kamekura</name>
</author>
<author>
<name sortKey="Doolittle, Fw" uniqKey="Doolittle F">FW Doolittle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Achenbach, La" uniqKey="Achenbach L">LA Achenbach</name>
</author>
<author>
<name sortKey="Carey, J" uniqKey="Carey J">J Carey</name>
</author>
<author>
<name sortKey="Madigan, Mt" uniqKey="Madigan M">MT Madigan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Tringe, Sg" uniqKey="Tringe S">SG Tringe</name>
</author>
<author>
<name sortKey="Doerks, T" uniqKey="Doerks T">T Doerks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Enright, Mc" uniqKey="Enright M">MC Enright</name>
</author>
<author>
<name sortKey="Spratt, Bg" uniqKey="Spratt B">BG Spratt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maiden, Mcj" uniqKey="Maiden M">MCJ Maiden</name>
</author>
<author>
<name sortKey="Bygraves, Ja" uniqKey="Bygraves J">JA Bygraves</name>
</author>
<author>
<name sortKey="Feil, E" uniqKey="Feil E">E Feil</name>
</author>
<author>
<name sortKey="Morelli, G" uniqKey="Morelli G">G Morelli</name>
</author>
<author>
<name sortKey="Russell, Je" uniqKey="Russell J">JE Russell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahenthiralingam, E" uniqKey="Mahenthiralingam E">E Mahenthiralingam</name>
</author>
<author>
<name sortKey="Baldwin, A" uniqKey="Baldwin A">A Baldwin</name>
</author>
<author>
<name sortKey="Drevinek, P" uniqKey="Drevinek P">P Drevinek</name>
</author>
<author>
<name sortKey="Vanlaere, E" uniqKey="Vanlaere E">E Vanlaere</name>
</author>
<author>
<name sortKey="Vandamme, P" uniqKey="Vandamme P">P Vandamme</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, F" uniqKey="Zhu F">F Zhu</name>
</author>
<author>
<name sortKey="Massana, R" uniqKey="Massana R">R Massana</name>
</author>
<author>
<name sortKey="Not, F" uniqKey="Not F">F Not</name>
</author>
<author>
<name sortKey="Marie, D" uniqKey="Marie D">D Marie</name>
</author>
<author>
<name sortKey="Vaulot, D" uniqKey="Vaulot D">D Vaulot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loram, Je" uniqKey="Loram J">JE Loram</name>
</author>
<author>
<name sortKey="Boonham, N" uniqKey="Boonham N">N Boonham</name>
</author>
<author>
<name sortKey="O Toole, P" uniqKey="O Toole P">P O'Toole</name>
</author>
<author>
<name sortKey="Trapido Rosenthal, Hg" uniqKey="Trapido Rosenthal H">HG Trapido-Rosenthal</name>
</author>
<author>
<name sortKey="Douglas, Ae" uniqKey="Douglas A">AE Douglas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Colwell, Rk" uniqKey="Colwell R">RK Colwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schloss, Pk" uniqKey="Schloss P">PK Schloss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Maxwell, P" uniqKey="Maxwell P">P Maxwell</name>
</author>
<author>
<name sortKey="Birmingham, A" uniqKey="Birmingham A">A Birmingham</name>
</author>
<author>
<name sortKey="Carnes, J" uniqKey="Carnes J">J Carnes</name>
</author>
<author>
<name sortKey="Caporaso, Jg" uniqKey="Caporaso J">JG Caporaso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angly, F" uniqKey="Angly F">F Angly</name>
</author>
<author>
<name sortKey="Rodriguez Brito, B" uniqKey="Rodriguez Brito B">B Rodriguez-Brito</name>
</author>
<author>
<name sortKey="Bangor, D" uniqKey="Bangor D">D Bangor</name>
</author>
<author>
<name sortKey="Mcnairnie, P" uniqKey="Mcnairnie P">P McNairnie</name>
</author>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M Breitbart</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schbath, S" uniqKey="Schbath S">S Schbath</name>
</author>
<author>
<name sortKey="Prum, B" uniqKey="Prum B">B Prum</name>
</author>
<author>
<name sortKey="De Turckheim, E" uniqKey="De Turckheim E">E de Turckheim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teeling, H" uniqKey="Teeling H">H Teeling</name>
</author>
<author>
<name sortKey="Waldmann, J" uniqKey="Waldmann J">J Waldmann</name>
</author>
<author>
<name sortKey="Lombardot, T" uniqKey="Lombardot T">T Lombardot</name>
</author>
<author>
<name sortKey="Bauer, M" uniqKey="Bauer M">M Bauer</name>
</author>
<author>
<name sortKey="Glockner, Fo" uniqKey="Glockner F">FO Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mchardy, Ac" uniqKey="Mchardy A">AC McHardy</name>
</author>
<author>
<name sortKey="Martin, Hg" uniqKey="Martin H">HG Martín</name>
</author>
<author>
<name sortKey="Tsirigos, A" uniqKey="Tsirigos A">A Tsirigos</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Rigoutsos, I" uniqKey="Rigoutsos I">I Rigoutsos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Ckkk" uniqKey="Chan C">CKKK Chan</name>
</author>
<author>
<name sortKey="Hsu, Al" uniqKey="Hsu A">AL Hsu</name>
</author>
<author>
<name sortKey="Tang, Sll" uniqKey="Tang S">SLL Tang</name>
</author>
<author>
<name sortKey="Halgamuge, Sk" uniqKey="Halgamuge S">SK Halgamuge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Ckk" uniqKey="Chan C">CKK Chan</name>
</author>
<author>
<name sortKey="Hsu, Al" uniqKey="Hsu A">AL Hsu</name>
</author>
<author>
<name sortKey="Halgamuge, Sk" uniqKey="Halgamuge S">SK Halgamuge</name>
</author>
<author>
<name sortKey="Tang, Sl" uniqKey="Tang S">SL Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tzahor, S" uniqKey="Tzahor S">S Tzahor</name>
</author>
<author>
<name sortKey="Aharonovich, Dm" uniqKey="Aharonovich D">DM Aharonovich</name>
</author>
<author>
<name sortKey="Kirkup, B" uniqKey="Kirkup B">B Kirkup</name>
</author>
<author>
<name sortKey="Yogev, T" uniqKey="Yogev T">T Yogev</name>
</author>
<author>
<name sortKey="Frank, Ib" uniqKey="Frank I">IB Frank</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Auch, Af" uniqKey="Auch A">AF Auch</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krause, L" uniqKey="Krause L">L Krause</name>
</author>
<author>
<name sortKey="Diaz, Nn" uniqKey="Diaz N">NN Diaz</name>
</author>
<author>
<name sortKey="Goesmann, A" uniqKey="Goesmann A">A Goesmann</name>
</author>
<author>
<name sortKey="Kelley, S" uniqKey="Kelley S">S Kelley</name>
</author>
<author>
<name sortKey="Nattkemper, Tw" uniqKey="Nattkemper T">TW Nattkemper</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
<author>
<name sortKey="Tate, J" uniqKey="Tate J">J Tate</name>
</author>
<author>
<name sortKey="Mistry, J" uniqKey="Mistry J">J Mistry</name>
</author>
<author>
<name sortKey="Coggill, Pc" uniqKey="Coggill P">PC Coggill</name>
</author>
<author>
<name sortKey="Sammut, Sjj" uniqKey="Sammut S">SJJ Sammut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brady, A" uniqKey="Brady A">A Brady</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Friedberg, I" uniqKey="Friedberg I">I Friedberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kunik, V" uniqKey="Kunik V">V Kunik</name>
</author>
<author>
<name sortKey="Meroz, Y" uniqKey="Meroz Y">Y Meroz</name>
</author>
<author>
<name sortKey="Solan, Z" uniqKey="Solan Z">Z Solan</name>
</author>
<author>
<name sortKey="Sandbank, B" uniqKey="Sandbank B">B Sandbank</name>
</author>
<author>
<name sortKey="Weingart, U" uniqKey="Weingart U">U Weingart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Tzahor, S" uniqKey="Tzahor S">S Tzahor</name>
</author>
<author>
<name sortKey="Williamson, S" uniqKey="Williamson S">S Williamson</name>
</author>
<author>
<name sortKey="Shmoish, M" uniqKey="Shmoish M">M Shmoish</name>
</author>
<author>
<name sortKey="Man Aharonovich, D" uniqKey="Man Aharonovich D">D Man-Aharonovich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meroz, Y" uniqKey="Meroz Y">Y Meroz</name>
</author>
<author>
<name sortKey="Horn, D" uniqKey="Horn D">D Horn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Paarmann, D" uniqKey="Paarmann D">D Paarmann</name>
</author>
<author>
<name sortKey="D Souza, M" uniqKey="D Souza M">M D'Souza</name>
</author>
<author>
<name sortKey="Olson, Rd" uniqKey="Olson R">RD Olson</name>
</author>
<author>
<name sortKey="Glass, Em" uniqKey="Glass E">EM Glass</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haft, Dh" uniqKey="Haft D">DH Haft</name>
</author>
<author>
<name sortKey="Selengut, Jd" uniqKey="Selengut J">JD Selengut</name>
</author>
<author>
<name sortKey="White, O" uniqKey="White O">O White</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Mahowald, Ma" uniqKey="Mahowald M">MA Mahowald</name>
</author>
<author>
<name sortKey="Magrini, V" uniqKey="Magrini V">V Magrini</name>
</author>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brulc, Jm" uniqKey="Brulc J">JM Brulc</name>
</author>
<author>
<name sortKey="Antonopoulos, Da" uniqKey="Antonopoulos D">DA Antonopoulos</name>
</author>
<author>
<name sortKey="Berg Miller, Me" uniqKey="Berg Miller M">ME Berg Miller</name>
</author>
<author>
<name sortKey="Wilson, Mk" uniqKey="Wilson M">MK Wilson</name>
</author>
<author>
<name sortKey="Yannarell, Ac" uniqKey="Yannarell A">AC Yannarell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Willner, D" uniqKey="Willner D">D Willner</name>
</author>
<author>
<name sortKey="Furlan, M" uniqKey="Furlan M">M Furlan</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Schmieder, R" uniqKey="Schmieder R">R Schmieder</name>
</author>
<author>
<name sortKey="Angly, Fe" uniqKey="Angly F">FE Angly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Klar, B" uniqKey="Klar B">B Klar</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, D" uniqKey="Huson D">D Huson</name>
</author>
<author>
<name sortKey="Richter, D" uniqKey="Richter D">D Richter</name>
</author>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Auch, A" uniqKey="Auch A">A Auch</name>
</author>
<author>
<name sortKey="Schuster, S" uniqKey="Schuster S">S Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Markowitz, Vm" uniqKey="Markowitz V">VM Markowitz</name>
</author>
<author>
<name sortKey="Ivanova, Nn" uniqKey="Ivanova N">NN Ivanova</name>
</author>
<author>
<name sortKey="Szeto, E" uniqKey="Szeto E">E Szeto</name>
</author>
<author>
<name sortKey="Palaniappan, K" uniqKey="Palaniappan K">K Palaniappan</name>
</author>
<author>
<name sortKey="Chu, K" uniqKey="Chu K">K Chu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, C" uniqKey="Lozupone C">C Lozupone</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="White, Jr" uniqKey="White J">JR White</name>
</author>
<author>
<name sortKey="Nagarajan, N" uniqKey="Nagarajan N">N Nagarajan</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giardine, B" uniqKey="Giardine B">B Giardine</name>
</author>
<author>
<name sortKey="Riemer, C" uniqKey="Riemer C">C Riemer</name>
</author>
<author>
<name sortKey="Hardison, Rc" uniqKey="Hardison R">RC Hardison</name>
</author>
<author>
<name sortKey="Burhans, R" uniqKey="Burhans R">R Burhans</name>
</author>
<author>
<name sortKey="Elnitski, L" uniqKey="Elnitski L">L Elnitski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kristiansson, E" uniqKey="Kristiansson E">E Kristiansson</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Dalevi, D" uniqKey="Dalevi D">D Dalevi</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bohnebeck, U" uniqKey="Bohnebeck U">U Bohnebeck</name>
</author>
<author>
<name sortKey="Lombardot, T" uniqKey="Lombardot T">T Lombardot</name>
</author>
<author>
<name sortKey="Kottmann, R" uniqKey="Kottmann R">R Kottmann</name>
</author>
<author>
<name sortKey="Glockner, F" uniqKey="Glockner F">F Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lombardot, T" uniqKey="Lombardot T">T Lombardot</name>
</author>
<author>
<name sortKey="Kottmann, R" uniqKey="Kottmann R">R Kottmann</name>
</author>
<author>
<name sortKey="Giuliani, G" uniqKey="Giuliani G">G Giuliani</name>
</author>
<author>
<name sortKey="De Bono, A" uniqKey="De Bono A">A de Bono</name>
</author>
<author>
<name sortKey="Addor, N" uniqKey="Addor N">N Addor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Egerton, Fn" uniqKey="Egerton F">FN Egerton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gianoulis, Ta" uniqKey="Gianoulis T">TA Gianoulis</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Patel, Pv" uniqKey="Patel P">PV Patel</name>
</author>
<author>
<name sortKey="Bjornson, R" uniqKey="Bjornson R">R Bjornson</name>
</author>
<author>
<name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author>
<name sortKey="Daugherty, Sc" uniqKey="Daugherty S">SC Daugherty</name>
</author>
<author>
<name sortKey="Van Aken, Se" uniqKey="Van Aken S">SE Van Aken</name>
</author>
<author>
<name sortKey="Pai, Gh" uniqKey="Pai G">GH Pai</name>
</author>
<author>
<name sortKey="Watkins, Kl" uniqKey="Watkins K">KL Watkins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Teeling, H" uniqKey="Teeling H">H Teeling</name>
</author>
<author>
<name sortKey="Ivanova, Nn" uniqKey="Ivanova N">NN Ivanova</name>
</author>
<author>
<name sortKey="Huntemann, M" uniqKey="Huntemann M">M Huntemann</name>
</author>
<author>
<name sortKey="Richter, M" uniqKey="Richter M">M Richter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kannan, N" uniqKey="Kannan N">N Kannan</name>
</author>
<author>
<name sortKey="Taylor, Sss" uniqKey="Taylor S">SSS Taylor</name>
</author>
<author>
<name sortKey="Zhai, Y" uniqKey="Zhai Y">Y Zhai</name>
</author>
<author>
<name sortKey="Venter, Jcc" uniqKey="Venter J">JCC Venter</name>
</author>
<author>
<name sortKey="Manning, G" uniqKey="Manning G">G Manning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brussow, H" uniqKey="Brussow H">H Brussow</name>
</author>
<author>
<name sortKey="Hendrix, Rw" uniqKey="Hendrix R">RW Hendrix</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mann, Nh" uniqKey="Mann N">NH Mann</name>
</author>
<author>
<name sortKey="Cook, A" uniqKey="Cook A">A Cook</name>
</author>
<author>
<name sortKey="Millard, A" uniqKey="Millard A">A Millard</name>
</author>
<author>
<name sortKey="Bailey, S" uniqKey="Bailey S">S Bailey</name>
</author>
<author>
<name sortKey="Clokie, M" uniqKey="Clokie M">M Clokie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Millard, A" uniqKey="Millard A">A Millard</name>
</author>
<author>
<name sortKey="Clokie, Mrj" uniqKey="Clokie M">MRJ Clokie</name>
</author>
<author>
<name sortKey="Shub, Da" uniqKey="Shub D">DA Shub</name>
</author>
<author>
<name sortKey="Mann, Nh" uniqKey="Mann N">NH Mann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Alperovitch, A" uniqKey="Alperovitch A">A Alperovitch</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Glaser, F" uniqKey="Glaser F">F Glaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Delwart, El" uniqKey="Delwart E">EL Delwart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nishizawa, T" uniqKey="Nishizawa T">T Nishizawa</name>
</author>
<author>
<name sortKey="Okamoto, H" uniqKey="Okamoto H">H Okamoto</name>
</author>
<author>
<name sortKey="Konishi, K" uniqKey="Konishi K">K Konishi</name>
</author>
<author>
<name sortKey="Yoshizawa, H" uniqKey="Yoshizawa H">H Yoshizawa</name>
</author>
<author>
<name sortKey="Miyakawa, Y" uniqKey="Miyakawa Y">Y Miyakawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simons, Jn" uniqKey="Simons J">JN Simons</name>
</author>
<author>
<name sortKey="Leary, Tp" uniqKey="Leary T">TP Leary</name>
</author>
<author>
<name sortKey="Dawson, Gj" uniqKey="Dawson G">GJ Dawson</name>
</author>
<author>
<name sortKey="Pilot Matias, Tj" uniqKey="Pilot Matias T">TJ Pilot-Matias</name>
</author>
<author>
<name sortKey="Muerhoff, As" uniqKey="Muerhoff A">AS Muerhoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yin, Y" uniqKey="Yin Y">Y Yin</name>
</author>
<author>
<name sortKey="Fischer, D" uniqKey="Fischer D">D Fischer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hambly, E" uniqKey="Hambly E">E Hambly</name>
</author>
<author>
<name sortKey="Suttle, C" uniqKey="Suttle C">C Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boyer, Hw" uniqKey="Boyer H">HW Boyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kass, S" uniqKey="Kass S">S Kass</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tost, J" uniqKey="Tost J">J Tost</name>
</author>
<author>
<name sortKey="Gut, Ig" uniqKey="Gut I">IG Gut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Batley, J" uniqKey="Batley J">J Batley</name>
</author>
<author>
<name sortKey="Edwards, D" uniqKey="Edwards D">D Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Richter, Bg" uniqKey="Richter B">BG Richter</name>
</author>
<author>
<name sortKey="Sexton, Dp" uniqKey="Sexton D">DP Sexton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailly, J" uniqKey="Bailly J">J Bailly</name>
</author>
<author>
<name sortKey="Fraissinet Tachet, L" uniqKey="Fraissinet Tachet L">L Fraissinet-Tachet</name>
</author>
<author>
<name sortKey="Verner, Mc" uniqKey="Verner M">MC Verner</name>
</author>
<author>
<name sortKey="Debaud, Jc" uniqKey="Debaud J">JC Debaud</name>
</author>
<author>
<name sortKey="Lemaire, M" uniqKey="Lemaire M">M Lemaire</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilmes, P" uniqKey="Wilmes P">P Wilmes</name>
</author>
<author>
<name sortKey="Bond, Pl" uniqKey="Bond P">PL Bond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilmes, P" uniqKey="Wilmes P">P Wilmes</name>
</author>
<author>
<name sortKey="Bond, Pl" uniqKey="Bond P">PL Bond</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="review-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLoS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">20195499</article-id>
<article-id pub-id-type="pmc">2829047</article-id>
<article-id pub-id-type="publisher-id">09-PLCB-RV-0525R3</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1000667</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review</subject>
</subj-group>
<subj-group subj-group-type="Discipline">
<subject>Computational Biology/Genomics</subject>
<subject>Computational Biology/Metagenomics</subject>
<subject>Computational Biology/Sequence Motif Analysis</subject>
<subject>Microbiology</subject>
<subject>Microbiology/Environmental Microbiology</subject>
<subject>Microbiology/Microbial Evolution and Genomics</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Primer on Metagenomics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wooley</surname>
<given-names>John C.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Godzik</surname>
<given-names>Adam</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Friedberg</surname>
<given-names>Iddo</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, California, United States of America</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Department of Microbiology, Miami University, Oxford, Ohio, United States of America</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Bourne</surname>
<given-names>Philip E.</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">University of California San Diego, United States of America</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>i.friedberg@muohio.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<month>2</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="ppub">
<month>2</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="epub">
<day>26</day>
<month>2</month>
<year>2010</year>
</pub-date>
<volume>6</volume>
<issue>2</issue>
<elocation-id>e1000667</elocation-id>
<permissions>
<copyright-statement>Wooley et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement>
</permissions>
<abstract>
<p>Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, and the sequence data are noisy and partial. From sampling to assembly to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion of computational methods applied to metagenomic research. An exhaustive review is therefore beyond the scope of this article. Rather, we provide a concise yet comprehensive introduction to the current computational requirements presented by metagenomics and review recent progress. We also note whether software implementing the methods presented here exists, and briefly review its utility. We encourage readers to use the comment section provided by this journal to relate their own experiences. Finally, the last section of this article presents a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.</p>
</abstract>
<counts>
<page-count count="13"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>For most of its history, life on Earth consisted solely of microscopic life forms, and microbial life still dominates Earth in many respects. The estimated 5×10<sup>30</sup> prokaryotic cells inhabiting our planet sequester some 350–550 petagrams (1 Pg = 10<sup>15</sup> g) of carbon, 85–130 Pg of nitrogen, and 9–14 Pg of phosphorus, making them the largest reservoir of those nutrients on Earth <xref ref-type="bibr" rid="pcbi.1000667-Whitman1">[1]</xref>. Bacteria and archaea live in all environments capable of sustaining other life, and in many cases they are the sole inhabitants of extreme environments, from deep-sea hydrothermal vents, whose fluids reach 340°C, to rocks found in boreholes 6 km beneath the Earth's surface. Bacteria, archaea, and microeukaryotes dominate Earth's habitats, compound recycling, nutrient sequestration, and, according to some estimates, biomass. Microbes are not only ubiquitous but also essential to all life: they are the primary source of nutrients and the primary recyclers of dead matter back into available organic forms. Like all other animals and plants, humans are profoundly affected by microbes, from the scourges of human, farm-animal, and crop pandemics to the benefits they bring to agriculture, the food industry, and medicine, to name a few. We humans have more bacterial cells (10<sup>14</sup>) inhabiting our bodies than our own cells (10<sup>13</sup>) <xref ref-type="bibr" rid="pcbi.1000667-Savage1">[2]</xref>,<xref ref-type="bibr" rid="pcbi.1000667-Berg1">[3]</xref>. It has been stated that the key to understanding the human condition lies in understanding the human genome <xref ref-type="bibr" rid="pcbi.1000667-Collins1">[4]</xref>,<xref ref-type="bibr" rid="pcbi.1000667-Kaput1">[5]</xref>. But given our intimate relationship with microbes <xref ref-type="bibr" rid="pcbi.1000667-OHara1">[6]</xref>, researching the human genome is now understood to be necessary but not sufficient: we must also sequence the genomes of our own microbes. Likewise, a better understanding of the role of microbes in the biosphere requires a genomic study of them as well.</p>
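The global figures above are easier to grasp per cell. The short sketch below only reproduces arithmetic on the numbers quoted in this paragraph (5×10^30 cells, 350–550 Pg of carbon, 10^14 bacterial versus 10^13 human cells) and introduces no new data; the constant and function names are illustrative. Dividing the carbon pool by the cell count gives, on average, roughly 70 to 110 femtograms of carbon per prokaryotic cell, and the cell counts imply about ten bacterial cells per human cell.

```python
# Figures quoted in the text (global prokaryote estimates, reference [1])
PROKARYOTIC_CELLS = 5e30
CARBON_PG = (350, 550)      # petagrams of carbon: low and high estimates
GRAMS_PER_PG = 1e15         # 1 Pg = 10^15 g
GRAMS_PER_FG = 1e-15        # 1 fg = 10^-15 g

def carbon_per_cell_fg(total_pg, cells=PROKARYOTIC_CELLS):
    """Average carbon per prokaryotic cell, in femtograms."""
    grams_per_cell = total_pg * GRAMS_PER_PG / cells
    return grams_per_cell / GRAMS_PER_FG

low, high = (carbon_per_cell_fg(pg) for pg in CARBON_PG)
print(f"average carbon per cell: {low:.0f}-{high:.0f} fg")

# Ratio of bacterial to human cells in the human body, also from the text
print(f"bacterial:human cell ratio = {1e14 / 1e13:.0f}:1")
```

These are population averages; individual cells vary widely in size and carbon content.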
<p>The study of microbial genomes started in the late 1970s with the sequencing of the genomes of bacteriophages MS2 <xref ref-type="bibr" rid="pcbi.1000667-Fiers1">[7]</xref> and ϕ-X174 <xref ref-type="bibr" rid="pcbi.1000667-Sanger1">[8]</xref>. In 1995 microbiology took a major step with the sequencing of the first bacterial genome, that of <italic>Haemophilus influenzae</italic> <xref ref-type="bibr" rid="pcbi.1000667-Fleischmann1">[9]</xref>. The genomes of 916 bacterial, 1,987 viral, and 67 archaeal species have been deposited in GenBank release 2.2.6. Having such a large number of microbial genomes on hand has changed the nature of microbiology and of microbial evolution studies. By making it possible to examine the relationship between genome structure and function across many different species, these data have also opened up the fields of comparative genomics and systems biology. Nevertheless, single-organism genome studies have limits. First, technological limitations mean that an organism must be clonally cultured before its entire genome can be sequenced. Yet only a small percentage of the microbes in nature can be cultured, so extant genomic data are highly biased and do not give a true picture of the genomes of microbial species <xref ref-type="bibr" rid="pcbi.1000667-Amann1">[10]</xref>–<xref ref-type="bibr" rid="pcbi.1000667-Rapp1">[12]</xref>. Second, microbes very rarely live in single-species communities: species interact both with each other and with their habitats, which may also include host organisms. Therefore, a clonal culture also fails to represent the true state of affairs in nature with respect to organism interactions and the resulting population-level genomic variation and biological functions.</p>
<p>New sequencing technologies and the drastic reduction in the cost of sequencing are helping us overcome these limits. We now have the ability to obtain genomic information directly from microbial communities in their natural habitats. Suddenly, instead of looking at a few species individually, we are able to study tens of thousands all together. Sequence data taken directly from the environment were dubbed the metagenome
<xref ref-type="bibr" rid="pcbi.1000667-Handelsman1">[13]</xref>
, and their study was named metagenomics
<xref ref-type="bibr" rid="pcbi.1000667-Rondon1">[14]</xref>
.</p>
<p>However, environmental sequencing comes with its own information-restricting price tag. In single-organism genomics, practically the entire genome of the microbe is sequenced, giving a complete picture of the genome, and we know which species the DNA or RNA came from. After assembly, the locations of genes, operons, and transcriptional units can be computationally inferred, and control elements and other cues can be identified to delineate transcriptional and translational units. Consequently, we obtain a nearly complete and well-ordered picture of all the genomic elements in the sequenced organism. We may not recognize every element for what it is, and some errors may creep in, but we can gauge the breadth of our knowledge and properly annotate those regions of the genome we manage to decipher.</p>
<p>In contrast, the sequences obtained from environmental genomic studies are fragmented. Each fragment necessarily comes from a specific species, but a single sample can contain many different species, most of which lack a fully sequenced genome. In many cases it is impossible to determine a fragment's species of origin. Depending on the sequencing method used, each fragment can be anywhere between 20 base pairs (bp) and 700 bp long. Short reads that cannot be tied to their species of origin can be assembled into contigs that usually do not exceed 5,000 bp. Reconstructing a whole genome is therefore generally not possible, and even reconstructing an entire transcriptional unit can be problematic. The data are not only fragmented and incomplete but also voluminous: environmental sequencing produces several orders of magnitude more sequence than single-organism genomics.</p>
<p>For these reasons, computational biologists have been developing new algorithms to analyze metagenomic data. These computational challenges are new and very exciting. We are entering an era akin to that of the first genomic revolution almost two decades ago. Whole-organism genomics allows us to examine the evolution not only of single genes, but of whole transcriptional units, chromosomes, and cellular networks. More recently, metagenomics has given us the ability to study, at the most fundamental genomic level, the relationship between microbes and the communities and habitats in which they live. How does the adaptation of microbes to different environments, including host animals and other microbes, manifest itself in their genomes?</p>
<p>For us humans, this question can strike very close to home, when those habitats are our own bodies and the microbes are associated with our own well-being and illnesses: almost every aspect of human life, as well as the life of every other living being on the planet, is affected by microbes. We now have the experimental technology to understand microbial communities and how they affect us, but the sheer volume and fragmentary nature of the data challenge computational biologists to distill all these data into useful information.</p>
<p>In this article we shall briefly outline some experimental, technological, and computational achievements and challenges associated with metagenomic data, from sequence generation and assembly through the various levels of metagenomic annotation. We will also discuss computational issues that are unique to environmental genomics, such as estimating the metagenome size and the handling of associated metadata. Finally, we will review some studies highlighting the advantages of metagenomic-based research, and some of the insights it has enabled.</p>
</sec>
<sec id="s2">
<title>Sampling</title>
<sec id="s2a">
<title>Sample Size and Number of Samples</title>
<p>The first step in a metagenomic study is to obtain the environmental sample. Samples should represent the population from which they are taken. The problem in microbial ecology is that we are unable to see the organisms we are trying to capture. How many samples are enough?</p>
<p>Rarefaction curves are typically used to estimate the fraction of species sequenced. A rarefaction curve plots the number of species observed as a function of the number of individuals sampled. The curve usually starts with a steep slope and at some point begins to flatten, as fewer new species are discovered per additional sample. The gentler the slope, the less further sampling contributes to the total number of operational taxonomic units (OTUs). For microbial samples, different OTUs are typically characterized by 16S (prokaryotic) or 18S (eukaryotic) rDNA, and are also referred to as ribotypes. Classification is rarely done in the field. An initial estimate of species diversity from a pilot study or from previous studies is therefore desirable for gauging the number of samples needed to get a comprehensive picture of the OTUs in the sampled habitat. This is discussed further in the “
<xref ref-type="sec" rid="s6">Species Diversity</xref>
” section below.</p>
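<p>The following Python sketch shows one way to compute such a curve. It uses Hurlbert's analytical rarefaction formula, which gives the expected number of OTUs in a random subsample of <italic>n</italic> individuals from the per-OTU counts. The function names (rarefy, rarefaction_curve) and the example counts are hypothetical, and the sketch assumes that OTUs have already been defined, e.g., by clustering 16S rDNA sequences.</p>

```python
from math import comb


def rarefy(counts, n):
    """Expected number of OTUs in a random subsample of n individuals
    (Hurlbert's analytical rarefaction), given per-OTU abundance counts."""
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds number of individuals")
    denom = comb(total, n)
    # Each OTU contributes the probability that at least one of its
    # individuals is drawn; comb() returns 0 when n exceeds total - c.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step=1):
    """(n, expected OTUs) pairs for n = step, 2*step, ... up to the sample size."""
    total = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(step, total + 1, step)]


# Hypothetical per-OTU counts from 100 classified 16S rDNA reads.
counts = [50, 30, 10, 5, 3, 1, 1]
for n, s in rarefaction_curve(counts, step=20):
    print(n, round(s, 2))
```

<p>If the curve is still rising steeply at the full sample size, additional sampling would probably reveal more OTUs.</p>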
</sec>
<sec id="s2b">
<title>Filtering</title>
<p>Filtering an environmental sample has the same goals as any kind of filtering: (1) retain as much as possible of what you want and (2) exclude as much as possible of what you do not want. If we are interested only in bacteria, for example, our goal would be to filter out the smaller viral particles and the usually larger protists. Of course, this process will retain lysogenic phages and prophages integrated into bacterial genetic material, as well as mimivirus particles, which are as large as some bacteria. At the other end of the size scale, small protists and large bacteria may overlap in size, making a complete size-based separation impossible. Filamentous bacteria that grow in multicellular colonies may also be filtered out, because the colonies are larger than the filter's pores.</p>
<p>Computational filtering can be used after sequencing. Similarity searches against annotated sequence databases can retain genomic material that clearly belongs to the clades of interest. Care must be taken with false negatives, though: relevant genomic material may be filtered out this way simply because its homologs have never been deposited in existing databases. Another option is to search for sequence motifs that mark obvious false positives, e.g., eukaryotic material when only prokaryotic material is to be analyzed. This technique can also be used to detect sample contamination.</p>
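<p>Similarity searches of this kind are commonly run with tools such as BLAST. The following Python sketch is a much cruder, self-contained stand-in. It keeps a read if the read or its reverse complement shares at least one k-mer with a set of reference sequences. The function names, the choice of k = 8, and the single-shared-k-mer threshold are illustrative assumptions, not recommended settings. The false-negative caveat applies here too: reads whose relatives are absent from the reference set will be rejected.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_screen(reads, reference_seqs, k=8, min_shared=1):
    """Split reads into (kept, rejected) lists.

    A read is kept if it, or its reverse complement, shares at least
    min_shared distinct k-mers with the pooled reference k-mers.
    """
    ref_kmers = set()
    for s in reference_seqs:
        ref_kmers.update(kmers(s, k))
    kept, rejected = [], []
    for read in reads:
        shared = max(
            len(ref_kmers.intersection(kmers(read, k))),
            len(ref_kmers.intersection(kmers(revcomp(read), k))),
        )
        (kept if shared >= min_shared else rejected).append(read)
    return kept, rejected


# Made-up reference and reads for illustration.
reference = "ACGTACGTTAGCCGATAGGCTA"
kept, rejected = kmer_screen(["CGTTAGCCGATA", "TTTTTTTTTTTT"], [reference])
print(kept, rejected)
```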
</sec>
<sec id="s2c">
<title>Recording Metadata</title>
<p>Keeping strict and comprehensive metadata records is as important as keeping the sequence data themselves. Metadata are the “data about the data”: where the samples were taken from, when, and under which conditions. In microbial ecology, this commonly refers to physical, chemical, and other environmental characteristics of the sample's location. For example, the metadata of an ocean sample will typically include sampling date and time, depth, salinity, light intensity, geographical coordinates, pH, soluble gases, etc. In clinical microbiology, metadata would include the pathology, medical history, and vital statistics of the patient, as well as the exact location and tissue from which the sample was taken, the sampling conditions, and so on.</p>
<p>Many metagenomic studies are driven by discovery and data mining rather than by hypothesis. These studies seek statistically significant correlations between the metagenomic data and the habitat-associated metadata, which may lead to biologically significant discoveries. Metadata therefore need to be provided in a form that is standard, comprehensive, and amenable to computation; for example, semantic information should be provided in ontological form wherever possible. A description of the environmental context and of the experimental methods used is vital for comparative studies, because, as we shall see, genes or even “gene-less” sequence signatures are linked to habitats rather than to species. Finally, sequencing technology is rapidly improving, and the adoption of new sequencing methods will require descriptors of those methods, such as sequence coverage, quality, the assembly programs used, and so on.</p>
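<p>One way to make metadata amenable to computation is to store each sample as a typed record that is validated on entry and serialized to a machine-readable format. The following Python sketch does this for a few of the ocean-sample fields mentioned above. The class and field names are hypothetical and do not follow any published metadata standard, and the example values are made up.</p>

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OceanSampleMetadata:
    """Hypothetical metadata record for one ocean sample."""
    sample_id: str
    collected_at: datetime
    latitude: float                 # decimal degrees
    longitude: float                # decimal degrees
    depth_m: float                  # meters below the surface
    salinity_psu: Optional[float] = None
    ph: Optional[float] = None

    def __post_init__(self):
        # Reject physically impossible values when the record is created.
        if not 90.0 >= self.latitude >= -90.0:
            raise ValueError("latitude out of range")
        if not 180.0 >= self.longitude >= -180.0:
            raise ValueError("longitude out of range")
        if not self.depth_m >= 0.0:
            raise ValueError("depth must be non-negative")
        if self.ph is not None and not 14.0 >= self.ph >= 0.0:
            raise ValueError("pH out of range")

    def to_json(self) -> str:
        """Serialize to JSON, writing the timestamp in ISO 8601 form."""
        record = asdict(self)
        record["collected_at"] = self.collected_at.isoformat()
        return json.dumps(record, sort_keys=True)


# Made-up example values.
sample = OceanSampleMetadata("S1", datetime(2009, 6, 1, 12, 0), 32.7, -117.2, 10.0, 33.5, 8.1)
print(sample.to_json())
```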
<p>The Genomic Standards Consortium (
<ext-link ext-link-type="uri" xlink:href="http://gensc.org/">http://gensc.org/</ext-link>
) is an international group working to standardize the description of genomes and metagenomes and the exchange of genomic data and metadata. In a recent publication, the consortium proposed the Minimum Information about a Genome Sequence and Minimum Information about a Metagenome Sequence (MIGS/MIMS) standard for adoption
<xref ref-type="bibr" rid="pcbi.1000667-Field1">[15]</xref>
. An associated markup language, the Genomic Contextual Data Markup Language (GCDML), is under active development
<xref ref-type="bibr" rid="pcbi.1000667-Kottmann1">[16]</xref>
. The consortium aims to have journals adopt MIGS/MIMS as a publication requirement when genomic or metagenomic data are deposited, akin to standards such as MIAME for microarray data
<xref ref-type="bibr" rid="pcbi.1000667-Brazma1">[17]</xref>
or PDB/mmCIF for structural biology data
<xref ref-type="bibr" rid="pcbi.1000667-Westbrook1">[18]</xref>
.</p>
</sec>
</sec>
<sec id="s3">
<title>Sequencing</title>
<sec id="s3a">
<title>First, Second, and Third Generation Sequencing</title>
<p>Until recently, prokaryotic genomes were typically sequenced using Sanger shotgun
<xref ref-type="bibr" rid="pcbi.1000667-Sanger2">[19]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Sanger3">[20]</xref>
sequencing. The first step is shearing the DNA of a genomic clone into random fragments, hence the “shotgun.” The fragments are then cloned into plasmid vectors that are grown in monoclonal libraries to produce enough genomic material for sequencing, and the DNA is sequenced using dye-termination methods. Repeating this process ensures that every part of the studied genome is sequenced, several times over. Assembly software then assembles the sequence fragments into the whole genome. Theoretically, any genome shorter than 5 Mbp can be assembled this way, although regions with large repeats tend to frustrate assembly algorithms. Such regions are therefore often left out of the assembled genome, leaving gaps. Another disadvantage of shotgun sequencing is “cloning bias”: some genes cannot be incorporated into the library vector, usually because they are toxic to the host expressing them
<xref ref-type="bibr" rid="pcbi.1000667-Sorek1">[21]</xref>
. Cloning bias is typically mitigated by using more than one host organism for cloning, or by using second generation sequencing techniques that do not require cloning (see below).</p>
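<p>The random shearing and resampling described above can be simulated to see how coverage builds up. The following Python sketch draws fixed-length reads from uniformly random positions of a single genome and tallies the per-base depth. It assumes error-free reads from one strand and ignores cloning bias and repeats. The function names are hypothetical.</p>

```python
import random


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Simulate shotgun sequencing: cut n_reads fragments of read_len bases
    at uniformly random positions and return (start, read) pairs."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def depth_profile(genome_len, reads):
    """Per-base sequencing depth implied by (start, read) pairs."""
    depth = [0] * genome_len
    for start, read in reads:
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


# A random 10 kbp "genome" sampled with 500 reads of 100 bp (5x coverage).
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
depth = depth_profile(len(genome), shotgun_reads(genome, 100, 500, seed=2))
print(sum(1 for d in depth if d > 0) / len(depth))
```

<p>At 5× coverage, the printed fraction of bases with nonzero depth should be close to the Poisson expectation of 1 − e<sup>−5</sup>, about 99%. Even so, a few stretches of the genome typically remain unsequenced.</p>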
<p>In metagenomics, shotgun sequencing is done in the same manner as in clonal culture genomics. However, the raw genomic material comes not from a single organism but from a community of microbes, hence the name environmental shotgun sequencing, or ESS. Depending on our ability to sample, this DNA may provide only a partial genomic picture of the organisms in the environment, since genomic material from the more abundant species dominates the sample. To obtain a better picture of the species composing the community, 16S or 18S rDNA (for prokaryotic and eukaryotic samples, respectively) is sequenced separately using universal primers; see
<xref ref-type="fig" rid="pcbi-1000667-g001">Figure 1</xref>
. When rDNA primers are used to classify OTUs in an environmental sample, the choice of primer sequence matters. This is especially true when the OTU composition under study is expected to differ significantly from that of most known species, the so-called rare biosphere
<xref ref-type="bibr" rid="pcbi.1000667-PedrosAlio1">[22]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Sogin1">[23]</xref>
. In this case, the primers used may differ too much from the rDNA in the sample, so that many OTUs go unidentified
<xref ref-type="bibr" rid="pcbi.1000667-Hamp1">[24]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Neufeld1">[25]</xref>
.</p>
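<p>One rough way to estimate whether a universal primer is likely to bind a given rDNA sequence is to count mismatches against the primer's degenerate IUPAC codes. The following Python sketch scans one strand of a target sequence for the best-matching window. The primer and target sequences in the example are made up. Real primer evaluation also takes into account where the mismatches fall (mismatches near the 3′ end matter most), melting temperature, and both strands.</p>

```python
# IUPAC nucleotide codes mapped to the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer's IUPAC code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p])


def best_binding(primer, target):
    """Return (position, mismatches) for the best-matching window on the target
    strand, or None if the target is shorter than the primer."""
    k = len(primer)
    windows = ((i, mismatches(primer, target[i:i + k])) for i in range(len(target) - k + 1))
    return min(windows, key=lambda w: w[1], default=None)


def primer_detects(primer, target, max_mismatches=2):
    """Crude check: does some window match with at most max_mismatches mismatches?"""
    hit = best_binding(primer, target)
    return hit is not None and max_mismatches >= hit[1]


# Hypothetical degenerate primer (Y = C or T) against two made-up targets.
print(best_binding("GTGYCAGC", "AAGTGCCAGCTT"))
print(best_binding("GTGYCAGC", "AAGTGACAGCTT"))
```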
<fig id="pcbi-1000667-g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Environmental Shotgun Sequencing (ESS).</title>
<p>(A) Sampling from the habitat; (B) filtering of particles, typically by size; (C) cell lysis and DNA extraction; (D) cloning and library construction; (E) sequencing of the clones; (F) sequence assembly.</p>
</caption>
<graphic xlink:href="pcbi.1000667.g001"></graphic>
</fig>
<p>Second generation sequencing methods have been rapidly gaining ground and are replacing Sanger sequencing for small genomes and for environmental genomics. A common denominator among second generation methods is the generation of “polymerase colonies,” or polonies
<xref ref-type="bibr" rid="pcbi.1000667-Mitra1">[26]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Porreca1">[27]</xref>
. Polonies are PCR amplicons derived from a single molecule of nucleic acid. Thousands to millions of polonies, each with an effective reaction volume of 10<sup>−9</sup> to 10<sup>−12</sup> l, can be amplified simultaneously to generate templates for sequencing. Enzymatic reactions can then be performed in parallel to sequence the nucleic acid material in the polonies. Polony-based methods produce considerably more sequences than Sanger sequencing, but those sequences are much shorter. Furthermore, each polony-based method has its own anomalies that should be accounted for when processing the data. See
<xref ref-type="table" rid="pcbi-1000667-t001">Table 1</xref>
for a comparison between the yield, fragment length, and run times of the different sequencers.</p>
<table-wrap id="pcbi-1000667-t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.t001</object-id>
<label>Table 1</label>
<caption>
<title>Comparison of different sequencing technologies, taken from
<xref ref-type="bibr" rid="pcbi.1000667-McPherson1">[34]</xref>
.</title>
</caption>
<alternatives>
<graphic id="pcbi-1000667-t001-1" xlink:href="pcbi.1000667.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Sequencer</td>
<td align="left" rowspan="1" colspan="1">ABI 3730</td>
<td align="left" rowspan="1" colspan="1">Roche 454</td>
<td align="left" rowspan="1" colspan="1">Solexa
<xref ref-type="table-fn" rid="nt101">a</xref>
</td>
<td align="left" rowspan="1" colspan="1">SOLiD (mp, frag)
<xref ref-type="table-fn" rid="nt102">b</xref>
</td>
<td align="left" rowspan="1" colspan="1">HeliScope
<xref ref-type="table-fn" rid="nt103">c</xref>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Read length</bold>
</td>
<td align="left" rowspan="1" colspan="1">600–900</td>
<td align="left" rowspan="1" colspan="1">400–500</td>
<td align="left" rowspan="1" colspan="1">75–100</td>
<td align="left" rowspan="1" colspan="1">50</td>
<td align="left" rowspan="1" colspan="1">25–35</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Run time</bold>
</td>
<td align="left" rowspan="1" colspan="1">6–10 h</td>
<td align="left" rowspan="1" colspan="1">10 h</td>
<td align="left" rowspan="1" colspan="1">2–10 d</td>
<td align="left" rowspan="1" colspan="1">(4–7 d, 8–14 d)</td>
<td align="left" rowspan="1" colspan="1">h</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Yield (Mbp)</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.01</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">2,300–3,500/d</td>
<td align="left" rowspan="1" colspan="1">(500, 1,000)</td>
<td align="left" rowspan="1" colspan="1">105–140/h</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Cloning bias</bold>
</td>
<td align="left" rowspan="1" colspan="1">Yes</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">No</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Mate pair information</bold>
</td>
<td align="left" rowspan="1" colspan="1">Yes</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">Yes</td>
<td align="left" rowspan="1" colspan="1">Yes</td>
<td align="left" rowspan="1" colspan="1">No</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="nt101">
<label>a</label>
<p>Based on the GA IIx. See full specifications at:
<ext-link ext-link-type="uri" xlink:href="http://www.illumina.com/systems/genome_analyzer.ilmn">http://www.illumina.com/systems/genome_analyzer.ilmn</ext-link>
.</p>
</fn>
<fn id="nt102">
<label>b</label>
<p>mp, mate pair; frag, fragment. See
<ext-link ext-link-type="uri" xlink:href="https://products.appliedbiosystems.com/">https://products.appliedbiosystems.com/</ext-link>
SOLiD 3 Plus System.</p>
</fn>
<fn id="nt103">
<label>c</label>
<p>See:
<ext-link ext-link-type="uri" xlink:href="http://www.helicosbio.com/Products/HelicosregGeneticAnalysisSystem/HeliScopetradeSequencer/tabid/87/Default.aspx">http://www.helicosbio.com/Products/HelicosregGeneticAnalysisSystem/HeliScopetradeSequencer/tabid/87/Default.aspx</ext-link>
.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In pyrosequencing (
<xref ref-type="fig" rid="pcbi-1000667-g002">Figure 2</xref>
)
<xref ref-type="bibr" rid="pcbi.1000667-Nyrn1">[28]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Ronaghi1">[29]</xref>
, as implemented in platforms such as Roche 454
<xref ref-type="bibr" rid="pcbi.1000667-Margulies1">[30]</xref>
, sequencing is performed by polymerase extension of a primed template. A single nucleotide species is added in each cycle. If the added nucleotide pairs with the next base on the template, its incorporation triggers a luciferase-based light reaction. The reaction chamber is then washed, and the cycle is repeated. Several hundred thousand wells containing material for sequencing are typically used in a single run. A known weakness of pyrosequencing is its inability to read long mononucleotide repeats correctly.</p>
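<p>The difficulty with mononucleotide repeats (homopolymers) follows from the flow-based readout: each flow reports the length of a whole run of identical bases as a single light intensity. The following Python sketch generates the ideal flowgram for a read under a fixed, repeating flow order and decodes measured intensities by simple rounding. The flow order TACG is an assumption for illustration, and the function names are hypothetical. Real base callers model signal noise far more carefully.</p>

```python
def flowgram(read, flow_order="TACG"):
    """Ideal, noise-free flowgram for `read` (the strand being synthesized).

    Nucleotides are flowed one species at a time in a fixed repeating order.
    At each flow the polymerase incorporates the entire run of matching bases,
    so the ideal signal equals the homopolymer length (0 if nothing matches).
    """
    read = read.upper()
    if set(read) - set(flow_order):
        raise ValueError("read contains bases absent from flow_order")
    signals, pos, flow = [], 0, 0
    while len(read) > pos:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while len(read) > pos and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(intensities, flow_order="TACG"):
    """Naive base caller: round each measured intensity to a homopolymer length."""
    return "".join(
        flow_order[i % len(flow_order)] * max(0, round(x))
        for i, x in enumerate(intensities)
    )


print(flowgram("TTACG"))
print(call_bases([0.0, 5.4]), call_bases([0.0, 5.6]))
```

<p>Under this rounding rule, an intensity of 5.4 is called as five A's and 5.6 as six A's. An intensity error of about half a unit is therefore enough to miscall the length of a homopolymer.</p>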
<fig id="pcbi-1000667-g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Pyrosequencing.</title>
<p>Single stranded DNA template is first hybridized with the sequencing primer and mixed with the enzymes along with the two substrates adenosine 5′-phosphosulfate (APS) and luciferin. In each cycle, (1) one of the four nucleotides (dTTPi, in this case) is then added to the reaction. (2) If the nucleotide is complementary to the base in the template strand then the DNA polymerase incorporates it into the growing strand. (3) Pyrophosphate (PPi)—in an amount equal in molarity to that of the incorporated nucleotide—is released and converted to ATP by sulfurylase in the presence of APS. (4) ATP then serves as a substrate to luciferase, causing a light reaction. Photon emission is in equimolar quanta to the amount of nucleotide incorporated in a given cycle. (5) The excess nucleotides are degraded by apyrase.</p>
</caption>
<graphic xlink:href="pcbi.1000667.g002"></graphic>
</fig>
<p>ABI SOLiD and Illumina GAII sequencers produce even shorter reads (25–100 bp), but very large volumes of sequence per run. As we shall see in the “
<xref ref-type="sec" rid="s4">Assembly</xref>
” section below, despite their short reads, these technologies provide a viable alternative for sequencing whole genomes because of the sheer volume of DNA sequenced. For further reading on second generation sequencing, see
<xref ref-type="bibr" rid="pcbi.1000667-Holt1">[31]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-McPherson1">[34]</xref>
.</p>
<p>Third generation sequencing, loosely defined as technology capable of sequencing long sequences without amplification, is in advanced development, and there are encouraging signs that it might be available as early as 2011
<xref ref-type="bibr" rid="pcbi.1000667-Clarke1">[35]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-Branton1">[37]</xref>
.</p>
</sec>
</sec>
<sec id="s4">
<title>Assembly</title>
<p>When a whole genome is sequenced, the reads are assembled into progressively longer contiguous sequences, or contigs, and finally into the whole genome. With genomic data, we are used to analyzing long stretches of contiguous sequence. Such analysis lets us find not only open reading frames, but also operons, transcriptional units, their associated promoter elements, and transcription factor binding sites. Longer elements, such as pathogenicity islands and other mobile genetic elements, become evident only when large fractions of the genome are assembled. In short, the information gained increases with the length of the assembled sequence.
<xref ref-type="table" rid="pcbi-1000667-t002">Table 2</xref>
lists sequence lengths and the genomic elements that can be identified at each length.</p>
<table-wrap id="pcbi-1000667-t002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.t002</object-id>
<label>Table 2</label>
<caption>
<title>The information contained in different lengths of genomic DNA.</title>
</caption>
<alternatives>
<graphic id="pcbi-1000667-t002-2" xlink:href="pcbi.1000667.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Sequence Length (bp)</td>
<td align="left" rowspan="1" colspan="1">Genome Element</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">25–75</td>
<td align="left" rowspan="1" colspan="1">SNPs, short frameshift mutations</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">100–400</td>
<td align="left" rowspan="1" colspan="1">Short functional signatures</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">500–1,000</td>
<td align="left" rowspan="1" colspan="1">Whole domains, single domain genes</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1,000–5,000</td>
<td align="left" rowspan="1" colspan="1">Short operons, multidomain genes</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">5,000–10,000</td>
<td align="left" rowspan="1" colspan="1">Longer operons, some
<italic>cis</italic>
-control elements</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">>100,000</td>
<td align="left" rowspan="1" colspan="1">Prophages, pathogenicity islands, various mobile insertion elements</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">>1,000,000</td>
<td align="left" rowspan="1" colspan="1">Whole prokaryotic chromosome organization</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>In contrast, in all but the most species-poor metagenomes, a full assembly is not possible, for two reasons. First, the sampling is incomplete, and many if not all species' genomes are only partially sampled, if at all. Second, the species information itself is incomplete, and it is difficult to map individual reads to their species of origin. Therefore, the analysis of genomic elements using metagenomic data is generally limited to the first three or four rows of
<xref ref-type="table" rid="pcbi-1000667-t002">Table 2</xref>
.</p>
<p>In this section, we will discuss assembly of metagenomic data, how information is extracted from partial assemblies, and how the extent of information gained can be estimated.</p>
<sec id="s4a">
<title>Metagenomic Sample Coverage</title>
<sec id="s4a1">
<title>Coverage</title>
<p>Coverage of a genome is defined as the mean number of times each nucleotide is sequenced. Thus, 5× coverage means that each nucleotide in the genome is sequenced five times on average. If we could sequence a genome in a single read, 1× coverage would suffice. Shorter read lengths (25–700 bp, depending on the sequencing technology; see
<xref ref-type="table" rid="pcbi-1000667-t001">Table 1</xref>
) necessitate higher coverage to ensure that the reads overlap and that those overlaps are unique enough to reconstruct the genome by assembling the fragments. Suppose we treat DNA shearing and sequencing as random events, and assume that our ability to detect an overlap between two truly overlapping reads does not vary between clones (when clones are used). Under these assumptions, a Poisson model can estimate the number of reads required to sequence an entire genome. This model is given by the Lander-Waterman equation
<xref ref-type="bibr" rid="pcbi.1000667-Lander1">[38]</xref>
:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e001"></graphic>
</disp-formula>
</p>
<p>Where
<italic>L</italic>
is the read length,
<italic>N</italic>
is the number of reads,
<italic>G</italic>
is the genome length, and
<italic>C</italic>
is coverage as described above. The fraction of the sequence covered is then given by:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e002"></graphic>
</disp-formula>
</p>
<p>The number of reads needed to sequence a fraction
<italic>P</italic>
<sub>0</sub>
of the genome is then:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e003"></graphic>
</disp-formula>
</p>
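<p>The following Python sketch evaluates these quantities. It assumes the standard Poisson forms of the Lander-Waterman model: <italic>C</italic> = <italic>LN</italic>/<italic>G</italic>, an expected covered fraction of 1 − e<sup>−<italic>C</italic></sup>, and therefore <italic>N</italic> = −(<italic>G</italic>/<italic>L</italic>) ln(1 − <italic>P</italic><sub>0</sub>) reads to cover a fraction <italic>P</italic><sub>0</sub>. The function names are hypothetical.</p>

```python
import math


def coverage(read_len, n_reads, genome_len):
    """Lander-Waterman coverage C = L * N / G."""
    return read_len * n_reads / genome_len


def fraction_covered(c):
    """Expected fraction of bases sequenced at least once: 1 - exp(-C)."""
    return 1.0 - math.exp(-c)


def reads_needed(p0, read_len, genome_len):
    """Smallest N whose expected covered fraction reaches p0:
    N = -(G / L) * ln(1 - p0), rounded up."""
    if not 1.0 > p0 >= 0.0:
        raise ValueError("p0 must lie in [0, 1)")
    return math.ceil(-genome_len / read_len * math.log(1.0 - p0))


# Example: a 5 Mbp genome sequenced with 100 bp reads.
n = reads_needed(0.99, 100, 5_000_000)
print(n, coverage(100, n, 5_000_000))
```

<p>For example, covering 99% of a 5 Mbp genome with 100 bp reads requires 230,259 reads, roughly 4.6× coverage.</p>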
<p>In an environmental sample containing
<italic>l</italic>
species, the metagenome size
<italic>G</italic>
<sub>m</sub>
is:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e004"></graphic>
</disp-formula>
</p>
<p>Where
<italic>G</italic>
<sub>i</sub>
is the size of any given genome in a sample containing
<italic>l</italic>
genomes, and
<italic>n</italic>
<sub>i</sub>
is the number of copies of genome
<italic>g</italic>
<sub>i</sub>
.</p>
<p>However, the species that make up the sample appear at different frequencies in the metagenome. Therefore, a metagenome of size
<italic>G</italic>
<sub>m</sub>
composed of genomes of sizes
<italic>G</italic>
<sub>1</sub>
through
<italic>G</italic>
<sub>k</sub>
can be viewed as a sum of fractions. Each component genome of size
<italic>G</italic>
<sub>i</sub>
constitutes a fraction of
<italic>G</italic>
<sub>m</sub>
:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e005"></graphic>
</disp-formula>
and:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e006"></graphic>
</disp-formula>
</p>
<p>Where
<italic>p</italic>
<sub>i</sub>
is the fraction of copies of the genome of species
<italic>i</italic>
in the sample and
<italic>G</italic>
<sub>i</sub>
is the size of the genome of species
<italic>i</italic>
.</p>
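<p>Under the same random-sampling assumption, each species contributes reads in proportion to its share of the pooled DNA, <italic>n</italic><sub>i</sub><italic>G</italic><sub>i</sub>/<italic>G</italic><sub>m</sub>. The expected coverage of genome <italic>i</italic> is therefore <italic>NLn</italic><sub>i</sub>/<italic>G</italic><sub>m</sub>. The following Python sketch computes this for each species. It treats the copy numbers <italic>n</italic><sub>i</sub> as known and ignores extraction and sequencing biases. The example community is hypothetical.</p>

```python
import math


def metagenome_size(genome_sizes, copies):
    """G_m = sum over species of n_i * G_i."""
    return sum(n * g for n, g in zip(copies, genome_sizes))


def per_species_coverage(genome_sizes, copies, n_reads, read_len):
    """Expected (coverage, covered fraction) for each species, assuming reads
    are drawn uniformly at random from the pooled community DNA."""
    gm = metagenome_size(genome_sizes, copies)
    result = []
    for g, n in zip(genome_sizes, copies):
        reads_i = n_reads * n * g / gm   # expected reads from species i
        c_i = reads_i * read_len / g     # equals n_reads * read_len * n / gm
        result.append((c_i, 1.0 - math.exp(-c_i)))
    return result


# Hypothetical community: three genomes with very different abundances.
sizes = [2_000_000, 4_000_000, 6_000_000]
copies = [1000, 100, 1]
for c, frac in per_species_coverage(sizes, copies, n_reads=1_000_000, read_len=400):
    print(f"{c:.2f}x coverage, {frac:.1%} covered")
```

<p>In this example, the most abundant genome is covered about 166 times over. The rarest genome receives only about 0.17× coverage, so only about 15% of it is expected to be sequenced at all.</p>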
<p>Using species-specific gene markers, usually the small-subunit ribosomal RNA gene (16S or 18S rDNA), it is possible to estimate the species diversity in the sample and to estimate the different
<italic>p</italic>
<sub>i</sub>
values. Nevertheless, full or sometimes even adequate coverage (as judged by the rarefaction curve) of a species-rich environmental sample may be unattainable, especially for the genomes of the less abundant species
<xref ref-type="bibr" rid="pcbi.1000667-Torsvik1">[39]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-Countway1">[42]</xref>
. We expand upon this subject in the “
<xref ref-type="sec" rid="s6">Species Diversity</xref>
” section.</p>
<p>Jeroen Raes and his colleagues have proposed an effective genome size (EGS) measure that accounts for multiple plasmid copies, inserted sequences, and associated phages and viruses
<xref ref-type="bibr" rid="pcbi.1000667-Raes1">[43]</xref>
. The EGS is extrapolated from the density (counts per megabase) of single-copy marker genes:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e007"></graphic>
</disp-formula>
</p>
<p>Where
<italic>L</italic>
is the read length,
<italic>x</italic>
is the marker gene density,
<italic>a</italic>
,
<italic>b</italic>
, and
<italic>c</italic>
are parameters derived empirically from 154 simulated metagenomes and found to be 21.2, 4,230, and 0.733, respectively. Raes and colleagues derived this formula from several different metagenomes, and it provides a useful measure of central tendency for genome size in a metagenomic sample. Note that
<italic>a</italic>
,
<italic>b</italic>
, and
<italic>c</italic>
were derived from simulated metagenomes. Care must therefore be taken when using the EGS formula above, since the parameters given only provide a snapshot of a particular simulation. It is probably better to use EGS as a framework, in conjunction with a metagenomic simulator such as MetaSim
<xref ref-type="bibr" rid="pcbi.1000667-Richter1">[44]</xref>
to generate parameters more compatible with the population estimates in one's own research. MetaSim creates a simulated metagenome from ordinary genome sequence files, which makes it useful for testing and assessing the performance of other programs that manipulate and analyze metagenomic data, such as assembly or annotation programs.</p>
</sec>
</sec>
<sec id="s4b">
<title>Metagenome Assembly</title>
<p>In a genome project of a single organism or clone, we can be certain that all extracted DNA fragments belong to the same genome, barring contaminants and extrachromosomal DNA. That is not the case for a metagenome. As we have just seen, coverage is usually incomplete, since environmental sequence sampling rarely produces all the sequences required for assembly. There is also the danger of assembling sequences from different OTUs into interspecies chimeras. Phrap, Forge, Arachne
<xref ref-type="bibr" rid="pcbi.1000667-Batzoglou1">[45]</xref>
, JAZZ
<xref ref-type="bibr" rid="pcbi.1000667-Aparicio1">[46]</xref>
, and the Celera Assembler
<xref ref-type="bibr" rid="pcbi.1000667-Myers1">[47]</xref>
are all assembly programs that were developed for assembling single genomes from Sanger reads. They appear to give good results even when assembling metagenomic sequence data from Sanger sequencing
<xref ref-type="bibr" rid="pcbi.1000667-Mavromatis1">[48]</xref>
. Most of these algorithms use mate-pair information to check the scaffolds, the assembled intermediates between raw reads and whole chromosomes. They represent each read as a vertex and each detected overlap as an edge between the overlapping vertices. Finding the correct assembly is then cast as a Hamiltonian path problem, that is, finding a path through the graph that visits each vertex exactly once (see
<xref ref-type="fig" rid="pcbi-1000667-g003">Figure 3A–3C</xref>
).</p>
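<p>The following Python sketch makes the overlap-graph formulation concrete. It builds the graph for a handful of error-free reads and, by brute force, finds the Hamiltonian path with the largest total overlap. Enumerating every vertex order takes factorial time, so this is only a toy example. The function names and the minimum overlap of 3 bases are illustrative assumptions, and practical overlap-based assemblers rely on heuristics instead of exhaustive search.</p>

```python
from itertools import permutations


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b,
    provided it is at least min_len long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def overlap_graph(reads, min_len):
    """Vertices are read indices; edge (i, j) carries the overlap length."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap(a, b, min_len)
                if olen:
                    edges[(i, j)] = olen
    return edges


def hamiltonian_assembly(reads, min_len=3):
    """Try every vertex order (factorial time) and keep the Hamiltonian path
    with the largest total overlap; return the spelled-out sequence or None."""
    edges = overlap_graph(reads, min_len)
    best = None
    for order in permutations(range(len(reads))):
        steps = list(zip(order, order[1:]))
        if all(step in edges for step in steps):
            score = sum(edges[step] for step in steps)
            if best is None or score > best[0]:
                best = (score, order)
    if best is None:
        return None
    order = best[1]
    seq = reads[order[0]]
    for i, j in zip(order, order[1:]):
        seq += reads[j][edges[(i, j)]:]
    return seq


print(hamiltonian_assembly(["CATGA", "ACGTTG", "TTGCAT"]))
```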
<fig id="pcbi-1000667-g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Fragment assembly.</title>
<p>(A–C) Hamiltonian path approach. (A) A sequence with overlapping reads; (B) each read is represented as a vertex, with edges connecting the overlapping vertices; (C) the assembly solution is a Hamiltonian path (all vertices are visited, and no vertex is visited more than once) through the resulting graph; (D) for short-read assembly, each vertex is a
<italic>k</italic>
-mer (or a hashed collection of
<italic>k</italic>
-mers), and the reads are threaded between vertices as edges. The solution is an Eulerian path, in which each edge is visited once. Repeats are merged into a single edge. For detailed algorithms see
<xref ref-type="bibr" rid="pcbi.1000667-Pevzner1">[49]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Chaisson1">[50]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Zerbino1">[53]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-Warren1">[55]</xref>
.</p>
</caption>
<graphic xlink:href="pcbi.1000667.g003"></graphic>
</fig>
<p>For short reads, however, this technique is not suitable. To establish adequate coverage, short reads must be produced in large quantities, and because they are short, many of them are identical or nearly identical. The plethora of reads makes it impractical to represent each read as a vertex, and the sheer volume of reads makes the graph large and unmanageable. Finding a Hamiltonian path is an NP-complete problem: no polynomial-time algorithm for it is known, and the running time of known exact algorithms grows exponentially with the number of vertices. So while the problem can be solved for the relatively small number of reads produced by Sanger sequencing, it becomes intractable for the large amounts of sequence data from second generation sequencers.</p>
<p>One solution is for the vertices to represent
<italic>k</italic>
-mer words with the reads themselves being the edges connecting the vertices. Since the vertices represent
<italic>k</italic>
-mers rather than reads, the high number of reads and their redundancy do not affect the number of nodes. Repeats appear in the graph only once, with links to their different start and end points. Searches for overlaps are simplified, because overlapping reads map onto the same edges and can easily be followed together. Finally, since the reads are represented as edges rather than vertices, the solution is an Eulerian path, which visits each edge exactly once. Unlike the Hamiltonian path problem, the Eulerian path problem can be solved in time linear in the number of edges (for example, with Hierholzer's algorithm), which makes the assembly problem tractable for large numbers of reads.</p>
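<p>The following Python sketch shows the Eulerian formulation. It uses the common textbook variant of the de Bruijn graph, in which vertices are (k−1)-mers and every distinct k-mer drawn from the reads is an edge. This is a slight departure from the simplified reads-as-edges description above. The sketch assumes error-free reads from one strand and uses hypothetical helper names. It is not the EULER or Velvet implementation, both of which must also handle sequencing errors, reverse complements, and coverage.</p>

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Each distinct k-mer in the reads becomes an edge from its (k-1)-mer
    prefix to its (k-1)-mer suffix; a repeated k-mer is stored only once."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the traversal deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visits every edge exactly once in O(E) time.
    Assumes an Eulerian path exists (connected graph with balanced degrees,
    except possibly at the start and end vertices)."""
    out_deg = {v: len(ws) for v, ws in graph.items()}
    in_deg = defaultdict(int)
    for ws in graph.values():
        for w in ws:
            in_deg[w] += 1
    # Start where out-degree exceeds in-degree by one, if such a vertex exists.
    start = next((v for v in graph if out_deg[v] - in_deg[v] == 1), next(iter(graph)))
    remaining = {v: list(ws) for v, ws in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit vertex
    return path[::-1]


def assemble(reads, k):
    """Spell the contig: first (k-1)-mer, then the last base of each next vertex."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(v[-1] for v in path[1:])


print(assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"], 4))  # ATGCGTACGTTAGC
```

<p>The repeated 3-mer CGT appears as a single vertex with two outgoing edges, so the repeat is stored once. Hierholzer's algorithm still recovers the only order in which every edge is used.</p>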
<p>The EULER assembler
<xref ref-type="bibr" rid="pcbi.1000667-Pevzner1">[49]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Chaisson1">[50]</xref>
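was the first to present this technique using de Bruijn graphs. A de Bruijn graph of order
<italic>n</italic>
over an alphabet of
<italic>m</italic>
symbols has one vertex for each possible word of length <italic>n</italic>, with an edge from one word to another whenever the last <italic>n</italic>−1 symbols of the first equal the first <italic>n</italic>−1 symbols of the second. For metagenomic assembly,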
<italic>m</italic>
 = 4 (A,T,G,C) and
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e008.jpg" mimetype="image"></inline-graphic>
</inline-formula>
length. Theoretically, there are
<italic>m</italic>
<sup>n</sup>
vertices, but in practice only the words that actually occur in the reads need to be stored, so the graph can be greatly reduced by hashing the reads in the dataset to be assembled (see
<xref ref-type="fig" rid="pcbi-1000667-g003">Figure 3</xref>
). Other variations have since been published, adapting the method to short (100–200 bp)
<xref ref-type="bibr" rid="pcbi.1000667-Myers2">[51]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Chaisson2">[52]</xref>
and very short read lengths
<xref ref-type="bibr" rid="pcbi.1000667-Zerbino1">[53]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-Warren1">[55]</xref>
. EULER and Velvet are available for download. Recently, Ye and Tang developed an assembly method that first finds putative open reading frame (ORF) regions and then assembles those regions. This method, dubbed ORFome assembly, increases assembly accuracy for ORF regions at the expense of losing noncoding regions. Nevertheless, it is very useful for many practical purposes, because for coding regions it appears to have a better recovery rate than regular whole-genome assemblers
<xref ref-type="bibr" rid="pcbi.1000667-Ye1">[56]</xref>
. For recent reviews on computational assembly methods see
<xref ref-type="bibr" rid="pcbi.1000667-Pop1">[57]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Flicek1">[58]</xref>
.</p>
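<p>The reduction that hashing achieves can be seen by comparing the m<sup>n</sup> possible words with the words actually present in a dataset. The Python sketch below uses a dict as the hash table and a hypothetical toy read set. Real assemblers use far more compact hash structures.</p>

```python
def kmer_table(reads, k):
    """Hash every k-mer that occurs in the reads to its count; only observed
    k-mers take up space, rather than all 4**k possible words."""
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
for k in (3, 4, 5):
    table = kmer_table(reads, k)
    print(f"k={k}: {4 ** k} possible k-mers, {len(table)} stored")
# k=3: 64 possible k-mers, 11 stored
# k=4: 256 possible k-mers, 11 stored
# k=5: 1024 possible k-mers, 10 stored
```

<p>The gap between possible and stored words widens rapidly with k. For realistic k-mer lengths the full vertex set could never be enumerated, but the observed set remains bounded by the total length of the reads.</p>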
</sec>
</sec>
<sec id="s5">
<title>Gene Calling</title>
<p>Genes are the basic functional units of the genome, and they may in turn form larger functional units such as operons, transcriptional units, and functional networks. Again, the incomplete and fragmentary nature of metagenomic data makes genes hard to identify. With Sanger random shotgun sequencing, whole genomes are rarely assembled, and in species-rich environments many reads remain singletons rather than being joined into contigs. In the Global Ocean Sampling (GOS) data, which were Sanger-sequenced, the mean number of whole reading frames per assembly is 4.7
<xref ref-type="bibr" rid="pcbi.1000667-Yooseph1">[59]</xref>
.</p>
<p>Gene finding algorithms are trained to find whole ORFs and use information gleaned from long genomic stretches, information that is unavailable for metagenomic data. Despite this drawback, Mavromatis and colleagues have shown that for a high-complexity metagenomic dataset, gene prediction on assemblies can reproduce up to 85% of the genes originally predicted in the constituent genomes; for a low-complexity dataset this rises to 90%
<xref ref-type="bibr" rid="pcbi.1000667-Mavromatis1">[48]</xref>
.</p>
<p>For genes with known homologs, BLASTing (using the Basic Local Alignment Search Tool)
<xref ref-type="bibr" rid="pcbi.1000667-Altschul1">[60]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Altschul2">[61]</xref>
against known databases is a common approach. It reveals which members of known gene families are present in a metagenome, but BLAST cannot find new families or new genes that have no homologs in known databases. For those, ab initio gene prediction tools are used. These tools are mostly based on supervised learning and statistical pattern recognition methods, and most of them use Markov models or hidden Markov models (HMMs). GeneMark.hmm, for example, uses inhomogeneous Markov models based on monocodon frequency analysis for gene calling
<xref ref-type="bibr" rid="pcbi.1000667-Azad1">[62]</xref>
. When applied to metagenomic data, however, those methods lose sensitivity, because they often fail to identify partial ORFs that may be part of true genes. This is especially true when conventional gene calling methods are applied to raw Sanger fragments rather than to assemblies. Unsupervised methods are therefore required.</p>
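<p>The Python sketch below illustrates what "inhomogeneous" means here: the transition probabilities of a Markov chain depend on the position within the codon. It trains a first-order, three-periodic model on coding sequence and a homogeneous background model, then scores a fragment by log-likelihood ratio in each of the three frames. It is a simplification and not GeneMark.hmm, which uses higher-order models embedded in an HMM. The training sequences and function names are hypothetical, and ACGT-only input is assumed.</p>

```python
import math
from collections import Counter


def train(seqs, period, pseudocount=1.0):
    """First-order Markov model whose transition probabilities depend on the
    position modulo `period` (period=3: inhomogeneous coding model assuming
    training sequences start in frame 0; period=1: homogeneous background)."""
    counts = [Counter() for _ in range(period)]
    for seq in seqs:
        for i in range(1, len(seq)):
            counts[i % period][(seq[i - 1], seq[i])] += 1
    model = []
    for c in counts:
        probs = {}
        for prev in "ACGT":
            total = sum(c[(prev, b)] for b in "ACGT") + 4 * pseudocount
            for b in "ACGT":
                probs[(prev, b)] = (c[(prev, b)] + pseudocount) / total
        model.append(probs)
    return model


def log_odds(seq, coding, background, frame=0):
    """Log-likelihood ratio of 'coding, with codons starting at `frame`'
    versus 'non-coding background'; larger values favour coding."""
    score = 0.0
    for i in range(1, len(seq)):
        step = (seq[i - 1], seq[i])
        score += math.log(coding[(i - frame) % 3][step] / background[0][step])
    return score


coding = train(["GCTGCTGCTGCT"], period=3)       # toy "coding" training set
background = train(["ACGTACGTACGT"], period=1)   # toy "non-coding" training set
scores = [log_odds("GCTGCT", coding, background, f) for f in range(3)]
print(max(range(3), key=lambda f: scores[f]))    # frame 0 scores highest
```

<p>A metagenomic fragment often carries too few codons for such scores to separate coding from noncoding reliably. That is one reason these methods lose sensitivity on short, partial ORFs.</p>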
<p>Yooseph and colleagues
<xref ref-type="bibr" rid="pcbi.1000667-Yooseph1">[59]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Yooseph2">[63]</xref>
have used a different approach to gene finding when analyzing the Global Ocean Sampling data. They began with simple ORF identification, finding consecutive translatable regions that translate to at least 60 amino acids (aa). They then clustered those sequences using an all-against-all BLAST search, identifying clusters of nonredundant sequences. In the next step, shadow ORFs were eliminated. Shadow ORFs are false ORFs that lie in a different reading frame from a true ORF but overlap it, and hence may be mistaken for a coding region. Yooseph and colleagues handled this by clustering all ORF candidates in the same reading frame and selecting the larger cluster as the one containing true ORFs, discarding the others as shadow ORFs. Finally, they removed ORF families with a
<italic>K</italic>
<sub>a</sub>
/
<italic>K</italic>
<sub>s</sub>
ratio that is close to 1. The rationale for this step is that putative proteins that are seemingly under no selective pressure (positive or negative) are probably falsely identified. Gene families coding for proteins under selective pressure are expected to have a
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e009.jpg" mimetype="image"></inline-graphic>
</inline-formula>
or
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e010.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
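<p>The first step of this pipeline, identifying translatable regions of at least 60 codons in all six reading frames, can be sketched in Python. The sketch assumes the standard stop codons TAA, TAG, and TGA, and the function names are hypothetical. Runs that reach the end of the sequence are kept, so partial ORFs at fragment ends survive. The clustering, shadow-ORF, and <italic>K</italic><sub>a</sub>/<italic>K</italic><sub>s</sub> filtering steps are not shown.</p>

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 60):
    """Return (strand, frame, start, end) for every stop-free run of at least
    `min_aa` codons in the six reading frames. Coordinates are 0-based and
    end-exclusive on the given strand; the terminating stop is excluded."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOPS:
                    if (pos - run_start) // 3 >= min_aa:
                        orfs.append((strand, frame, run_start, pos))
                    run_start = pos + 3
            # Close the final run at the last complete codon (partial ORF).
            last = frame + 3 * ((len(s) - frame) // 3)
            if (last - run_start) // 3 >= min_aa:
                orfs.append((strand, frame, run_start, last))
    return orfs


read = "ATG" + "GCT" * 60 + "TAA"   # hypothetical 186-bp fragment
print(find_orfs(read))
```

<p>Because no start codon is required, the sketch reports open regions rather than complete genes. This is deliberate for fragmentary data, but it is also why downstream filtering, such as the clustering and selection-pressure steps described above, is needed.</p>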
<p>It has been argued that one drawback of the incremental clustering method is that it increases specificity at the expense of sensitivity; that is, it may produce an excess of false negatives by removing putative ORFs that cluster poorly, or not at all, in the database
<xref ref-type="bibr" rid="pcbi.1000667-Hoff1">[64]</xref>
. To date, however, there has been no thorough comparative evaluation of gene calling methods on first or second generation sequence data.</p>
</sec>
<sec id="s6">
<title>Species Diversity</title>
<sec id="s6a">
<title>Measuring Diversity</title>
<p>In the “Sample Size” section we discussed using 16S/18S rDNA for phylotyping and assessing species coverage using a rarefaction curve. Microbial ecology has many tools for assessing species diversity. Rarefaction curves are used to estimate the coverage obtained from sampling (see
<xref ref-type="fig" rid="pcbi-1000667-g004">Figure 4</xref>
). α-diversity, β-diversity, and γ-diversity are well-established diversity indices used in ecology, including microbial ecology. α-diversity is the biodiversity within a defined habitat or ecosystem; β-diversity compares species diversity between habitats; γ-diversity is the total biodiversity over a large region containing several ecosystems. Here we discuss the application of these indices to metagenomic data.</p>
<fig id="pcbi-1000667-g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1000667.g004</object-id>
<label>Figure 4</label>
<caption>
<title>Rarefaction curves.</title>
<p>Green, most or all species have been sampled; blue, this habitat has not been exhaustively sampled; red, species-rich habitat, only a small fraction has been sampled.</p>
</caption>
<graphic xlink:href="pcbi.1000667.g004"></graphic>
</fig>
<p>One way to calculate α-diversity is by using Shannon's index:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e011"></graphic>
</disp-formula>
where:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e012"></graphic>
</disp-formula>
</p>
<p>Where
<italic>S</italic>
is the total number of OTUs,
<italic>n</italic>
<sub>i</sub>
is the number of clones in OTU <italic>i</italic>, and
<italic>N</italic>
is the total number of individuals.
<italic>p</italic>
<sub>i</sub>
is the relative abundance of each OTU.
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e013.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
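<p>The Python sketch below computes Shannon's index from OTU clone counts. It assumes the standard form H′ = −Σ p<sub>i</sub> ln p<sub>i</sub> with p<sub>i</sub> = n<sub>i</sub>/N, which matches the definitions given above. The function name is hypothetical.</p>

```python
import math


def shannon_index(otu_counts):
    """H' = -sum(p_i * ln p_i) with p_i = n_i / N; empty OTUs are skipped."""
    total = sum(otu_counts)
    return -sum((n / total) * math.log(n / total) for n in otu_counts if n > 0)


print(shannon_index([10]))      # one OTU: no diversity
print(shannon_index([5, 5]))    # two equally abundant OTUs: ln 2
```

<p>H′ increases both with the number of OTUs and with the evenness of their abundances. Two samples with the same number of OTUs can therefore have different H′ values.</p>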
<sec id="s6a1">
<title>Using different sequence markers for OTU identification</title>
<p>It should be noted that using 16S/18S rDNA as a proxy for OTU identification and counting is not without problems. First, rDNA has been criticized as an OTU marker, and evidence of horizontal gene transfer involving rDNA may confound its reliability even more
<xref ref-type="bibr" rid="pcbi.1000667-Schouls1">[65]</xref>
. Second, a single bacterium may carry several different copies of its 16S rDNA sequence, which introduces variance into both the estimated count of individual bacteria and the number of OTUs. It is commonly accepted that the mean number of bacterial ribosomal operons per genome is 4.1
<xref ref-type="bibr" rid="pcbi.1000667-DeSantis1">[66]</xref>
, but recent publications have shown that 16S rDNA gene copy numbers may vary between 1 and 15
<xref ref-type="bibr" rid="pcbi.1000667-Case1">[67]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Klappenbach1">[68]</xref>
. Single-copy housekeeping genes have been suggested as alternative or complementary markers for counting species and populations in bacterial genomes. The
<italic>rpoB</italic>
gene is a strong candidate
<xref ref-type="bibr" rid="pcbi.1000667-Walsh1">[69]</xref>
, but
<italic>amoA</italic>
,
<italic>pmoA</italic>
,
<italic>nirS</italic>
,
<italic>nirK</italic>
,
<italic>nosZ</italic>
, and
<italic>pufM</italic>
have also been suggested in different contexts
<xref ref-type="bibr" rid="pcbi.1000667-Case1">[67]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Achenbach1">[70]</xref>
. The housekeeping functionality of these genes makes them less susceptible to horizontal gene transfer. However, although these studies have shown that housekeeping genes improve on 16S rDNA alone at a finer level, 16S rDNA is still sufficiently accurate as a marker for OTU identification and counting for many purposes. Housekeeping genes are used for OTU classification primarily when 16S rDNA provides too low a resolution, for example when a high diversity of species is expected. Another case where a housekeeping gene is preferable to 16S rDNA is when its variation matches the accepted taxonomy better than the variation in the rDNA sequences does. Non-rDNA phylogenetic markers have also been applied to metagenomic data, showing that certain microbial communities evolve faster than others
<xref ref-type="bibr" rid="pcbi.1000667-vonMering1">[71]</xref>
.</p>
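<p>The copy-number problem described above can be partly corrected when per-OTU copy numbers are known or can be estimated. The Python sketch below divides each OTU's marker-gene read count by an assumed copy number and renormalizes. The copy numbers and OTU labels are hypothetical (chosen within the reported range of 1 to 15), and in practice they are themselves uncertain estimates.</p>

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Divide each OTU's 16S read count by its assumed rDNA copy number and
    renormalise, giving estimated relative cell abundances."""
    cells = {otu: read_counts[otu] / copy_numbers[otu] for otu in read_counts}
    total = sum(cells.values())
    return {otu: c / total for otu, c in cells.items()}


reads = {"otu_A": 100, "otu_B": 100}   # equal read counts
copies = {"otu_A": 1, "otu_B": 4}      # hypothetical copy numbers
print(copy_number_corrected_abundance(reads, copies))
# {'otu_A': 0.8, 'otu_B': 0.2}
```

<p>Equal read counts thus translate into a fourfold difference in estimated cell numbers. The same reasoning applies, with even larger corrections, to 18S rDNA in microeukaryotes.</p>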
<p>Epidemiologists classify bacterial serovars for pathogen verification using Multilocus Sequence Typing (MLST)
<xref ref-type="bibr" rid="pcbi.1000667-Enright1">[72]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Maiden1">[73]</xref>
. MLST is a technique in which the sequences of several standardized housekeeping genes are used for OTU typification. There is an online resource for MLST, including a database for OTU identification (
<ext-link ext-link-type="uri" xlink:href="http://www.mlst.net/">http://www.mlst.net/</ext-link>
). MLST has been used successfully in some metagenomic studies
<xref ref-type="bibr" rid="pcbi.1000667-Mahenthiralingam1">[74]</xref>
. However, MLST appears to be more useful for finer, substrain-level typing than for OTU identification.</p>
<p>In the same vein, 18S rDNA copy numbers vary among microeukaryotes, with even larger copy number variation between species than that of 16S rDNA in prokaryotes. Care must be taken to account for this copy number variation when assessing the cell count in eukaryotic samples
<xref ref-type="bibr" rid="pcbi.1000667-Zhu1">[75]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Loram1">[76]</xref>
.</p>
<p>There are several software packages we found very useful for biodiversity analysis. The first is EstimateS (version 8.0), a general-purpose population analysis program
<xref ref-type="bibr" rid="pcbi.1000667-Colwell1">[77]</xref>
. EstimateS contains a rich set of biodiversity analysis modules, but for microbial analysis it requires preprocessing of sequence data to transform it into generic population data. MOTHUR
<xref ref-type="bibr" rid="pcbi.1000667-Schloss1">[78]</xref>
is tailored towards microbial diversity analysis and provides tools for transforming sequence data to population data. It is not as rich in functional modules as EstimateS, but for most diversity analyses (rarefaction curves, standard estimate indices) it is more than adequate. QIIME, an extension of PyCogent
<xref ref-type="bibr" rid="pcbi.1000667-Knight1">[79]</xref>
, is in beta, but testing by one of us (IF) has shown it to be a very powerful and versatile package for analysis of genomic and metagenomic microbial ecology data (
<ext-link ext-link-type="uri" xlink:href="http://qiime.sourceforge.net">http://qiime.sourceforge.net</ext-link>
). A more specialized program, geared to the analysis of viral metagenomic data, is PHACCS
<xref ref-type="bibr" rid="pcbi.1000667-Angly1">[80]</xref>
.</p>
</sec>
</sec>
<sec id="s6b">
<title>Binning</title>
<p>We wish to know not only who populates the sample, but also what the different OTUs are doing. We must therefore associate sequence data with the OTU of its origin. This analysis is called binning (placing each sequence in its correct “bin” or OTU). Binning is needed because in many cases suitable phylogenetic marker genes are missing: rDNA sequences may be unsuitable (viruses, for example, lack rDNA), or they may have been undersampled.</p>
<p>Here we examine two binning strategies: composition-based binning and similarity-based (phylogenetic) binning.</p>
<sec id="s6b1">
<title>Composition-based binning</title>
<p>The GC content of bacterial genomes has long been used routinely for higher-level systematics
<xref ref-type="bibr" rid="pcbi.1000667-1">[81]</xref>
. With the advent of ESS data, a finer resolution for classifying or binning sequences is called for. Markov models based on
<italic>k</italic>
-mer frequencies have been shown to be quite powerful for statistical analyses of DNA sequences
<xref ref-type="bibr" rid="pcbi.1000667-Schbath1">[82]</xref>
. For example, tetranucleotides are being used by the TETRA
<xref ref-type="bibr" rid="pcbi.1000667-Teeling1">[83]</xref>
program in the following fashion. There are 4
<sup>4</sup>
 = 256 possible DNA tetranucleotides. For each tetranucleotide
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e014.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, an expected frequency
<italic>E</italic>
(
<italic>t</italic>
<sub>i</sub>
) can be calculated by means of a maximal-order Markov model:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e015"></graphic>
</disp-formula>
</p>
<p>Where
<italic>O</italic>
denotes the observed counts of the two overlapping trinucleotides and the central dinucleotide of the tetranucleotide.</p>
<p>The level of over- and underrepresentation of each tetranucleotide is evaluated using
<italic>z</italic>
-scores:
<disp-formula>
<graphic xlink:href="pcbi.1000667.e016"></graphic>
</disp-formula>
<disp-formula>
<graphic xlink:href="pcbi.1000667.e017"></graphic>
</disp-formula>
</p>
<p>Where σ(
<italic>O</italic>
(
<italic>t</italic>
<sub>i</sub>
)) is the variance of the observed count of tetranucleotide
<italic>t</italic>
<sub>i</sub>
.</p>
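<p>The TETRA-style statistics can be sketched in Python. We assume the equation images above correspond to the usual maximal-order Markov estimate, E = O(n<sub>1</sub>n<sub>2</sub>n<sub>3</sub>) · O(n<sub>2</sub>n<sub>3</sub>n<sub>4</sub>) / O(n<sub>2</sub>n<sub>3</sub>), together with its usual variance approximation. TETRA compares fragments by correlating their z-score profiles, and the Pearson correlation below stands in for that step. For brevity the sketch counts one strand only; the published method may also account for the reverse complement. Function names are hypothetical.</p>

```python
import math
from collections import Counter
from itertools import product


def count_kmers(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tetra_zscores(seq):
    """z-score of each of the 256 tetranucleotides (in AAAA..TTTT order):
    E   = O(n1n2n3) * O(n2n3n4) / O(n2n3)
    var = E * (O(n2n3) - O(n1n2n3)) * (O(n2n3) - O(n2n3n4)) / O(n2n3)**2"""
    o2, o3, o4 = (count_kmers(seq, k) for k in (2, 3, 4))
    scores = []
    for t in map("".join, product("ACGT", repeat=4)):
        mid = o2[t[1:3]]
        if mid == 0:
            scores.append(0.0)          # central dinucleotide never observed
            continue
        expected = o3[t[:3]] * o3[t[1:]] / mid
        variance = expected * (mid - o3[t[:3]]) * (mid - o3[t[1:]]) / mid ** 2
        scores.append((o4[t] - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x, y):
    """Pearson correlation; used to compare two fragments' z-score profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0
```

<p>Two fragments whose profiles correlate highly, as measured by pearson(tetra_zscores(a), tetra_zscores(b)), are candidates for the same bin. No reference genomes are needed at any point.</p>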
<p>Composition-based binning is not error-free. The more closely related and the more numerous the OTUs in the studied metagenome, the higher the frequency of misclassification errors. The strength of
<italic>k</italic>
-mer–based binning is that no reference sequences are required for the actual binning: all the information is intrinsic. This makes
<italic>k</italic>
-mer statistics a powerful tool for binning ORFan sequences, that is, sequences that have few or no homologs and therefore no known function. TETRA, in particular, requires no training and is therefore independent of existing genomic data. PhyloPythia
<xref ref-type="bibr" rid="pcbi.1000667-Mchardy1">[84]</xref>
is a supervised method that trains a set of support vector machines (SVMs) to bin sequences longer than 1 kb; it is thus not suitable for binning second generation sequences. It performs best when the training set is similar in phylotypic composition to the sample being binned. Growing Self Organizing Maps or GSOM
<xref ref-type="bibr" rid="pcbi.1000667-Chan1">[85]</xref>
and Seeded GSOM or S-GSOM
<xref ref-type="bibr" rid="pcbi.1000667-Chan2">[86]</xref>
use a variant of the machine learning algorithm self-organizing maps. S-GSOM improves upon GSOM by extracting the flanking sequences of highly conserved 16S rDNA from the metagenome and using them as seeds to assign other reads on the basis of their compositional similarity. Both use frequencies of di- to penta-nucleotides for binning assignment.</p>
<p>Codon usage, a long-established technique in genomics, provides another composition-based approach to binning metagenomic data. Different species use different codon frequencies to encode the same amino acids, and this observation can be exploited to classify ORF sequences. Shani Tzahor and colleagues have developed a composite supervised method that uses both TETRA and codon usage statistics to classify fragments in the 100–300-bp range
<xref ref-type="bibr" rid="pcbi.1000667-Tzahor1">[87]</xref>
.</p>
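<p>Codon-usage binning can be illustrated with a Python sketch that computes a 64-dimensional codon-usage profile for an in-frame ORF and assigns the ORF to the nearest reference profile by Euclidean distance. The nearest-profile rule, the pseudocount, and the reference labels are our assumptions for illustration. This is not the supervised classifier of Tzahor and colleagues.</p>

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(orf, pseudocount=0.5):
    """Relative frequency of each of the 64 codons in an in-frame ORF."""
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    total = sum(counts[c] for c in CODONS) + 64 * pseudocount
    return [(counts[c] + pseudocount) / total for c in CODONS]


def nearest_profile(orf, reference_profiles):
    """Assign the ORF to the reference label whose codon usage is closest
    (Euclidean distance); reference_profiles maps label -> 64-vector."""
    query = codon_usage(orf)
    return min(reference_profiles,
               key=lambda label: math.dist(query, reference_profiles[label]))


refs = {  # hypothetical reference profiles built from toy ORFs
    "gc_rich": codon_usage("GCCGCGGCC" * 20),
    "at_rich": codon_usage("AAATTTAAT" * 20),
}
print(nearest_profile("GCCGCCGCG" * 5, refs))  # gc_rich
```

<p>Short fragments contain only a few dozen codons, so their profiles are noisy. This helps explain why combining codon usage with oligonucleotide statistics, as in the method cited above, is attractive.</p>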
<p>TETRA is available for download, and PhyloPythia is available as a Web site, with a downloadable version available on request. GSOM/S-GSOM does not seem to be available at this time.</p>
</sec>
<sec id="s6b2">
<title>Similarity-based binning</title>
<p>Another way to bin sequences is to find similarities to reference sequences that can be used to build a tree. This technique is useful when most sequences in the sample have significant similarities to reference sequences from known OTUs. Given an unannotated sequence
<italic>A</italic>
, and two annotated reference sequences
<italic>B</italic>
and
<italic>C</italic>
, and using the similarity function
<italic>sim</italic>
, let us consider the case where we have
<inline-formula>
<inline-graphic xlink:href="pcbi.1000667.e018.jpg" mimetype="image"></inline-graphic>
</inline-formula>
; then, the sequence
<italic>A</italic>
will be placed on a node in the tree between
<italic>B</italic>
and
<italic>C</italic>
, and, in the case considered, closer to
<italic>B</italic>
. MEGAN
<xref ref-type="bibr" rid="pcbi.1000667-Huson1">[88]</xref>
implements this method by reading BLAST output. Typically, this output comes from searching the metagenomic reads or assemblies against nr, or against any other sequence database that has a phylogenetic tree associated with it. MEGAN then assigns each read to the lowest common ancestor (LCA) of its hits on the phylogenetic tree. This allows every sequence that has a homolog in nr to be assigned. Predicted gene sequences that have no homologs are aggregated into a single node of their own on the tree. CARMA
<xref ref-type="bibr" rid="pcbi.1000667-Krause1">[89]</xref>
is somewhat similar to MEGAN, but uses Pfam
<xref ref-type="bibr" rid="pcbi.1000667-Finn1">[90]</xref>
as its source for taxonomic classification. Note that a precise assignment to an OTU may not be possible in many cases. Nevertheless, unless a sequence is an ORFan, it can be placed in the species tree. The resulting distribution of sequences on the species tree can provide an overview of the dominant species in the sample. Phymm
<xref ref-type="bibr" rid="pcbi.1000667-Brady1">[91]</xref>
differs from the other methods in that it uses interpolated Markov models (IMMs) to classify DNA sequences of variable length by their phylogenetic grouping. Phymm is trained on existing OTUs and learns which oligonucleotide lengths are most informative for classification. Phymm also does not leave reads unclassified, although this may reduce its overall accuracy if many reads cannot be accurately binned to any phylogenetic group.</p>
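<p>The lowest-common-ancestor assignment described for MEGAN can be sketched in Python over a read's BLAST hits, each reduced to a bit score and a root-to-taxon lineage. The score filters (a minimum score and a "top percent" window below the best hit) and their values, the "no_hits" label, and the function names are illustrative assumptions, not MEGAN's code or defaults.</p>

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-taxon lineages."""
    lca = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca.append(ranks[0])
    return lca


def assign_read(hits, top_percent=10.0, min_score=50.0):
    """hits: list of (bit_score, lineage) pairs for one read. Keep hits that
    score at least min_score and lie within top_percent of the best hit,
    then place the read at the LCA of the kept lineages."""
    good = [h for h in hits if h[0] >= min_score]
    if not good:
        return ["no_hits"]                 # hypothetical node for unmatched reads
    best = max(score for score, _ in good)
    kept = [lin for score, lin in good if score >= best * (1 - top_percent / 100)]
    return lowest_common_ancestor(kept) or ["root"]


hits = [  # hypothetical hits for one read
    (200, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"]),
    (195, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"]),
    (80, ["Bacteria", "Firmicutes"]),
]
print(assign_read(hits))
```

<p>Widening the score window pulls more distant hits into the LCA computation. The read is then placed higher in the tree, which is the trade-off between precision and specificity of assignment noted above.</p>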
<p>As for the usability of these programs, CARMA runs in Unix-like environments; its installation requires some third-party software and a rudimentary knowledge of Perl and MySQL. MEGAN runs in a Java virtual machine and thus works almost out of the box on any Java-enabled platform; it does, however, require an installed National Center for Biotechnology Information (NCBI)-formatted taxonomic reference database for lowest common ancestor mapping. Also, CARMA can run its own BLAST searches, whereas MEGAN requires previously generated BLAST output as its input.</p>
</sec>
</sec>
</sec>
<sec id="s7">
<title>Functional Annotation</title>
<p>Having assembled the metagenome and identified putative ORFs, we would now like to understand the functional potential of the microbial community from which we derived the metagenome: what are these microbes capable of doing as a community? The first level of functional annotation is assigning biological functions to the ORFs. This task is highly challenging even for regular genomic data
<xref ref-type="bibr" rid="pcbi.1000667-Friedberg1">[92]</xref>
, and the challenge is compounded in metagenomic data where many ORFs are partial, and a large fraction have no annotated homologs. The second level would be discovering genes that constitute biological networks, such as metabolic pathways, in the data. The latter task is hampered by our inability to accurately associate each annotated ORF with a single species, which means it is sometimes hard to determine which component of a network comes from which organism. Nevertheless, binning can help to some extent. As we shall see in the “Case Studies” section below, several studies have been carried out and led to the successful discovery of complementary metabolic pathways from microbes that constitute a community.</p>
<p>In metagenomic samples the probability of missing genes is higher than in a fully assembled genome, since many ORFs may be partial and thus invisible to regular gene calling software that requires a full ORF. One strategy for functional annotation is therefore to skip the gene calling step altogether and instead use six-frame translations of the reads. Reasonably long translations may be ORFs. Even short translations may still be partial ORFs if they are cut short by the edge of a read or contig. These putative partial ORFs can then be searched for motifs, HMM profiles, and other sequence signatures that may indicate function. The rationale is that the probability of calling a false ORF that also contains a known sequence signature is negligible. Some metagenomic annotation programs use this rationale. For example, Motif EXtraction (MEX)
<xref ref-type="bibr" rid="pcbi.1000667-Kunik1">[93]</xref>
is an unsupervised motif creation method that is successful in identifying enzymes in genomic and metagenomic data
<xref ref-type="bibr" rid="pcbi.1000667-Sharon1">[94]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Meroz1">[95]</xref>
. Short, enzyme-specific peptides are identified in an unsupervised learning stage and are subsequently associated with specific functions in a supervised learning stage. The unsupervised stage is included because new motifs can often be identified within ORFans, even when their functional association is unknown.</p>
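<p>The six-frame strategy can be sketched in Python: translate a read in all six frames with the standard genetic code, split each translation at stop codons into putative partial ORFs, and scan the segments for a protein motif. MEX learns its motifs from data. Here a fixed, hand-written PROSITE-style pattern stands in for a learned motif: the P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST], written as a regular expression. Function names and the example read are hypothetical.</p>

```python
import re
from itertools import product

# Standard genetic code; codons enumerated with T, C, A, G as the base order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    # Codons containing non-ACGT characters (e.g. N) become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(read):
    rc = read.translate(COMPLEMENT)[::-1]
    return {f"{strand}{frame}": translate(s[frame:])
            for strand, s in (("+", read), ("-", rc)) for frame in range(3)}


def find_motif(read, motif_regex):
    """Return (frame, amino-acid position) for every motif match inside a
    stop-free segment (a putative partial ORF) of the six translations."""
    pattern = re.compile(motif_regex)
    hits = []
    for frame, protein in six_frame_translations(read).items():
        offset = 0
        for segment in protein.split("*"):
            for m in pattern.finditer(segment):
                hits.append((frame, offset + m.start()))
            offset += len(segment) + 1
    return hits


read = "ATGGGTGCTGCTGCTGCTGGTAAATCTCTG"   # encodes MGAAAAGKSL in frame +0
print(find_motif(read, "[AG].{4}GK[ST]"))
```

<p>Because the search runs on every frame, a real signature can be found without ever calling the gene. As the text notes, the chance that a spurious frame also carries a known signature is small.</p>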
<p>Even unassembled single reads (singletons) may be used to infer functional information if they are long enough to contain short motifs or to produce significant BLAST hits. BLASTing singletons and annotating them, without assembly or after it, therefore has its uses. Two versatile and useful annotation pipelines for metagenomics that implement the annotation principles outlined above are MG-RAST
<xref ref-type="bibr" rid="pcbi.1000667-Meyer1">[96]</xref>
and RAMMCAP
<xref ref-type="bibr" rid="pcbi.1000667-Li1">[97]</xref>
. MG-RAST accepts a 454 dataset as input, normalizes it (removes artefactual duplicate sequences, a known problem with 454 sequencing), and then performs gene calling and annotation by a variety of sequence similarity searches (mainly BLAST) against various sequence databases, including 16S rDNA. It then produces statistics on species associations and on metabolic pathway associations using the SEED subsystems database as its guideline. RAMMCAP uses the fast clustering algorithm CD-HIT
<xref ref-type="bibr" rid="pcbi.1000667-Li2">[98]</xref>
to cluster translated ORFs by high sequence similarity. The rationale is that many similar putative ORFs strengthen the hypothesis that they are real ORFs. Optionally, CD-HIT also reduces the volume of data to be annotated: it picks representatives from identical or nearly identical sequences so that only the representatives are annotated, and each representative's annotation is then transferred to the other members of its similarity-based cluster. The sequences are then compared to the profile HMM databases Pfam
<xref ref-type="bibr" rid="pcbi.1000667-Finn1">[90]</xref>
and TIGRfam
<xref ref-type="bibr" rid="pcbi.1000667-Haft1">[99]</xref>
using HMMer (
<ext-link ext-link-type="uri" xlink:href="http://hmmer.janelia.org/">http://hmmer.janelia.org/</ext-link>
) for functional annotation.</p>
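<p>The clustering-and-transfer idea can be sketched in Python as greedy incremental clustering, in the spirit of CD-HIT. Sequences are processed from longest to shortest; each joins the first representative it matches above an identity threshold or else founds a new cluster, and annotations are copied from representatives to members. CD-HIT itself uses short-word filtering and alignment. Here a rough identity measure based on Python's difflib stands in for that, and the threshold, sequences, and annotation labels are hypothetical. Non-empty sequences are assumed.</p>

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Approximate identity: matched residues over the shorter sequence."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Longest sequences first; each joins the first representative it matches
    at or above the threshold, otherwise it founds a new cluster."""
    clusters = {}  # representative -> list of members (including itself)
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


def transfer_annotation(clusters, rep_annotations):
    """Copy each representative's annotation to every member of its cluster."""
    return {m: rep_annotations[rep] for rep, members in clusters.items() for m in members}


orfs = ["MKVLAAGIVG", "MKVLAAGIVA", "PPPPWWWWQQ"]   # hypothetical translated ORFs
clusters = greedy_cluster(orfs, threshold=0.8)
labels = transfer_annotation(clusters, {"MKVLAAGIVG": "family_1", "PPPPWWWWQQ": "family_2"})
print(labels)
```

<p>Only the two representatives need to be searched against Pfam or TIGRfam. The third sequence inherits its label, which is where the saving in annotation effort comes from.</p>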
</sec>
<sec id="s8">
<title>Comparative Metagenomics</title>
<p>Comparing two or more metagenomes is necessary to understand how genomic differences affect, and are affected by, the abiotic environment. Several sequence-based traits can be compared, including GC content (compared, for example, between marine and soil samples
<xref ref-type="bibr" rid="pcbi.1000667-Yooseph1">[59]</xref>
), microbial genome size
<xref ref-type="bibr" rid="pcbi.1000667-Raes1">[43]</xref>
, and taxonomic
<xref ref-type="bibr" rid="pcbi.1000667-vonMering1">[71]</xref>
and functional content (e.g.,
<xref ref-type="bibr" rid="pcbi.1000667-Turnbaugh1">[100]</xref>
). Many comparative analyses, pairwise or multiple, use ordination methods, particularly when several metagenomic datasets are involved or when several types of metadata are hypothesized to affect the observed compositions of the metagenomic populations. Principal component analysis (PCA) and nonmetric multidimensional scaling (NM-MDS) are typically used to visualize the data and to reveal which factors affect the observed data most (e.g.,
<xref ref-type="bibr" rid="pcbi.1000667-Brulc1">[101]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Willner1">[102]</xref>
).</p>
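<p>PCA ordination of metagenomes can be sketched in Python with NumPy. Rows of the matrix are samples and columns are OTU or functional-category counts. Counts are converted to relative abundances, centered, and decomposed by singular value decomposition. NM-MDS is not shown. The toy abundance matrix and function name are hypothetical, and NumPy is assumed to be installed.</p>

```python
import numpy as np


def pca(abundance, n_components=2):
    """PCA of a samples x features count matrix via SVD of the column-centred
    relative abundances. Returns sample coordinates and the fraction of
    variance explained by each retained component."""
    x = np.asarray(abundance, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)      # relative abundance per sample
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_components]


samples = [[10, 0, 5], [9, 1, 5], [0, 10, 5], [1, 9, 5]]   # hypothetical counts
coords, explained = pca(samples)
print(coords.round(3))
print(explained.round(3))
```

<p>In this toy matrix the first two samples separate from the last two along the first component, which captures essentially all the variance. In real data, the loadings of that component would indicate which OTUs or functions drive the separation.</p>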
<p>We mentioned MEGAN above as binning software. MEGAN can also be used to compare the OTU composition of two or more frequency-normalized samples
<xref ref-type="bibr" rid="pcbi.1000667-Mitra2">[103]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Huson2">[104]</xref>
. MG-RAST provides a comparative functional and sequence-based analysis for uploaded samples, whereas IMG/M provides similar analysis for metagenomes that exist in the IMG/M site
<xref ref-type="bibr" rid="pcbi.1000667-Markowitz1">[105]</xref>
. RAMMCAP also provides the ability to compare metagenomes. Other software used for the comparison of microbial populations based on phylogenetic data are UniFrac
<xref ref-type="bibr" rid="pcbi.1000667-Lozupone1">[106]</xref>
and MetaStats
<xref ref-type="bibr" rid="pcbi.1000667-White1">[107]</xref>
, the latter being suitable for preprocessed clinical metagenomic data. Galaxy, an online workbench for the analysis of genomic data, can also perform some comparative metagenomic analysis, as well as taxonomic mapping
<xref ref-type="bibr" rid="pcbi.1000667-Giardine1">[108]</xref>
. ShotgunFunctionalizeR
<xref ref-type="bibr" rid="pcbi.1000667-Kristiansson1">[109]</xref>
is a stand-alone analysis tool for metagenomics samples written in R
<xref ref-type="bibr" rid="pcbi.1000667-R1">[110]</xref>
. The megx.net resource includes MetaMine
<xref ref-type="bibr" rid="pcbi.1000667-Bohnebeck1">[111]</xref>
for annotating genes using neighboring ORF information, and MetaLook
<xref ref-type="bibr" rid="pcbi.1000667-Lombardot1">[112]</xref>
for organization of sequences using customized habitat criteria. CAMERA (
<ext-link ext-link-type="uri" xlink:href="http://camera.calit2.net">http://camera.calit2.net</ext-link>
) lets users BLAST their sequences against 40 existing genomic and metagenomic datasets. CAMERA also serves as an archive for select metagenomic datasets generated by marine microbial research funded by the Gordon and Betty Moore Foundation. All of these sites appear to be in a state of flux, with promised new functionalities to be added soon and with datasets constantly being updated.</p>
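<p>The idea behind unweighted UniFrac can be sketched in Python. The distance between two communities is the fraction of observed branch length on a shared phylogenetic tree that leads to only one of them. The tree encoding (a parent map plus branch lengths), the example tree, and the function name are our assumptions. The UniFrac tool itself also offers weighted variants and significance tests.</p>

```python
def unweighted_unifrac(parent, branch_length, sample_a, sample_b):
    """parent maps each node to its parent (the root maps to None);
    branch_length maps each non-root node to the length of the branch above
    it. sample_a / sample_b are sets of observed leaf OTUs. Returns the
    branch length unique to one sample divided by the branch length
    covered by either sample."""
    covers_a, covers_b = set(), set()
    for leaves, covered in ((sample_a, covers_a), (sample_b, covers_b)):
        for leaf in leaves:
            node = leaf
            while parent.get(node) is not None:
                covered.add(node)   # the branch above `node` leads to this sample
                node = parent[node]
    observed = covers_a | covers_b
    unique = covers_a ^ covers_b
    total = sum(branch_length[n] for n in observed)
    return sum(branch_length[n] for n in unique) / total if total else 0.0


# Hypothetical tree: root -> X -> {a, b}; root -> Y -> {c, d}; all branches 1.0.
parent = {"root": None, "X": "root", "Y": "root", "a": "X", "b": "X", "c": "Y", "d": "Y"}
lengths = {n: 1.0 for n in parent if parent[n] is not None}
print(unweighted_unifrac(parent, lengths, {"a", "b"}, {"c", "d"}))  # 1.0
print(unweighted_unifrac(parent, lengths, {"a"}, {"b"}))
```

<p>Communities from different clades share no branches and score 1.0. Communities from sister OTUs share the branch above their common ancestor, so their distance is smaller.</p>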
<p>We mentioned the importance of standardized recording of metadata in the “Recording Metadata” section above. Comparative analysis is where metadata comes into play: to compare different environments properly, we need a common vocabulary describing their abiotic components. To our knowledge, no software yet provides a comparison between metadata or a comparative correlation between metadata and sequence data, although several such comparisons have been performed (see the “Case Studies” section below).</p>
</sec>
<sec id="s9">
<title>Applications</title>
<p>In this section we will discuss a few studies involving metagenomics. We chose these studies because each one illustrates a different insight that is derived from using metagenomics.</p>
<sec id="s9a">
<title>Correlations between Environmental Data and Metadata</title>
<p>The study of the effects of the environment on microbes is as old as microbiology itself. Antoni van Leeuwenhoek noted that the “animalcules” he scraped from his mouth and viewed under his microscope were gone or immobile after he drank hot coffee. Leeuwenhoek was thus the first to describe a correlation between temperature change and organism viability
<xref ref-type="bibr" rid="pcbi.1000667-Egerton1">[113]</xref>
. Ever since then, microbe species distribution, genetics, pathogenicity, virulence, colonization—indeed every aspect of microbial life—has been correlated with habitat traits such as temperature, salinity, pH, nutrient content, etc. Traits of host-borne microbes have been correlated with the host species, age, habitat, behavior, feeding habits, host organs chosen for settlement/pathogenicity, and, of course, clinical symptoms and many other traits.</p>
<p>With the advent of metagenomics, we can now study the genomic potential of a bacterial community, and how that community is affected by, and affects, its habitat. Many metagenomic studies have examined, to some extent, correlations between sequence data, environment, and environmental attributes in an attempt to gain biological insight. One notable study by Turnbaugh and colleagues looked at the connection between the gut microbiome and obesity. The authors found that the metagenome of obese mice was enriched in carbohydrate-active enzymes relative to that of lean mice. A separate biochemical experiment confirmed that the microbiome of obese mice has a greater capacity for energy harvest than that of lean mice. They concluded that the gut microbiome contributes to obesity through this increased capacity to harvest energy from the diet
<xref ref-type="bibr" rid="pcbi.1000667-Turnbaugh1">[100]</xref>
.</p>
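<p>Claims of enrichment such as the one above rest on comparing how often a functional category occurs in two metagenomes. As a minimal sketch, assuming reads have already been annotated and counted per category, a two-proportion z-test can flag whether a category is over-represented in one sample. This is a generic illustration, not the statistical procedure Turnbaugh and colleagues used; the function name enrichment_z_test and all counts are hypothetical.</p>

```python
import math


def enrichment_z_test(hits_a, total_a, hits_b, total_b):
    """Two-proportion z-test: is a category more frequent in sample A than in B?

    hits_*  -- reads assigned to the functional category (hypothetical counts)
    total_* -- all annotated reads in that metagenome
    Returns (fold_enrichment, z, one_sided_p).
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis that both samples share one rate.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # One-sided p-value: upper tail of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_a / p_b, z, p_value


# Hypothetical counts: 1,200 of 100,000 reads vs. 900 of 100,000 reads.
fold, z, p = enrichment_z_test(1200, 100_000, 900, 100_000)
print(f"fold enrichment = {fold:.2f}, z = {z:.2f}, one-sided p = {p:.2g}")
```

<p>Metagenome comparisons usually test many functional categories at once, so p-values like this one would also need correction for multiple testing.</p>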
<p>Studies such as those presented above looked at bivariate correlations, for example between obesity and the enrichment of carbohydrate-active enzymes. A recent study by Gianoulis and colleagues proposes a way to find multivariate correlations between metagenomic data and environmental attributes
<xref ref-type="bibr" rid="pcbi.1000667-Gianoulis1">[114]</xref>
. Such multivariate analyses matter because environmental factors may combine in unexpected ways, and these combinations can reveal new insights. For example, Gianoulis and colleagues identified covariation between amino acid transport and cofactor synthesis in nutrient-poor ocean areas, suggesting that limiting amounts of cofactors can partially explain the increased import of amino acids under nutrient-limited conditions.</p>
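<p>Canonical correlation analysis (CCA) is one standard technique for finding multivariate correlations of this kind: it seeks the linear combination of one set of variables (for example, pathway abundances) that correlates most strongly with a linear combination of another set (environmental features). The NumPy sketch below computes the first canonical correlation on synthetic data. The function names, the small ridge term, and the data are our assumptions, and the sketch is not a reproduction of the published analysis.</p>

```python
import numpy as np


def _inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def first_canonical_correlation(X, Y, ridge=1e-6):
    """Largest canonical correlation between the column sets of X and Y.

    X     -- samples x pathways (e.g., normalized pathway abundances)
    Y     -- samples x environmental features (e.g., temperature, depth)
    ridge -- small regularizer added to the covariance diagonals (an assumption)
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    cyy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    m = _inv_sqrt(cxx) @ cxy @ _inv_sqrt(cyy)
    return np.linalg.svd(m, compute_uv=False)[0]


rng = np.random.default_rng(0)
env = rng.normal(size=(50, 2))                 # synthetic environmental features
noise = rng.normal(scale=0.3, size=(50, 3))
# Synthetic pathway abundances driven partly by a combination of both features.
pathways = np.column_stack([env[:, 0] + env[:, 1], env[:, 0], np.zeros(50)]) + noise
print(round(first_canonical_correlation(pathways, env), 3))
```

<p>With few samples and many pathways the covariance matrices become ill-conditioned, which is why the sketch adds a small ridge term; correlations from such small datasets should be interpreted cautiously.</p>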
</sec>
<sec id="s9b">
<title>Understanding Symbiosis</title>
<p>Symbiotic bacterial communities living in an animal host often consist of a small number of species that are phylogenetically distant from one another. Because there are few species, and because their phylogenetic distance makes their sequences relatively easy to bin, metagenomics is well suited to studying symbionts. Eisen and his colleagues generated ESS data from the bacterial symbionts of the glassy-winged sharpshooter, an insect that feeds solely on xylem sap, a nutrient-poor diet. By binning the ESS data, they inferred that one symbiont synthesizes amino acids for the host insect, while the other synthesizes cofactors and vitamins
<xref ref-type="bibr" rid="pcbi.1000667-Wu1">[115]</xref>
. Moreover, the symbiont that provides the vitamins lacks some amino acid biosynthetic pathways, and the symbiont that provides the amino acids cannot synthesize the vitamins. Thus, the two symbionts complement each other's metabolic deficiencies as well as feeding their host. Another study of the marine gutless worm
<italic>Olavius algarvensis</italic>
has revealed the different roles of its four symbionts in generating nutrients and processing the worm's waste
<xref ref-type="bibr" rid="pcbi.1000667-Woyke1">[116]</xref>
. None of the symbionts in either the insect or the worm study could be cultured under the reported conditions, which made metagenomics the approach of choice for these studies.</p>
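<p>Once reads have been binned and each bin annotated, inferring metabolic complementarity like that described above reduces, at its simplest, to set operations on the pathways found in each bin. The sketch below assumes each bin has already been reduced to a set of pathway names; the labels and the function name complementarity are hypothetical placeholders, not the gene content reported by Wu and colleagues.</p>

```python
# Hypothetical pathway inventories for two binned symbiont genomes.
# Labels are placeholders, not published gene content.
symbiont_a = {"amino_acid_pathway_1", "amino_acid_pathway_2", "core_metabolism"}
symbiont_b = {"vitamin_pathway_1", "cofactor_pathway_1", "core_metabolism"}
required_by_host = {"amino_acid_pathway_1", "amino_acid_pathway_2",
                    "vitamin_pathway_1", "cofactor_pathway_1"}


def complementarity(a, b, required):
    """Split the required pathways by which binned genome encodes them."""
    a_only = sorted(a.difference(b).intersection(required))
    b_only = sorted(b.difference(a).intersection(required))
    # Required pathways that neither partner encodes.
    missing = sorted(required.difference(a.union(b)))
    return a_only, b_only, missing


a_only, b_only, missing = complementarity(symbiont_a, symbiont_b, required_by_host)
print("supplied only by A:", a_only)
print("supplied only by B:", b_only)
print("covered by neither:", missing)
```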
</sec>
<sec id="s9c">
<title>Enriching Gene Families</title>
<p>Another type of study enabled by metagenomics is the search for new members of a gene family. Metagenomics has opened the floodgates of genomic material. Consequently, the laborious search for exemplars of a gene family among known cultured species has been replaced by the laborious computational filtering of suitable exemplars from millions of environmental sequences. The Global Ocean Sampling (GOS) project enriched the previously small bacterial Eukaryotic Protein Kinase-Like (ELK) family several-fold, identifying many new members of known families as well as new families. Four previously unrecognized conserved residues of unknown function were also found in these protein sequences, setting the stage for future functional studies of this family
<xref ref-type="bibr" rid="pcbi.1000667-Kannan1">[117]</xref>
.</p>
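<p>Spotting conserved positions such as the four residues mentioned above starts from a multiple sequence alignment of family members. The sketch below scores each alignment column by the share of sequences carrying its most common residue. The 0.9 cutoff, the toy sequences, and the function name conserved_columns are assumptions for illustration, and this simple frequency score is only one of many possible conservation measures.</p>

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.9):
    """Return (column, residue, fraction) for columns dominated by one residue.

    alignment    -- equal-length aligned sequences ('-' marks a gap)
    min_fraction -- share of sequences that must carry the same residue (assumed cutoff)
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        residues = Counter(seq[col] for seq in alignment)
        residue, count = residues.most_common(1)[0]
        fraction = count / len(alignment)
        # Ignore columns whose most common character is a gap.
        if residue != "-" and fraction >= min_fraction:
            hits.append((col, residue, fraction))
    return hits


# Hypothetical toy alignment standing in for environmental family members.
toy = ["GKGSFG", "GRGAFG", "GKGSYG", "GKASFG"]
print(conserved_columns(toy))
```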
</sec>
<sec id="s9d">
<title>Metagenomics and Environmental Virology</title>
<p>Outnumbering living microbes, viruses are the most abundant biological entity on Earth: there are an estimated 10
<sup>30</sup>
tailed bacteriophages in the biosphere
<xref ref-type="bibr" rid="pcbi.1000667-Brussow1">[118]</xref>
. In marine environments, viruses make up 94% of all nucleic-acid-containing particles, although owing to their small size they are estimated to constitute only 5% of the biomass. Metagenomic studies have enriched our knowledge of viral diversity and of the role viruses play as facilitators of microbial genetic diversity. Sequence similarity analyses of viral metagenomic data have shown that approximately 90% of the sequences have no similarity to sequences in GenBank, indicating that viral sequences are underrepresented in sequence databases
<xref ref-type="bibr" rid="pcbi.1000667-Edwards1">[119]</xref>
.</p>
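<p>Estimates such as the 90% figure above come from counting how many reads have no significant database match. A minimal sketch, assuming reads were searched with BLAST and the results saved in its tabular output format (-outfmt 6), whose first column is the query identifier and whose 11th column is the e-value. The 1e-3 cutoff and the function name fraction_without_hits are illustrative assumptions, and the resulting percentage depends strongly on the cutoff and database chosen.</p>

```python
import csv
import io


def fraction_without_hits(query_ids, blast_tab, max_evalue=1e-3):
    """Fraction of query reads with no database hit at or below max_evalue.

    query_ids  -- every read that was searched (no-hit reads are absent from the table)
    blast_tab  -- text of BLAST tabular output (-outfmt 6)
    max_evalue -- significance cutoff (an assumed value)
    """
    with_hits = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        # Column 1 (index 0) is the query id; column 11 (index 10) is the e-value.
        if row and max_evalue >= float(row[10]):
            with_hits.add(row[0])
    queries = set(query_ids)
    return len(queries.difference(with_hits)) / len(queries)


# Hypothetical two-line table: read1 has a strong hit, read2 only a weak one.
table = (
    "read1\tsubject1\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-20\t150\n"
    "read2\tsubject2\t40.0\t50\t30\t2\t1\t50\t1\t50\t0.5\t20\n"
)
print(fraction_without_hits(["read1", "read2", "read3", "read4"], table))
```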
<p>Transduction, the transfer of genetic material via a viral vector, is known to be a strong contributor to genetic diversity in prokaryotes. Metagenomic studies help us assess the magnitude of this virally contributed genetic diversity. For example, the presence of photosynthesis genes in cyanophages (viruses that infect cyanobacteria) has been known for some time
<xref ref-type="bibr" rid="pcbi.1000667-Mann1">[120]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Millard1">[121]</xref>
. However, metagenomic studies have revealed the extent of this phenomenon: an estimated 60% of the surface-water psbA genes (which encode the D1 protein of Photosystem II) are of phage origin. Another metagenomic study revealed whole photosynthetic gene cassettes in cyanophages, which may increase host fitness by supplementing and enhancing existing cyanobacterial photosystems. These latter findings were enabled by the GOS metagenomic data: surveying these data with simple sequence similarity analyses and chromosomal gene location revealed the existence of Photosystem I genes in cyanophages and the extent of their distribution
<xref ref-type="bibr" rid="pcbi.1000667-Sharon1">[94]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Sharon2">[122]</xref>
.</p>
<p>Clinical virology also stands to benefit from metagenomic analysis
<xref ref-type="bibr" rid="pcbi.1000667-Delwart1">[123]</xref>
. Indeed, recent molecular discoveries of highly prevalent viral infections caused by anelloviruses
<xref ref-type="bibr" rid="pcbi.1000667-Nishizawa1">[124]</xref>
and GBV-C
<xref ref-type="bibr" rid="pcbi.1000667-Simons1">[125]</xref>
highlight the need for a better understanding of the human viral flora.</p>
<p>The computational analysis of viral metagenomic data is particularly challenging. First, viruses may exist as chromosomal inserts, such as prophages integrated into the host genome, which makes it difficult to distinguish viral genomic elements from those of the host. Furthermore, when samples are filtered to retain only viral particles, prophage elements are lost. Second, viruses have no universal phylogenetic marker gene equivalent to the small-subunit rRNA gene of prokaryotes and eukaryotes, and the lack of a consensus marker gene hampers phylogenetic and diversity analyses. Third, as stated above, most viral genes have no annotated homolog in sequence databases, which impedes functional analysis and even the recognition of viral genes as viral. Indeed, by some estimates the majority of ORFans in the biosphere arose through lateral gene transfer of viral origin
<xref ref-type="bibr" rid="pcbi.1000667-Yin1">[126]</xref>
, and phage-mediated lateral gene transfer contributes substantially to microbial diversity
<xref ref-type="bibr" rid="pcbi.1000667-Hambly1">[127]</xref>
.</p>
</sec>
</sec>
<sec id="s10">
<title>The Future</title>
<p>We are in the midst of the fastest-growing revolution in molecular biology, perhaps in all of life science, and it only seems to be accelerating. Sanger sequencing has been with us for over three decades. High-throughput ABI 3730 sequencing has been around for 8 years, Roche 454 instrumentation has been available for 6 years, and Illumina GA for 3 years. The latter two have enabled us to generate more sequence data than Sanger sequencing ever has. We are still coming to grips with this volume of data and with how to analyze it. Assembly, quality control, binning, and annotation all require ingenious algorithms combined with the latest computational power, and sequencing technology appears to be changing almost faster than the associated computational techniques can keep up. There are many indications that within a few years short-read second-generation sequencing may be outdated. Third-generation sequencing, which promises to sequence a single chromosome in a single pass with few or no fragments, is expected to be established soon
<xref ref-type="bibr" rid="pcbi.1000667-Clarke1">[35]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Eid1">[36]</xref>
. Would this plausible obsolescence of second-generation sequencing change the current computational challenges of metagenomics? For some applications assembly algorithms may become less necessary, but for species-rich samples we may not be able to rely solely on third-generation sequencing to achieve good sampling. Coverage assessment, gene finding, binning, and annotation will still be necessary.</p>
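<p>The sampling concern above can be quantified with the classic Lander–Waterman approximation: a species whose genome of length G receives N reads of length L is covered on average C = NL/G times, and under uniform random read placement roughly e<sup>−C</sup> of its genome remains unsequenced. The sketch below applies this per species, assuming each species receives reads in proportion to its share of the sampled DNA; the community, read counts, and function name expected_coverage are hypothetical.</p>

```python
import math


def expected_coverage(total_reads, read_length, abundances, genome_sizes):
    """Per-species coverage and uncovered fraction under a Lander-Waterman model.

    abundances   -- share of sequenced DNA contributed by each species (sums to 1)
    genome_sizes -- genome length of each species in bases
    Assumes reads land uniformly at random, so per-base coverage is Poisson and
    the expected uncovered fraction of a genome is exp(-coverage).
    """
    result = {}
    for name, share in abundances.items():
        coverage = total_reads * share * read_length / genome_sizes[name]
        result[name] = (coverage, math.exp(-coverage))
    return result


# Hypothetical community: one dominant species and one rare species.
abund = {"dominant": 0.99, "rare": 0.01}
sizes = {"dominant": 3_000_000, "rare": 3_000_000}
for name, (cov, uncovered) in expected_coverage(100_000, 1_000, abund, sizes).items():
    print(f"{name}: {cov:.1f}x coverage, ~{uncovered:.1%} of genome unsampled")
```

<p>With these made-up numbers the dominant species is covered about 33 times, while the rare species receives about 0.3x coverage, leaving roughly 72% of its genome unsampled even with 1 kb reads.</p>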
<p>Oxford Nanopore's BASE technology can distinguish cytosine from methylcytosine during sequencing
<xref ref-type="bibr" rid="pcbi.1000667-Branton1">[37]</xref>
. Methylation acts as a primitive immune system in bacteria
<xref ref-type="bibr" rid="pcbi.1000667-Boyer1">[128]</xref>
, and as a mechanism of expression control in eukaryotes
<xref ref-type="bibr" rid="pcbi.1000667-Kass1">[129]</xref>
. This additional epigenetic information has been largely unavailable in sequencing projects because it could not be obtained in a high-throughput fashion. Pyrosequencing already allows quantitative methylation analysis
<xref ref-type="bibr" rid="pcbi.1000667-Tost1">[130]</xref>
, and methylation data will very likely soon be routinely available alongside four-base sequence data; the associated bioinformatics will need to handle these data.</p>
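<p>Once methylation calls are available, the basic per-site summary is the fraction of reads reporting a methylated cytosine. The sketch below assumes a hypothetical input format in which each site maps to its per-read base calls, with m standing for methylcytosine; the function name methylation_levels is also hypothetical. In bisulfite-based pyrosequencing assays the analogous ratio is C/(C+T), because bisulfite treatment converts unmethylated cytosines so that they are read as T.</p>

```python
def methylation_levels(calls_by_site):
    """Fraction of reads reporting a methylated cytosine at each site.

    calls_by_site -- maps a site id to the base calls observed there; 'm' is a
                     hypothetical symbol for methylcytosine, 'C' for unmethylated C.
    Sites with no cytosine calls at all are reported as None rather than 0.
    """
    levels = {}
    for site, calls in calls_by_site.items():
        methylated = calls.count("m")
        total = methylated + calls.count("C")
        levels[site] = methylated / total if total else None
    return levels


# Hypothetical per-read calls at three sites.
sites = {"site_1": ["m", "m", "C", "m"], "site_2": ["C", "C", "C"], "site_3": ["T", "T"]}
print(methylation_levels(sites))
```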
<p>Another growing problem is data management. Sequencing centers are working to equip themselves with the computational infrastructure needed to keep up with the flow of sequence data. However, many research institutes that request sequencing lack the computational infrastructure needed to analyze these data and store them long term. The sheer volume of data imposes new constraints on its transfer and analysis. These challenges will have to be met by concerted efforts of life scientists, computer scientists, engineers, and funding agencies
<xref ref-type="bibr" rid="pcbi.1000667-Batley1">[131]</xref>
,
<xref ref-type="bibr" rid="pcbi.1000667-Richter2">[132]</xref>
.</p>
<p>Genomic data tell us what an organism is capable of doing, i.e., its genomic potential. What it is actually doing within a given time frame is revealed by transcription (mRNA) and translation (protein) data. For microbial communities, these studies have been dubbed metatranscriptomics and metaproteomics, respectively. These two fields are outside the scope of this review, but they too are developing rapidly, both technologically and computationally
<xref ref-type="bibr" rid="pcbi.1000667-Bailly1">[133]</xref>
–
<xref ref-type="bibr" rid="pcbi.1000667-Wilmes2">[135]</xref>
.</p>
<p>We hope this primer has been useful and informative. Because computational metagenomics is changing rapidly, we invite knowledgeable readers to use the comment section of
<italic>PLoS Computational Biology</italic>
to provide updated information.</p>
<boxed-text id="pcbi-1000667-box001" position="float">
<sec id="s10a1">
<title>Box 1. Glossary of terms</title>
<p>
<bold>Binning</bold>
Clustering sequences based on their nucleotide composition or similarity to a reference database</p>
<p>
<bold>Contig</bold>
A contiguous sequence assembled from a set of overlapping DNA segments</p>
<p>
<bold>Coverage (in sequencing)</bold>
The mean number of times each nucleotide in a genome is sequenced</p>
<p>
<bold>ESS</bold>
Environmental Shotgun Sequencing</p>
<p>
<bold><italic>K</italic><sub>a</sub>/<italic>K</italic><sub>s</sub></bold>
The ratio of the rate of nonsynonymous substitutions (<italic>K</italic><sub>a</sub>) to the rate of synonymous substitutions (<italic>K</italic><sub>s</sub>), which can be used as an indicator of selective pressure acting on a protein-coding gene</p>
<p>
<bold>Mate pairs</bold>
Pairs of sequences read from the 3′ and 5′ ends of a single clone, so that their relative orientation and approximate distance are known</p>
<p>
<bold>Metadata</bold>
Definitional data that provide information about or documentation of other data</p>
<p>
<bold>Metagenome</bold>
The DNA obtained from uncultured microorganisms</p>
<p>
<bold>Metagenomics</bold>
The study of genomic DNA obtained from uncultured microorganisms</p>
<p>
<bold>Metaproteomics</bold>
The study of protein molecular data obtained from environmental samples using proteomics techniques</p>
<p>
<bold>Metatranscriptomics</bold>
The study of transcript (mRNA) sequence data obtained from environmental samples</p>
<p>
<bold>ORFan</bold>
An ORF that has no (or few, depending on definition) homologs in other organisms</p>
<p>
<bold>OTU</bold>
Operational taxonomic unit, an operational species distinction in microbiology. OTUs are typically defined using rDNA sequences and a percent-similarity threshold that determines whether microbes fall within the same OTU or different OTUs</p>
<p>
<bold>Ontology</bold>
A formal representation of a set of concepts and the relationships between them. Ontologies are used to create a consensual unambiguous controlled vocabulary</p>
<p>
<bold>Polony</bold>
Discrete clonal amplifications of a single DNA molecule, grown in a gel matrix. The clusters can then be individually sequenced, producing short reads. Polony-based sequencing is the basis of most second-generation sequencers</p>
<p>
<bold>Rarefaction curve</bold>
A curve describing the number of species discovered as a function of the number of individuals sampled</p>
<p>
<bold>Ribotype</bold>
A phylotypic classification based on rDNA sequences</p>
<p>
<bold>Scaffold</bold>
A series of contigs that are in the right order but not necessarily connected in one contiguous stretch</p>
<p>
<bold>Shadow ORF</bold>
An incorrectly identified ORF that overlaps the coding region of the true ORF</p>
</sec>
</boxed-text>
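<p>The rarefaction curve defined in the glossary can be computed analytically from per-species (or per-OTU) counts. The sketch below uses the standard analytical rarefaction formula, which gives the expected number of species in a random subsample of n individuals drawn without replacement; the counts and the function name rarefaction are hypothetical.</p>

```python
from math import comb


def rarefaction(counts, n):
    """Expected number of species in a random subsample of n individuals.

    counts -- individuals observed per species (or reads per OTU)
    Standard analytical rarefaction, sampling without replacement:
    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], where N = sum of counts.
    """
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size cannot exceed the number of individuals")
    denom = comb(total, n)
    # comb(k, n) is 0 when n exceeds k, so a species absent from all
    # size-n subsamples is impossible and contributes a full 1.
    return sum(1 - comb(total - c, n) / denom for c in counts)


otu_counts = [50, 30, 10, 5, 3, 1, 1]   # hypothetical reads per OTU
curve = [(n, round(rarefaction(otu_counts, n), 2)) for n in (1, 10, 50, 100)]
print(curve)
```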
</sec>
</body>
<back>
<ack>
<p>We would like to thank Linda Amaral-Zettler, Yanay Ofran, Daniel Huson, Jack Gilbert, Mufit Ozden, Frank Oliver Glöckner, and Renzo Kottmann for their critique and input. We would also like to thank the referees for contributing their time and effort towards making this a better article. IF dedicates his contribution to the memory of Ilan Friedberg.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1000667-Whitman1">
<label>1</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whitman</surname>
<given-names>WB</given-names>
</name>
<name>
<surname>Coleman</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Wiebe</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Prokaryotes: the unseen majority.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>95</volume>
<fpage>6578</fpage>
<lpage>6583</lpage>
<pub-id pub-id-type="pmid">9618454</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Savage1">
<label>2</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Savage</surname>
<given-names>DC</given-names>
</name>
</person-group>
<year>1977</year>
<article-title>Microbial ecology of the gastrointestinal tract.</article-title>
<source>Annu Rev Microbiol</source>
<volume>31</volume>
<fpage>107</fpage>
<lpage>133</lpage>
<pub-id pub-id-type="pmid">334036</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Berg1">
<label>3</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berg</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>1996</year>
<article-title>The indigenous gastrointestinal microflora.</article-title>
<source>Trends Microbiol</source>
<volume>4</volume>
<fpage>430</fpage>
<lpage>435</lpage>
<pub-id pub-id-type="pmid">8950812</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Collins1">
<label>4</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collins</surname>
<given-names>FS</given-names>
</name>
<name>
<surname>McKusick</surname>
<given-names>VA</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Implications of the human genome project for medical science.</article-title>
<source>JAMA</source>
<volume>285</volume>
<fpage>540</fpage>
<lpage>544</lpage>
<pub-id pub-id-type="pmid">11176855</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kaput1">
<label>5</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaput</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cotton</surname>
<given-names>RGHG</given-names>
</name>
<name>
<surname>Hardman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Watson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Al Aqeel</surname>
<given-names>AII</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Planning the human variome project: the Spain report.</article-title>
<source>Hum Mut</source>
<volume>30</volume>
<fpage>496</fpage>
<lpage>510</lpage>
<pub-id pub-id-type="pmid">19306394</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-OHara1">
<label>6</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>O'Hara</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Shanahan</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The gut flora as a forgotten organ.</article-title>
<source>EMBO Rep</source>
<volume>7</volume>
<fpage>688</fpage>
<lpage>693</lpage>
<pub-id pub-id-type="pmid">16819463</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Fiers1">
<label>7</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fiers</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Contreras</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Duerinck</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Haegeman</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Iserentant</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>1976</year>
<article-title>Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene.</article-title>
<source>Nature</source>
<volume>260</volume>
<fpage>500</fpage>
<lpage>507</lpage>
<pub-id pub-id-type="pmid">1264203</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sanger1">
<label>8</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanger</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Coulson</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Friedmann</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Air</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Barrell</surname>
<given-names>BG</given-names>
</name>
<etal></etal>
</person-group>
<year>1978</year>
<article-title>The nucleotide sequence of bacteriophage phiX174.</article-title>
<source>J Mol Biol</source>
<volume>125</volume>
<fpage>225</fpage>
<lpage>246</lpage>
<pub-id pub-id-type="pmid">731693</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Fleischmann1">
<label>9</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fleischmann</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>White</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Clayton</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Kirkness</surname>
<given-names>EF</given-names>
</name>
<etal></etal>
</person-group>
<year>1995</year>
<article-title>Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.</article-title>
<source>Science</source>
<volume>269</volume>
<fpage>496</fpage>
<lpage>512</lpage>
<pub-id pub-id-type="pmid">7542800</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Amann1">
<label>10</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amann</surname>
<given-names>RI</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Schleifer</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Phylogenetic identification and in situ detection of individual microbial cells without cultivation.</article-title>
<source>Microbiol Rev</source>
<volume>59</volume>
<fpage>143</fpage>
<lpage>169</lpage>
<pub-id pub-id-type="pmid">7535888</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Pace1">
<label>11</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pace</surname>
<given-names>NR</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>A molecular view of microbial diversity and the biosphere.</article-title>
<source>Science</source>
<volume>276</volume>
<fpage>734</fpage>
<lpage>740</lpage>
<pub-id pub-id-type="pmid">9115194</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Rapp1">
<label>12</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rappé</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The uncultured microbial majority.</article-title>
<source>Annu Rev Microbiol</source>
<volume>57</volume>
<fpage>369</fpage>
<lpage>394</lpage>
<pub-id pub-id-type="pmid">14527284</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Handelsman1">
<label>13</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Handelsman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rondon</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Brady</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Clardy</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Goodman</surname>
<given-names>RM</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products.</article-title>
<source>Chem Biol</source>
<volume>5</volume>
<fpage>R245</fpage>
<lpage>R249</lpage>
<pub-id pub-id-type="pmid">9818143</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Rondon1">
<label>14</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rondon</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>August</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>Bettermann</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Brady</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Grossman</surname>
<given-names>TH</given-names>
</name>
<etal></etal>
</person-group>
<year>2000</year>
<article-title>Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms.</article-title>
<source>Appl Environ Microbiol</source>
<volume>66</volume>
<fpage>2541</fpage>
<lpage>2547</lpage>
<pub-id pub-id-type="pmid">10831436</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Field1">
<label>15</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Garrity</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Selengut</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The minimum information about a genome sequence (MIGS) specification.</article-title>
<source>Nat Biotechnol</source>
<volume>26</volume>
<fpage>541</fpage>
<lpage>547</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kottmann1">
<label>16</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kottmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kagan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Kravitz</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>A standard MIGS/MIMS compliant XML schema: toward the development of the genomic contextual data markup language (GCDML).</article-title>
<source>OMICS</source>
<volume>12</volume>
<fpage>115</fpage>
<lpage>121</lpage>
<pub-id pub-id-type="pmid">18479204</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Brazma1">
<label>17</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brazma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hingamp</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Quackenbush</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sherlock</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Spellman</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<year>2001</year>
<article-title>Minimum information about a microarray experiment (MIAME)–toward standards for microarray data.</article-title>
<source>Nat Genet</source>
<volume>29</volume>
<fpage>365</fpage>
<lpage>371</lpage>
<pub-id pub-id-type="pmid">11726920</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Westbrook1">
<label>18</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Westbrook</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Fitzgerald</surname>
<given-names>PM</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The PDB format, mmCIF, and other data formats.</article-title>
<source>Methods Biochem Anal</source>
<volume>44</volume>
<fpage>161</fpage>
<lpage>179</lpage>
<pub-id pub-id-type="pmid">12647386</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sanger2">
<label>19</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanger</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Coulson</surname>
<given-names>AR</given-names>
</name>
</person-group>
<year>1975</year>
<article-title>A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase.</article-title>
<source>J Mol Biol</source>
<volume>94</volume>
<fpage>441</fpage>
<lpage>448</lpage>
<pub-id pub-id-type="pmid">1100841</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sanger3">
<label>20</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanger</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Nicklen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Coulson</surname>
<given-names>AR</given-names>
</name>
</person-group>
<year>1977</year>
<article-title>DNA sequencing with chain-terminating inhibitors.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>74</volume>
<fpage>5463</fpage>
<lpage>5467</lpage>
<pub-id pub-id-type="pmid">271968</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sorek1">
<label>21</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sorek</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Creevey</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Francino</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Genome-wide experimental determination of barriers to horizontal gene transfer.</article-title>
<source>Science</source>
<volume>318</volume>
<fpage>1449</fpage>
<lpage>1452</lpage>
<pub-id pub-id-type="pmid">17947550</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-PedrosAlio1">
<label>22</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedros-Alio</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Ecology: Dipping into the rare biosphere.</article-title>
<source>Science</source>
<volume>315</volume>
<fpage>192</fpage>
<lpage>193</lpage>
<pub-id pub-id-type="pmid">17218512</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sogin1">
<label>23</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sogin</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Mark Welch</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Huse</surname>
<given-names>SM</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Microbial diversity in the deep sea and the underexplored “rare biosphere”.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>103</volume>
<fpage>12115</fpage>
<lpage>12120</lpage>
<pub-id pub-id-type="pmid">16880384</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Hamp1">
<label>24</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamp</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Fodor</surname>
<given-names>AA</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Effects of experimental choices and analysis noise on surveys of the “rare biosphere”.</article-title>
<source>Appl Environ Microbiol</source>
<volume>75</volume>
<fpage>3263</fpage>
<lpage>3270</lpage>
<pub-id pub-id-type="pmid">19270149</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Neufeld1">
<label>25</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neufeld</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mohn</surname>
<given-names>WW</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Scratching the surface of the rare biosphere with ribosomal sequence tag primers.</article-title>
<source>FEMS Microbiol Lett</source>
<volume>283</volume>
<fpage>146</fpage>
<lpage>153</lpage>
<pub-id pub-id-type="pmid">18429998</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mitra1">
<label>26</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitra</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>In situ localized amplification and contact replication of many individual DNA molecules.</article-title>
<source>Nucleic Acids Res</source>
<volume>27</volume>
<fpage>e34</fpage>
<pub-id pub-id-type="pmid">10572186</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Porreca1">
<label>27</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Porreca</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Polony DNA sequencing.</article-title>
<publisher-loc>Hoboken (New Jersey)</publisher-loc>
<publisher-name>John Wiley and Sons, Inc</publisher-name>
<comment>Current protocols in molecular biology. Frederick M. Ausubel, et al., editors. Chapter 7</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Nyrn1">
<label>28</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nyrén</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pettersson</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Uhlén</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>1993</year>
<article-title>Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay.</article-title>
<source>Anal Biochem</source>
<volume>208</volume>
<fpage>171</fpage>
<lpage>175</lpage>
<pub-id pub-id-type="pmid">8382019</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Ronaghi1">
<label>29</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ronaghi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Uhlén</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nyrén</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>A sequencing method based on real-time pyrophosphate.</article-title>
<source>Science</source>
<volume>281</volume>
<fpage>363</fpage>
<lpage>365</lpage>
<pub-id pub-id-type="pmid">9705713</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Margulies1">
<label>30</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Margulies</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Egholm</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>WE</given-names>
</name>
<name>
<surname>Attiya</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>JS</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Genome sequencing in microfabricated high-density picolitre reactors.</article-title>
<source>Nature</source>
<volume>437</volume>
<issue>7057</issue>
<fpage>376</fpage>
<lpage>380</lpage>
<pub-id pub-id-type="pmid">16056220</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Holt1">
<label>31</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holt</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>SJM</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>The new paradigm of flow cell sequencing.</article-title>
<source>Genome Res</source>
<volume>18</volume>
<fpage>839</fpage>
<lpage>846</lpage>
<pub-id pub-id-type="pmid">18519653</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Shendure1">
<label>32</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Next-generation DNA sequencing.</article-title>
<source>Nat Biotechnol</source>
<volume>26</volume>
<fpage>1135</fpage>
<lpage>1145</lpage>
<pub-id pub-id-type="pmid">18846087</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Harismendy1">
<label>33</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harismendy</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Strausberg</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Stockwell</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Evaluation of next generation sequencing platforms for population targeted sequencing studies.</article-title>
<source>Genome Biol</source>
<volume>10</volume>
<fpage>R32</fpage>
<pub-id pub-id-type="pmid">19327155</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-McPherson1">
<label>34</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McPherson</surname>
<given-names>JD</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Next-generation gap.</article-title>
<source>Nat Methods</source>
<volume>6</volume>
<fpage>S2</fpage>
<lpage>S5</lpage>
<pub-id pub-id-type="pmid">19844227</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Clarke1">
<label>35</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clarke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Jayasinghe</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reid</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Continuous base identification for single-molecule nanopore DNA sequencing.</article-title>
<source>Nat Nanotechnol</source>
<volume>4</volume>
<fpage>265</fpage>
<lpage>270</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Eid1">
<label>36</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eid</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fehr</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Luong</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lyle</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Real-time DNA sequencing from single polymerase molecules.</article-title>
<source>Science</source>
<volume>323</volume>
<fpage>133</fpage>
<lpage>138</lpage>
<pub-id pub-id-type="pmid">19023044</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Branton1">
<label>37</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Branton</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Deamer</surname>
<given-names>DW</given-names>
</name>
<name>
<surname>Marziali</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bayley</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Benner</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The potential and challenges of nanopore sequencing.</article-title>
<source>Nat Biotechnol</source>
<volume>26</volume>
<fpage>1146</fpage>
<lpage>1153</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Lander1">
<label>38</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<year>1988</year>
<article-title>Genomic mapping by fingerprinting random clones: a mathematical analysis.</article-title>
<source>Genomics</source>
<volume>2</volume>
<fpage>231</fpage>
<lpage>239</lpage>
<pub-id pub-id-type="pmid">3294162</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Torsvik1">
<label>39</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torsvik</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Goksoyr</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Daae</surname>
<given-names>FL</given-names>
</name>
</person-group>
<year>1990</year>
<article-title>High diversity in DNA of soil bacteria.</article-title>
<source>Appl Environ Microbiol</source>
<volume>56</volume>
<fpage>782</fpage>
<lpage>787</lpage>
<pub-id pub-id-type="pmid">2317046</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Youssef1">
<label>40</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Youssef</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Elshahed</surname>
<given-names>MS</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Species richness in soil bacterial communities: a proposed approach to overcome sample size bias.</article-title>
<source>J Microbiol Methods</source>
<volume>75</volume>
<fpage>86</fpage>
<lpage>91</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Fierer1">
<label>41</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fierer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>RB</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The diversity and biogeography of soil bacterial communities.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>103</volume>
<fpage>626</fpage>
<lpage>631</lpage>
<pub-id pub-id-type="pmid">16407148</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Countway1">
<label>42</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Countway</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Gast</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Savai</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Caron</surname>
<given-names>DA</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Protistan diversity estimates based on 18S rDNA from seawater incubations in the Western North Atlantic.</article-title>
<source>J Eukaryot Microbiol</source>
<volume>52</volume>
<fpage>95</fpage>
<lpage>106</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Raes1">
<label>43</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<name>
<surname>Lercher</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Prediction of effective genome size in metagenomic samples.</article-title>
<source>Genome Biol</source>
<volume>8</volume>
<fpage>R10</fpage>
<pub-id pub-id-type="pmid">17224063</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Richter1">
<label>44</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Richter</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Ott</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Schmid</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>MetaSim: a sequencing simulator for genomics and metagenomics.</article-title>
<source>PLoS ONE</source>
<volume>3</volume>
<fpage>e3373</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0003373">10.1371/journal.pone.0003373</ext-link>
</comment>
<pub-id pub-id-type="pmid">18841204</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Batzoglou1">
<label>45</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Stanley</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gnerre</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>ARACHNE: a whole-genome shotgun assembler.</article-title>
<source>Genome Res</source>
<volume>12</volume>
<fpage>177</fpage>
<lpage>189</lpage>
<pub-id pub-id-type="pmid">11779843</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Aparicio1">
<label>46</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aparicio</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stupka</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Putnam</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chia</surname>
<given-names>JM</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes.</article-title>
<source>Science</source>
<volume>297</volume>
<fpage>1301</fpage>
<lpage>1310</lpage>
<pub-id pub-id-type="pmid">12142439</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Myers1">
<label>47</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Dew</surname>
<given-names>IM</given-names>
</name>
<name>
<surname>Fasulo</surname>
<given-names>DP</given-names>
</name>
<etal></etal>
</person-group>
<year>2000</year>
<article-title>A whole-genome assembly of Drosophila.</article-title>
<source>Science</source>
<volume>287</volume>
<fpage>2196</fpage>
<lpage>2204</lpage>
<pub-id pub-id-type="pmid">10731133</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mavromatis1">
<label>48</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mavromatis</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Goltsman</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.</article-title>
<source>Nat Methods</source>
<volume>4</volume>
<fpage>495</fpage>
<lpage>500</lpage>
<pub-id pub-id-type="pmid">17468765</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Pevzner1">
<label>49</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>An Eulerian path approach to DNA fragment assembly.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>98</volume>
<fpage>9748</fpage>
<lpage>9753</lpage>
<pub-id pub-id-type="pmid">11504945</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Chaisson1">
<label>50</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaisson</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Short read fragment assembly of bacterial genomes.</article-title>
<source>Genome Res</source>
<volume>18</volume>
<fpage>324</fpage>
<lpage>330</lpage>
<pub-id pub-id-type="pmid">18083777</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Myers2">
<label>51</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The fragment assembly string graph.</article-title>
<source>Bioinformatics</source>
<volume>21</volume>
<issue>Suppl 2</issue>
<fpage>ii79</fpage>
<lpage>ii85</lpage>
<pub-id pub-id-type="pmid">16204131</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Chaisson2">
<label>52</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaisson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pevzner</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Fragment assembly with short reads.</article-title>
<source>Bioinformatics</source>
<volume>20</volume>
<fpage>2067</fpage>
<lpage>2074</lpage>
<pub-id pub-id-type="pmid">15059830</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Zerbino1">
<label>53</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zerbino</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Velvet: algorithms for de novo short read assembly using de Bruijn graphs.</article-title>
<source>Genome Res</source>
<volume>18</volume>
<fpage>821</fpage>
<lpage>829</lpage>
<pub-id pub-id-type="pmid">18349386</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sundquist1">
<label>54</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sundquist</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ronaghi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Pevzner</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Whole-genome sequencing and assembly with high-throughput, short-read technologies.</article-title>
<source>PLoS ONE</source>
<volume>2</volume>
<fpage>e484</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0000484">10.1371/journal.pone.0000484</ext-link>
</comment>
<pub-id pub-id-type="pmid">17534434</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Warren1">
<label>55</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Warren</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Holt</surname>
<given-names>RA</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Assembling millions of short DNA sequences using SSAKE.</article-title>
<source>Bioinformatics</source>
<volume>23</volume>
<fpage>500</fpage>
<lpage>501</lpage>
<pub-id pub-id-type="pmid">17158514</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Ye1">
<label>56</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ye</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>An ORFome assembly approach to metagenomics sequences analysis.</article-title>
<source>J Bioinform Comput Biol</source>
<volume>7</volume>
<fpage>455</fpage>
<lpage>471</lpage>
<pub-id pub-id-type="pmid">19507285</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Pop1">
<label>57</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Genome assembly reborn: recent computational challenges.</article-title>
<source>Brief Bioinform</source>
<volume>10</volume>
<issue>4</issue>
<fpage>354</fpage>
<lpage>366</lpage>
<pub-id pub-id-type="pmid">19482960</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Flicek1">
<label>58</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Flicek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Sense from sequence reads: methods for alignment and assembly.</article-title>
<source>Nat Methods</source>
<volume>6</volume>
<fpage>S6</fpage>
<lpage>S12</lpage>
<pub-id pub-id-type="pmid">19844229</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Yooseph1">
<label>59</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>SJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>The Sorcerer II global ocean sampling expedition: expanding the universe of protein families.</article-title>
<source>PLoS Biol</source>
<volume>5</volume>
<fpage>e16</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pbio.0050016">10.1371/journal.pbio.0050016</ext-link>
</comment>
<pub-id pub-id-type="pmid">17355171</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Altschul1">
<label>60</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<year>1990</year>
<article-title>Basic local alignment search tool.</article-title>
<source>J Mol Biol</source>
<volume>215</volume>
<fpage>403</fpage>
<lpage>410</lpage>
<pub-id pub-id-type="pmid">2231712</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Altschul2">
<label>61</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Schäffer</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<etal></etal>
</person-group>
<year>1997</year>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</article-title>
<source>Nucleic Acids Res</source>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="pmid">9254694</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Azad1">
<label>62</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Azad</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Borodovsky</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory.</article-title>
<source>Brief Bioinform</source>
<volume>5</volume>
<fpage>118</fpage>
<lpage>130</lpage>
<pub-id pub-id-type="pmid">15260893</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Yooseph2">
<label>63</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering.</article-title>
<source>BMC Bioinformatics</source>
<volume>9</volume>
<fpage>182</fpage>
<pub-id pub-id-type="pmid">18402669</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Hoff1">
<label>64</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoff</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Tech</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lingner</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Daniel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Morgenstern</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Gene prediction in metagenomic fragments: a large scale machine learning approach.</article-title>
<source>BMC Bioinformatics</source>
<volume>9</volume>
<fpage>217</fpage>
<pub-id pub-id-type="pmid">18442389</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Schouls1">
<label>65</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schouls</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Schot</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Jacobs</surname>
<given-names>JA</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Horizontal transfer of segments of the 16S rRNA genes between species of the Streptococcus anginosus group.</article-title>
<source>J Bacteriol</source>
<volume>185</volume>
<fpage>7241</fpage>
<lpage>7246</lpage>
<pub-id pub-id-type="pmid">14645285</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-DeSantis1">
<label>66</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeSantis</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Larsen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rojas</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>EL</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.</article-title>
<source>Appl Environ Microbiol</source>
<volume>72</volume>
<fpage>5069</fpage>
<lpage>5072</lpage>
<pub-id pub-id-type="pmid">16820507</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Case1">
<label>67</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Case</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Boucher</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Dahllof</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Holmstrom</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Doolittle</surname>
<given-names>WF</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies.</article-title>
<source>Appl Environ Microbiol</source>
<volume>73</volume>
<fpage>278</fpage>
<lpage>288</lpage>
<pub-id pub-id-type="pmid">17071787</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Klappenbach1">
<label>68</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klappenbach</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Saxman</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>Cole</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>TM</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>rrndb: the Ribosomal RNA Operon Copy Number Database.</article-title>
<source>Nucleic Acids Res</source>
<volume>29</volume>
<fpage>181</fpage>
<lpage>184</lpage>
<pub-id pub-id-type="pmid">11125085</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Walsh1">
<label>69</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walsh</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Bapteste</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kamekura</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Doolittle</surname>
<given-names>WF</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Evolution of the RNA polymerase B' subunit gene (rpoB') in Halobacteriales: a complementary molecular marker to the SSU rRNA gene.</article-title>
<source>Mol Biol Evol</source>
<volume>21</volume>
<fpage>2340</fpage>
<lpage>2351</lpage>
<pub-id pub-id-type="pmid">15356285</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Achenbach1">
<label>70</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Achenbach</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Carey</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Madigan</surname>
<given-names>MT</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Photosynthetic and phylogenetic primers for detection of anoxygenic phototrophs in natural environments.</article-title>
<source>Appl Environ Microbiol</source>
<volume>67</volume>
<fpage>2922</fpage>
<lpage>2926</lpage>
<pub-id pub-id-type="pmid">11425703</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-vonMering1">
<label>71</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tringe</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Doerks</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Quantitative phylogenetic assessment of microbial communities in diverse environments.</article-title>
<source>Science</source>
<volume>315</volume>
<fpage>1126</fpage>
<lpage>1130</lpage>
<pub-id pub-id-type="pmid">17272687</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Enright1">
<label>72</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Enright</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Spratt</surname>
<given-names>BG</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Multilocus sequence typing.</article-title>
<source>Trends Microbiol</source>
<volume>7</volume>
<fpage>482</fpage>
<lpage>487</lpage>
<pub-id pub-id-type="pmid">10603483</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Maiden1">
<label>73</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maiden</surname>
<given-names>MCJ</given-names>
</name>
<name>
<surname>Bygraves</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Feil</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Morelli</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>JE</given-names>
</name>
<etal></etal>
</person-group>
<year>1998</year>
<article-title>Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>95</volume>
<fpage>3140</fpage>
<lpage>3145</lpage>
<pub-id pub-id-type="pmid">9501229</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mahenthiralingam1">
<label>74</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahenthiralingam</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Baldwin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Drevinek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Vanlaere</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Vandamme</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Multilocus sequence typing breathes life into a microbial metagenome.</article-title>
<source>PLoS ONE</source>
<volume>1</volume>
<fpage>e17</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0000017">10.1371/journal.pone.0000017</ext-link>
</comment>
<pub-id pub-id-type="pmid">17183643</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Zhu1">
<label>75</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Massana</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Not</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Marie</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Vaulot</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Mapping of picoeucaryotes in marine ecosystems with quantitative PCR of the 18S rRNA gene.</article-title>
<source>FEMS Microbiol Ecol</source>
<volume>52</volume>
<fpage>79</fpage>
<lpage>92</lpage>
<pub-id pub-id-type="pmid">16329895</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Loram1">
<label>76</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loram</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Boonham</surname>
<given-names>N</given-names>
</name>
<name>
<surname>O'Toole</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Trapido-Rosenthal</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Douglas</surname>
<given-names>AE</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Molecular quantification of symbiotic dinoflagellate algae of the genus Symbiodinium.</article-title>
<source>Biol Bull</source>
<volume>212</volume>
<fpage>259</fpage>
<lpage>268</lpage>
<pub-id pub-id-type="pmid">17565115</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Colwell1">
<label>77</label>
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Colwell</surname>
<given-names>RK</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>EstimateS - statistical estimation of species richness and shared species from samples</article-title>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Schloss1">
<label>78</label>
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Schloss</surname>
<given-names>PD</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Mothur - the one-stop source for your computational microbial ecology needs</article-title>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Knight1">
<label>79</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Maxwell</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Birmingham</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Carnes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Caporaso</surname>
<given-names>JG</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>PyCogent: a toolkit for making sense from sequence.</article-title>
<source>Genome Biol</source>
<volume>8</volume>
<fpage>R171</fpage>
<pub-id pub-id-type="pmid">17708774</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Angly1">
<label>80</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Angly</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Rodriguez-Brito</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Bangor</surname>
<given-names>D</given-names>
</name>
<name>
<surname>McNairnie</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information.</article-title>
<source>BMC Bioinformatics</source>
<volume>6</volume>
<fpage>41</fpage>
<pub-id pub-id-type="pmid">15743531</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-1">
<label>81</label>
<mixed-citation publication-type="journal">
<year>1988</year>
<article-title>International Committee on Systematic Bacteriology. Announcement of the report of the ad hoc Committee on Reconciliation of Approaches to Bacterial Systematics.</article-title>
<source>J Appl Bacteriol</source>
<volume>64</volume>
<fpage>283</fpage>
<lpage>284</lpage>
<pub-id pub-id-type="pmid">3170381</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Schbath1">
<label>82</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schbath</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Prum</surname>
<given-names>B</given-names>
</name>
<name>
<surname>de Turckheim</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.</article-title>
<source>J Comput Biol</source>
<volume>2</volume>
<fpage>417</fpage>
<lpage>437</lpage>
<pub-id pub-id-type="pmid">8521272</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Teeling1">
<label>83</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Teeling</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Waldmann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lombardot</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Glöckner</surname>
<given-names>FO</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.</article-title>
<source>BMC Bioinformatics</source>
<volume>5</volume>
<fpage>163</fpage>
<pub-id pub-id-type="pmid">15507136</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mchardy1">
<label>84</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Martín</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Tsirigos</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rigoutsos</surname>
<given-names>I</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Accurate phylogenetic classification of variable-length DNA fragments.</article-title>
<source>Nat Methods</source>
<volume>4</volume>
<fpage>63</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="pmid">17179938</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Chan1">
<label>85</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>CKK</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Halgamuge</surname>
<given-names>SK</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Using growing self-organising maps to improve the binning process in environmental whole-genome shotgun sequencing.</article-title>
<source>J Biomed Biotechnol</source>
<fpage>513701</fpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Chan2">
<label>86</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>CKK</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Halgamuge</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>SL</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Binning sequences using very sparse labels within a metagenome.</article-title>
<source>BMC Bioinformatics</source>
<volume>9</volume>
<fpage>215</fpage>
<pub-id pub-id-type="pmid">18442374</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Tzahor1">
<label>87</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tzahor</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Man-Aharonovich</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kirkup</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Yogev</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Berman-Frank</surname>
<given-names>I</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment.</article-title>
<source>BMC Genomics</source>
<volume>10</volume>
<fpage>229</fpage>
<pub-id pub-id-type="pmid">19445709</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Huson1">
<label>88</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>MEGAN analysis of metagenomic data.</article-title>
<source>Genome Res</source>
<volume>17</volume>
<fpage>377</fpage>
<lpage>386</lpage>
<pub-id pub-id-type="pmid">17255551</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Krause1">
<label>89</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krause</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Diaz</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Goesmann</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kelley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nattkemper</surname>
<given-names>TW</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Phylogenetic classification of short environmental DNA fragments.</article-title>
<source>Nucleic Acids Res</source>
<volume>36</volume>
<fpage>2230</fpage>
<lpage>2239</lpage>
<pub-id pub-id-type="pmid">18285365</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Finn1">
<label>90</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Finn</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Tate</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mistry</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Coggill</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Sammut</surname>
<given-names>SJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The Pfam protein families database.</article-title>
<source>Nucleic Acids Res</source>
<volume>36</volume>
<fpage>D281</fpage>
<lpage>D288</lpage>
<pub-id pub-id-type="pmid">18039703</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Brady1">
<label>91</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brady</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models.</article-title>
<source>Nat Methods</source>
<volume>6</volume>
<fpage>673</fpage>
<lpage>676</lpage>
<pub-id pub-id-type="pmid">19648916</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Friedberg1">
<label>92</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedberg</surname>
<given-names>I</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Automated protein function prediction - the genomic challenge.</article-title>
<source>Brief Bioinform</source>
<volume>7</volume>
<fpage>225</fpage>
<lpage>242</lpage>
<pub-id pub-id-type="pmid">16772267</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kunik1">
<label>93</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kunik</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Meroz</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Solan</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Sandbank</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Weingart</surname>
<given-names>U</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Functional representation of enzymes by specific peptides.</article-title>
<source>PLoS Comput Biol</source>
<volume>3</volume>
<fpage>e167</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.0030167">10.1371/journal.pcbi.0030167</ext-link>
</comment>
<pub-id pub-id-type="pmid">17722976</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sharon1">
<label>94</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Tzahor</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shmoish</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Man-Aharonovich</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Viral photosynthetic reaction center genes and transcripts in the marine environment.</article-title>
<source>ISME J</source>
<volume>1</volume>
<fpage>492</fpage>
<lpage>501</lpage>
<pub-id pub-id-type="pmid">18043651</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Meroz1">
<label>95</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meroz</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Horn</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Biological roles of specific peptides in enzymes.</article-title>
<source>Proteins</source>
<volume>72</volume>
<fpage>606</fpage>
<lpage>612</lpage>
<pub-id pub-id-type="pmid">18247412</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Meyer1">
<label>96</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Paarmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>D'Souza</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Glass</surname>
<given-names>EM</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.</article-title>
<source>BMC Bioinformatics</source>
<volume>9</volume>
<fpage>386</fpage>
<pub-id pub-id-type="pmid">18803844</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Li1">
<label>97</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Analysis and comparison of very large metagenomes with fast clustering and functional annotation.</article-title>
<source>BMC Bioinformatics</source>
<volume>10</volume>
<fpage>359</fpage>
<pub-id pub-id-type="pmid">19863816</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Li2">
<label>98</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences.</article-title>
<source>Bioinformatics</source>
<volume>22</volume>
<fpage>1658</fpage>
<lpage>1659</lpage>
<pub-id pub-id-type="pmid">16731699</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Haft1">
<label>99</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haft</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Selengut</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>White</surname>
<given-names>O</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The TIGRFAMs database of protein families.</article-title>
<source>Nucleic Acids Res</source>
<volume>31</volume>
<fpage>371</fpage>
<lpage>373</lpage>
<pub-id pub-id-type="pmid">12520025</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Turnbaugh1">
<label>100</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Mahowald</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Magrini</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>An obesity-associated gut microbiome with increased capacity for energy harvest.</article-title>
<source>Nature</source>
<volume>444</volume>
<fpage>1027</fpage>
<lpage>1031</lpage>
<pub-id pub-id-type="pmid">17183312</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Brulc1">
<label>101</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brulc</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Antonopoulos</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Berg Miller</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>MK</given-names>
</name>
<name>
<surname>Yannarell</surname>
<given-names>AC</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>106</volume>
<fpage>1948</fpage>
<lpage>1953</lpage>
<pub-id pub-id-type="pmid">19181843</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Willner1">
<label>102</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Willner</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Furlan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schmieder</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Angly</surname>
<given-names>FE</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals.</article-title>
<source>PLoS ONE</source>
<volume>4</volume>
<fpage>e7370</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0007370">10.1371/journal.pone.0007370</ext-link>
</comment>
<pub-id pub-id-type="pmid">19816605</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mitra2">
<label>103</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Klar</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Visual and statistical comparison of metagenomes.</article-title>
<source>Bioinformatics</source>
<fpage>btp341</fpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Huson2">
<label>104</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Auch</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Methods for comparative metagenomics.</article-title>
<source>BMC Bioinformatics</source>
<volume>10</volume>
<issue>Suppl 1</issue>
<fpage>S12</fpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Markowitz1">
<label>105</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Markowitz</surname>
<given-names>VM</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Palaniappan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>IMG/M: a data management and analysis system for metagenomes.</article-title>
<source>Nucleic Acids Res</source>
<volume>36</volume>
<fpage>D534</fpage>
<lpage>D538</lpage>
<pub-id pub-id-type="pmid">17932063</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Lozupone1">
<label>106</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>UniFrac: a new phylogenetic method for comparing microbial communities.</article-title>
<source>Appl Environ Microbiol</source>
<volume>71</volume>
<fpage>8228</fpage>
<lpage>8235</lpage>
<pub-id pub-id-type="pmid">16332807</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-White1">
<label>107</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>White</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Nagarajan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Statistical methods for detecting differentially abundant features in clinical metagenomic samples.</article-title>
<source>PLoS Comput Biol</source>
<volume>5</volume>
<fpage>e1000352</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1000352">10.1371/journal.pcbi.1000352</ext-link>
</comment>
<pub-id pub-id-type="pmid">19360128</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Giardine1">
<label>108</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giardine</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Riemer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hardison</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Burhans</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Elnitski</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Galaxy: a platform for interactive large-scale genome analysis.</article-title>
<source>Genome Res</source>
<volume>15</volume>
<fpage>1451</fpage>
<lpage>1455</lpage>
<pub-id pub-id-type="pmid">16169926</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kristiansson1">
<label>109</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kristiansson</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dalevi</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes.</article-title>
<source>Bioinformatics</source>
<volume>25</volume>
<fpage>2737</fpage>
<lpage>2738</lpage>
<pub-id pub-id-type="pmid">19696045</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-R1">
<label>110</label>
<mixed-citation publication-type="book">
<collab>R Development Core Team</collab>
<year>2009</year>
<article-title>R: a language and environment for statistical computing.</article-title>
<publisher-loc>Vienna</publisher-loc>
<publisher-name>R Foundation for Statistical Computing</publisher-name>
<comment>Available:
<ext-link ext-link-type="uri" xlink:href="http://www.R-project.org">http://www.R-project.org</ext-link>
. ISBN 3-900051-07-0</comment>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Bohnebeck1">
<label>111</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bohnebeck</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Lombardot</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kottmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Glöckner</surname>
<given-names>FO</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>MetaMine - a tool to detect and analyse gene patterns in their environmental context.</article-title>
<source>BMC Bioinformatics</source>
<volume>9</volume>
<fpage>459</fpage>
<pub-id pub-id-type="pmid">18957118</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Lombardot1">
<label>112</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lombardot</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kottmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Giuliani</surname>
<given-names>G</given-names>
</name>
<name>
<surname>de Bono</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Addor</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>MetaLook: a 3D visualisation software for marine ecological genomics.</article-title>
<source>BMC Bioinformatics</source>
<volume>8</volume>
<fpage>406</fpage>
<pub-id pub-id-type="pmid">17953757</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Egerton1">
<label>113</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Egerton</surname>
<given-names>FN</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>A history of the ecological sciences, part 19: Leeuwenhoek's microscopic natural history.</article-title>
<source>Bull Ecol Soc Am</source>
<volume>87</volume>
<fpage>47</fpage>
<lpage>58</lpage>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Gianoulis1">
<label>114</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gianoulis</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>PV</given-names>
</name>
<name>
<surname>Bjornson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Quantifying environmental adaptation of metabolic pathways in metagenomics.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>106</volume>
<fpage>1374</fpage>
<lpage>1379</lpage>
<pub-id pub-id-type="pmid">19164758</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Wu1">
<label>115</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Daugherty</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Van Aken</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Pai</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Watkins</surname>
<given-names>KL</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters.</article-title>
<source>PLoS Biol</source>
<volume>4</volume>
<fpage>e188</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pbio.0040188">10.1371/journal.pbio.0040188</ext-link>
</comment>
<pub-id pub-id-type="pmid">16729848</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Woyke1">
<label>116</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Teeling</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Huntemann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Symbiosis insights through metagenomic analysis of a microbial consortium.</article-title>
<source>Nature</source>
<volume>443</volume>
<fpage>950</fpage>
<lpage>955</lpage>
<pub-id pub-id-type="pmid">16980956</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kannan1">
<label>117</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kannan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Zhai</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Manning</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Structural and functional diversity of the microbial kinome.</article-title>
<source>PLoS Biol</source>
<volume>5</volume>
<fpage>e17</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pbio.0050017">10.1371/journal.pbio.0050017</ext-link>
</comment>
<pub-id pub-id-type="pmid">17355172</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Brussow1">
<label>118</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brüssow</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hendrix</surname>
<given-names>RW</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Phage genomics: small is beautiful.</article-title>
<source>Cell</source>
<volume>108</volume>
<fpage>13</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="pmid">11792317</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Edwards1">
<label>119</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Viral metagenomics.</article-title>
<source>Nat Rev Microbiol</source>
<volume>3</volume>
<fpage>504</fpage>
<lpage>510</lpage>
<pub-id pub-id-type="pmid">15886693</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Mann1">
<label>120</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mann</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Cook</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Millard</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Clokie</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Marine ecosystems: bacterial photosynthesis genes in a virus.</article-title>
<source>Nature</source>
<volume>424</volume>
<fpage>741</fpage>
<pub-id pub-id-type="pmid">12917674</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Millard1">
<label>121</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Millard</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clokie</surname>
<given-names>MRJ</given-names>
</name>
<name>
<surname>Shub</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Mann</surname>
<given-names>NH</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Genetic organization of the psbAD region in phages infecting marine Synechococcus strains.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>101</volume>
<fpage>11007</fpage>
<lpage>11012</lpage>
<pub-id pub-id-type="pmid">15263091</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Sharon2">
<label>122</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Alperovitch</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Glaser</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Photosystem I gene cassettes are present in marine virus genomes.</article-title>
<source>Nature</source>
<volume>461</volume>
<fpage>258</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="pmid">19710652</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Delwart1">
<label>123</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Delwart</surname>
<given-names>EL</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Viral metagenomics.</article-title>
<source>Rev Med Virol</source>
<volume>17</volume>
<fpage>115</fpage>
<lpage>131</lpage>
<pub-id pub-id-type="pmid">17295196</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Nishizawa1">
<label>124</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nishizawa</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Okamoto</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Konishi</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Yoshizawa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Miyakawa</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<year>1997</year>
<article-title>A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology.</article-title>
<source>Biochem Biophys Res Commun</source>
<volume>241</volume>
<fpage>92</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="pmid">9405239</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Simons1">
<label>125</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Simons</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Leary</surname>
<given-names>TP</given-names>
</name>
<name>
<surname>Dawson</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Pilot-Matias</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Muerhoff</surname>
<given-names>AS</given-names>
</name>
<etal></etal>
</person-group>
<year>1995</year>
<article-title>Isolation of novel virus-like sequences associated with human hepatitis.</article-title>
<source>Nat Med</source>
<volume>1</volume>
<fpage>564</fpage>
<lpage>569</lpage>
<pub-id pub-id-type="pmid">7585124</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Yin1">
<label>126</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>On the origin of microbial ORFans: quantifying the strength of the evidence for viral lateral transfer.</article-title>
<source>BMC Evol Biol</source>
<volume>6</volume>
<fpage>63</fpage>
<pub-id pub-id-type="pmid">16914045</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Hambly1">
<label>127</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hambly</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Suttle</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The viriosphere, diversity, and genetic exchange within phage communities.</article-title>
<source>Curr Opin Microbiol</source>
<volume>8</volume>
<fpage>444</fpage>
<lpage>450</lpage>
<pub-id pub-id-type="pmid">15979387</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Boyer1">
<label>128</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boyer</surname>
<given-names>HW</given-names>
</name>
</person-group>
<year>1971</year>
<article-title>DNA restriction and modification mechanisms in bacteria.</article-title>
<source>Annu Rev Microbiol</source>
<volume>25</volume>
<fpage>153</fpage>
<lpage>176</lpage>
<pub-id pub-id-type="pmid">4949033</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Kass1">
<label>129</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kass</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>How does DNA methylation repress transcription?</article-title>
<source>Trends Genet</source>
<volume>13</volume>
<fpage>444</fpage>
<lpage>449</lpage>
<pub-id pub-id-type="pmid">9385841</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Tost1">
<label>130</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tost</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gut</surname>
<given-names>IG</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>DNA methylation analysis by pyrosequencing.</article-title>
<source>Nat Protoc</source>
<volume>2</volume>
<fpage>2265</fpage>
<lpage>2275</lpage>
<pub-id pub-id-type="pmid">17853883</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Batley1">
<label>131</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batley</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Genome sequence data: management, storage, and visualization.</article-title>
<source>BioTechniques</source>
<volume>46</volume>
<fpage>333</fpage>
<lpage>336</lpage>
<pub-id pub-id-type="pmid">19480628</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Richter2">
<label>132</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Richter</surname>
<given-names>BG</given-names>
</name>
<name>
<surname>Sexton</surname>
<given-names>DP</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Managing and analyzing next-generation sequence data.</article-title>
<source>PLoS Comput Biol</source>
<volume>5</volume>
<fpage>e1000369</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1000369">10.1371/journal.pcbi.1000369</ext-link>
</comment>
<pub-id pub-id-type="pmid">19557125</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Bailly1">
<label>133</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailly</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fraissinet-Tachet</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Verner</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Debaud</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lemaire</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Soil eukaryotic functional diversity, a metatranscriptomic approach.</article-title>
<source>ISME J</source>
<volume>1</volume>
<fpage>632</fpage>
<lpage>642</lpage>
<pub-id pub-id-type="pmid">18043670</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Wilmes1">
<label>134</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilmes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bond</surname>
<given-names>PL</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms.</article-title>
<source>Environ Microbiol</source>
<volume>6</volume>
<fpage>911</fpage>
<lpage>920</lpage>
<pub-id pub-id-type="pmid">15305916</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1000667-Wilmes2">
<label>135</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilmes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bond</surname>
<given-names>PL</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Metaproteomics: studying functional gene expression in microbial ecosystems.</article-title>
<source>Trends Microbiol</source>
<volume>14</volume>
<fpage>92</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="pmid">16406790</pub-id>
</mixed-citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="conflict">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure">
<p>The authors acknowledge funding from the Gordon and Betty Moore Foundation (
<ext-link ext-link-type="uri" xlink:href="http://www.moore.org">http://www.moore.org</ext-link>
), grant name Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA). IF acknowledges funding from Miami University start-up funds. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</fn>
</fn-group>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000714 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000714 | SxmlIndent | more
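As a hedged illustration, the Python sketch below assumes the record XML has already been extracted, for example with one of the `HfdSelect ... | SxmlIndent` pipelines given for this record. It reads the JATS `<ref>` entries with the standard-library `xml.etree.ElementTree` module. The sample string, the function name `format_ref`, and the citation style it prints are hypothetical choices. They are not part of Dilib or of the PMC record format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <ref> entry (trimmed to two named authors) in the
# same JATS layout as the record's <ref-list>. Assumed input, not Dilib output.
SAMPLE = """<ref-list>
<ref id="pcbi.1000667-Woyke1">
<label>116</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name><surname>Woyke</surname><given-names>T</given-names></name>
<name><surname>Teeling</surname><given-names>H</given-names></name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Symbiosis insights through metagenomic analysis of a microbial consortium.</article-title>
<source>Nature</source>
<volume>443</volume>
<fpage>950</fpage>
<lpage>955</lpage>
<pub-id pub-id-type="pmid">16980956</pub-id>
</mixed-citation>
</ref>
</ref-list>"""


def format_ref(ref: ET.Element) -> str:
    """Render one JATS <ref> as a one-line citation (hypothetical style)."""
    cit = ref.find("mixed-citation")
    # Collect "Surname Initials" for every <name> under the citation.
    authors = [
        f"{n.findtext('surname', '')} {n.findtext('given-names', '')}".strip()
        for n in cit.iter("name")
    ]
    # An empty <etal/> element marks a truncated author list.
    if cit.find("person-group/etal") is not None:
        authors.append("et al.")
    # Page range: first page, plus "-lpage" only when <lpage> is present.
    pages = cit.findtext("fpage", "")
    if cit.findtext("lpage"):
        pages += "-" + cit.findtext("lpage")
    pmid = cit.findtext("pub-id[@pub-id-type='pmid']", "")
    return (
        f"{ref.findtext('label')}. {', '.join(authors)} ({cit.findtext('year')}) "
        f"{cit.findtext('article-title')} {cit.findtext('source')} "
        f"{cit.findtext('volume')}: {pages}. PMID {pmid}"
    )


root = ET.fromstring(SAMPLE)
for ref in root.iter("ref"):
    print(format_ref(ref))
```

Only elements that appear in this record's reference entries are used. Optional fields such as `<lpage>` are checked before use, because some entries, such as those with electronic page identifiers like `e188`, omit them.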

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2829047
   |texte=   A Primer on Metagenomics
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:20195499" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024