Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data

Internal identifier: 000679 (Pmc/Corpus); previous: 000678; next: 000680

Authors: Micah Hamady; Catherine Lozupone; Rob Knight

RBID : PMC:2797552

Abstract

Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis (PCoA) results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We demonstrate the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, revealing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way towards a broad range of applications and demonstrate some of the new features of Fast UniFrac.


DOI: 10.1038/ismej.2009.97
PubMed: 19710709
PubMed Central: 2797552

Links to Exploration step

PMC:2797552

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data</title>
<author>
<name sortKey="Hamady, Micah" sort="Hamady, Micah" uniqKey="Hamady M" first="Micah" last="Hamady">Micah Hamady</name>
<affiliation>
<nlm:aff id="A1">Department of Computer Science, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lozupone, Catherine" sort="Lozupone, Catherine" uniqKey="Lozupone C" first="Catherine" last="Lozupone">Catherine Lozupone</name>
<affiliation>
<nlm:aff id="A2">Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="A3">Center for Genome Sciences, Washington University School of Medicine, St. Louis, MO 63108, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Knight, Rob" sort="Knight, Rob" uniqKey="Knight R" first="Rob" last="Knight">Rob Knight</name>
<affiliation>
<nlm:aff id="A2">Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="A4">Howard Hughes Medical Institute</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19710709</idno>
<idno type="pmc">2797552</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2797552</idno>
<idno type="RBID">PMC:2797552</idno>
<idno type="doi">10.1038/ismej.2009.97</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000679</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data</title>
<author>
<name sortKey="Hamady, Micah" sort="Hamady, Micah" uniqKey="Hamady M" first="Micah" last="Hamady">Micah Hamady</name>
<affiliation>
<nlm:aff id="A1">Department of Computer Science, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lozupone, Catherine" sort="Lozupone, Catherine" uniqKey="Lozupone C" first="Catherine" last="Lozupone">Catherine Lozupone</name>
<affiliation>
<nlm:aff id="A2">Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="A3">Center for Genome Sciences, Washington University School of Medicine, St. Louis, MO 63108, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Knight, Rob" sort="Knight, Rob" uniqKey="Knight R" first="Rob" last="Knight">Rob Knight</name>
<affiliation>
<nlm:aff id="A2">Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="A4">Howard Hughes Medical Institute</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">The ISME journal</title>
<idno type="ISSN">1751-7362</idno>
<idno type="eISSN">1751-7370</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="P1">Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis (PCoA) results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We demonstrate the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, revealing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way towards a broad range of applications and demonstrate some of the new features of Fast UniFrac.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Alexander, E" uniqKey="Alexander E">E Alexander</name>
</author>
<author>
<name sortKey="Stock, A" uniqKey="Stock A">A Stock</name>
</author>
<author>
<name sortKey="Breiner, Hw" uniqKey="Breiner H">HW Breiner</name>
</author>
<author>
<name sortKey="Behnke, A" uniqKey="Behnke A">A Behnke</name>
</author>
<author>
<name sortKey="Bunge, J" uniqKey="Bunge J">J Bunge</name>
</author>
<author>
<name sortKey="Yakimov, Mm" uniqKey="Yakimov M">MM Yakimov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balakirev, Es" uniqKey="Balakirev E">ES Balakirev</name>
</author>
<author>
<name sortKey="Pavlyuchkov, Va" uniqKey="Pavlyuchkov V">VA Pavlyuchkov</name>
</author>
<author>
<name sortKey="Ayala, Fj" uniqKey="Ayala F">FJ Ayala</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bryant, Ja" uniqKey="Bryant J">JA Bryant</name>
</author>
<author>
<name sortKey="Lamanna, C" uniqKey="Lamanna C">C Lamanna</name>
</author>
<author>
<name sortKey="Morlon, H" uniqKey="Morlon H">H Morlon</name>
</author>
<author>
<name sortKey="Kerkhoff, Aj" uniqKey="Kerkhoff A">AJ Kerkhoff</name>
</author>
<author>
<name sortKey="Enquist, Bj" uniqKey="Enquist B">BJ Enquist</name>
</author>
<author>
<name sortKey="Green, Jl" uniqKey="Green J">JL Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desantis, Tz" uniqKey="Desantis T">TZ DeSantis</name>
</author>
<author>
<name sortKey="Brodie, El" uniqKey="Brodie E">EL Brodie</name>
</author>
<author>
<name sortKey="Moberg, Jp" uniqKey="Moberg J">JP Moberg</name>
</author>
<author>
<name sortKey="Zubieta, Ix" uniqKey="Zubieta I">IX Zubieta</name>
</author>
<author>
<name sortKey="Piceno, Ym" uniqKey="Piceno Y">YM Piceno</name>
</author>
<author>
<name sortKey="Andersen, Gl" uniqKey="Andersen G">GL Andersen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desantis, Tz" uniqKey="Desantis T">TZ DeSantis</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Larsen, N" uniqKey="Larsen N">N Larsen</name>
</author>
<author>
<name sortKey="Rojas, M" uniqKey="Rojas M">M Rojas</name>
</author>
<author>
<name sortKey="Brodie, El" uniqKey="Brodie E">EL Brodie</name>
</author>
<author>
<name sortKey="Keller, K" uniqKey="Keller K">K Keller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desnues, C" uniqKey="Desnues C">C Desnues</name>
</author>
<author>
<name sortKey="Rodriguez Brito, B" uniqKey="Rodriguez Brito B">B Rodriguez-Brito</name>
</author>
<author>
<name sortKey="Rayhawk, S" uniqKey="Rayhawk S">S Rayhawk</name>
</author>
<author>
<name sortKey="Kelley, S" uniqKey="Kelley S">S Kelley</name>
</author>
<author>
<name sortKey="Tran, T" uniqKey="Tran T">T Tran</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elifantz, H" uniqKey="Elifantz H">H Elifantz</name>
</author>
<author>
<name sortKey="Waidner, La" uniqKey="Waidner L">LA Waidner</name>
</author>
<author>
<name sortKey="Michelou, Vk" uniqKey="Michelou V">VK Michelou</name>
</author>
<author>
<name sortKey="Cottrell, Mt" uniqKey="Cottrell M">MT Cottrell</name>
</author>
<author>
<name sortKey="Kirchman, Dl" uniqKey="Kirchman D">DL Kirchman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fierer, N" uniqKey="Fierer N">N Fierer</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Lauber, Cl" uniqKey="Lauber C">CL Lauber</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frank, Dn" uniqKey="Frank D">DN Frank</name>
</author>
<author>
<name sortKey="St Amand, Al" uniqKey="St Amand A">AL St Amand</name>
</author>
<author>
<name sortKey="Feldman, Ra" uniqKey="Feldman R">RA Feldman</name>
</author>
<author>
<name sortKey="Boedeker, Ec" uniqKey="Boedeker E">EC Boedeker</name>
</author>
<author>
<name sortKey="Harpaz, N" uniqKey="Harpaz N">N Harpaz</name>
</author>
<author>
<name sortKey="Pace, Nr" uniqKey="Pace N">NR Pace</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fraune, S" uniqKey="Fraune S">S Fraune</name>
</author>
<author>
<name sortKey="Bosch, Tc" uniqKey="Bosch T">TC Bosch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Graham, Ch" uniqKey="Graham C">CH Graham</name>
</author>
<author>
<name sortKey="Fine, Pv" uniqKey="Fine P">PV Fine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grice, Ea" uniqKey="Grice E">EA Grice</name>
</author>
<author>
<name sortKey="Kong, Hh" uniqKey="Kong H">HH Kong</name>
</author>
<author>
<name sortKey="Renaud, G" uniqKey="Renaud G">G Renaud</name>
</author>
<author>
<name sortKey="Young, Ac" uniqKey="Young A">AC Young</name>
</author>
<author>
<name sortKey="Bouffard, Gg" uniqKey="Bouffard G">GG Bouffard</name>
</author>
<author>
<name sortKey="Blakesley, Rw" uniqKey="Blakesley R">RW Blakesley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Walker, Jj" uniqKey="Walker J">JJ Walker</name>
</author>
<author>
<name sortKey="Harris, Jk" uniqKey="Harris J">JK Harris</name>
</author>
<author>
<name sortKey="Gold, Nj" uniqKey="Gold N">NJ Gold</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harrison, Bk" uniqKey="Harrison B">BK Harrison</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Berelson, W" uniqKey="Berelson W">W Berelson</name>
</author>
<author>
<name sortKey="Orphan, Vj" uniqKey="Orphan V">VJ Orphan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hartman, Wh" uniqKey="Hartman W">WH Hartman</name>
</author>
<author>
<name sortKey="Richardson, Cj" uniqKey="Richardson C">CJ Richardson</name>
</author>
<author>
<name sortKey="Vilgalys, R" uniqKey="Vilgalys R">R Vilgalys</name>
</author>
<author>
<name sortKey="Bruland, Gl" uniqKey="Bruland G">GL Bruland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hiibel, Sr" uniqKey="Hiibel S">SR Hiibel</name>
</author>
<author>
<name sortKey="Pereyra, Lp" uniqKey="Pereyra L">LP Pereyra</name>
</author>
<author>
<name sortKey="Inman, Ly" uniqKey="Inman L">LY Inman</name>
</author>
<author>
<name sortKey="Tischer, A" uniqKey="Tischer A">A Tischer</name>
</author>
<author>
<name sortKey="Reisman, Dj" uniqKey="Reisman D">DJ Reisman</name>
</author>
<author>
<name sortKey="Reardon, Kf" uniqKey="Reardon K">KF Reardon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hsu, Sf" uniqKey="Hsu S">SF Hsu</name>
</author>
<author>
<name sortKey="Buckley, Dh" uniqKey="Buckley D">DH Buckley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huber, Ja" uniqKey="Huber J">JA Huber</name>
</author>
<author>
<name sortKey="Mark Welch, Db" uniqKey="Mark Welch D">DB Mark Welch</name>
</author>
<author>
<name sortKey="Morrison, Hg" uniqKey="Morrison H">HG Morrison</name>
</author>
<author>
<name sortKey="Huse, Sm" uniqKey="Huse S">SM Huse</name>
</author>
<author>
<name sortKey="Neal, Pr" uniqKey="Neal P">PR Neal</name>
</author>
<author>
<name sortKey="Butterfield, Da" uniqKey="Butterfield D">DA Butterfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kanagawa, T" uniqKey="Kanagawa T">T Kanagawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Maxwell, P" uniqKey="Maxwell P">P Maxwell</name>
</author>
<author>
<name sortKey="Birmingham, A" uniqKey="Birmingham A">A Birmingham</name>
</author>
<author>
<name sortKey="Carnes, J" uniqKey="Carnes J">J Carnes</name>
</author>
<author>
<name sortKey="Caporaso, Jg" uniqKey="Caporaso J">JG Caporaso</name>
</author>
<author>
<name sortKey="Easton, Bc" uniqKey="Easton B">BC Easton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lauber, Cl" uniqKey="Lauber C">CL Lauber</name>
</author>
<author>
<name sortKey="Sinsabaugh, Rl" uniqKey="Sinsabaugh R">RL Sinsabaugh</name>
</author>
<author>
<name sortKey="Zak, Dr" uniqKey="Zak D">DR Zak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Lozupone, C" uniqKey="Lozupone C">C Lozupone</name>
</author>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ramey, Rr" uniqKey="Ramey R">RR Ramey</name>
</author>
<author>
<name sortKey="Bircher, Js" uniqKey="Bircher J">JS Bircher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Lozupone, Ca" uniqKey="Lozupone C">CA Lozupone</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Klein, S" uniqKey="Klein S">S Klein</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
<author>
<name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author>
<name sortKey="Zhang, M" uniqKey="Zhang M">M Zhang</name>
</author>
<author>
<name sortKey="Rantalainen, M" uniqKey="Rantalainen M">M Rantalainen</name>
</author>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
<author>
<name sortKey="Zhou, H" uniqKey="Zhou H">H Zhou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, C" uniqKey="Lozupone C">C Lozupone</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, C" uniqKey="Lozupone C">C Lozupone</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, Ca" uniqKey="Lozupone C">CA Lozupone</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Cantarel, Bl" uniqKey="Cantarel B">BL Cantarel</name>
</author>
<author>
<name sortKey="Coutinho, Pm" uniqKey="Coutinho P">PM Coutinho</name>
</author>
<author>
<name sortKey="Henrissat, B" uniqKey="Henrissat B">B Henrissat</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, Ca" uniqKey="Lozupone C">CA Lozupone</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lozupone, Ca" uniqKey="Lozupone C">CA Lozupone</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ludwig, W" uniqKey="Ludwig W">W Ludwig</name>
</author>
<author>
<name sortKey="Strunk, O" uniqKey="Strunk O">O Strunk</name>
</author>
<author>
<name sortKey="Westram, R" uniqKey="Westram R">R Westram</name>
</author>
<author>
<name sortKey="Richter, L" uniqKey="Richter L">L Richter</name>
</author>
<author>
<name sortKey="Meier, H" uniqKey="Meier H">H Meier</name>
</author>
<author>
<name sortKey="Yadhukumar" uniqKey="Yadhukumar">Yadhukumar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marhaver, Kl" uniqKey="Marhaver K">KL Marhaver</name>
</author>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martin, Ap" uniqKey="Martin A">AP Martin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nasidze, I" uniqKey="Nasidze I">I Nasidze</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Quinque, D" uniqKey="Quinque D">D Quinque</name>
</author>
<author>
<name sortKey="Tang, K" uniqKey="Tang K">K Tang</name>
</author>
<author>
<name sortKey="Stoneking, M" uniqKey="Stoneking M">M Stoneking</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Osman, S" uniqKey="Osman S">S Osman</name>
</author>
<author>
<name sortKey="La Duc, Mt" uniqKey="La Duc M">MT La Duc</name>
</author>
<author>
<name sortKey="Dekas, A" uniqKey="Dekas A">A Dekas</name>
</author>
<author>
<name sortKey="Newcombe, D" uniqKey="Newcombe D">D Newcombe</name>
</author>
<author>
<name sortKey="Venkateswaran, K" uniqKey="Venkateswaran K">K Venkateswaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Porter, Tm" uniqKey="Porter T">TM Porter</name>
</author>
<author>
<name sortKey="Skillman, Je" uniqKey="Skillman J">JE Skillman</name>
</author>
<author>
<name sortKey="Moncalvo, Jm" uniqKey="Moncalvo J">JM Moncalvo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rawls, Jf" uniqKey="Rawls J">JF Rawls</name>
</author>
<author>
<name sortKey="Mahowald, Ma" uniqKey="Mahowald M">MA Mahowald</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roesch, Lf" uniqKey="Roesch L">LF Roesch</name>
</author>
<author>
<name sortKey="Fulthorpe, Rr" uniqKey="Fulthorpe R">RR Fulthorpe</name>
</author>
<author>
<name sortKey="Riva, A" uniqKey="Riva A">A Riva</name>
</author>
<author>
<name sortKey="Casella, G" uniqKey="Casella G">G Casella</name>
</author>
<author>
<name sortKey="Hadwin, Ak" uniqKey="Hadwin A">AK Hadwin</name>
</author>
<author>
<name sortKey="Kent, Ad" uniqKey="Kent A">AD Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sagaram, Us" uniqKey="Sagaram U">US Sagaram</name>
</author>
<author>
<name sortKey="Deangelis, Km" uniqKey="Deangelis K">KM DeAngelis</name>
</author>
<author>
<name sortKey="Trivedi, P" uniqKey="Trivedi P">P Trivedi</name>
</author>
<author>
<name sortKey="Andersen, Gl" uniqKey="Andersen G">GL Andersen</name>
</author>
<author>
<name sortKey="Lu, Se" uniqKey="Lu S">SE Lu</name>
</author>
<author>
<name sortKey="Wang, N" uniqKey="Wang N">N Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sogin, Ml" uniqKey="Sogin M">ML Sogin</name>
</author>
<author>
<name sortKey="Morrison, Hg" uniqKey="Morrison H">HG Morrison</name>
</author>
<author>
<name sortKey="Huber, Ja" uniqKey="Huber J">JA Huber</name>
</author>
<author>
<name sortKey="Mark Welch, D" uniqKey="Mark Welch D">D Mark Welch</name>
</author>
<author>
<name sortKey="Huse, Sm" uniqKey="Huse S">SM Huse</name>
</author>
<author>
<name sortKey="Neal, Pr" uniqKey="Neal P">PR Neal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Yatsunenko, T" uniqKey="Yatsunenko T">T Yatsunenko</name>
</author>
<author>
<name sortKey="Cantarel, Bl" uniqKey="Cantarel B">BL Cantarel</name>
</author>
<author>
<name sortKey="Duncan, A" uniqKey="Duncan A">A Duncan</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Mahowald, Ma" uniqKey="Mahowald M">MA Mahowald</name>
</author>
<author>
<name sortKey="Magrini, V" uniqKey="Magrini V">V Magrini</name>
</author>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wen, L" uniqKey="Wen L">L Wen</name>
</author>
<author>
<name sortKey="Ley, Re" uniqKey="Ley R">RE Ley</name>
</author>
<author>
<name sortKey="Volchkov, Py" uniqKey="Volchkov P">PY Volchkov</name>
</author>
<author>
<name sortKey="Stranges, Pb" uniqKey="Stranges P">PB Stranges</name>
</author>
<author>
<name sortKey="Avanesyan, L" uniqKey="Avanesyan L">L Avanesyan</name>
</author>
<author>
<name sortKey="Stonebraker, Ac" uniqKey="Stonebraker A">AC Stonebraker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Widmann, J" uniqKey="Widmann J">J Widmann</name>
</author>
<author>
<name sortKey="Hamady, M" uniqKey="Hamady M">M Hamady</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, Kh" uniqKey="Wilson K">KH Wilson</name>
</author>
<author>
<name sortKey="Wilson, Wj" uniqKey="Wilson W">WJ Wilson</name>
</author>
<author>
<name sortKey="Radosevich, Jl" uniqKey="Radosevich J">JL Radosevich</name>
</author>
<author>
<name sortKey="Desantis, Tz" uniqKey="Desantis T">TZ DeSantis</name>
</author>
<author>
<name sortKey="Viswanathan, Vs" uniqKey="Viswanathan V">VS Viswanathan</name>
</author>
<author>
<name sortKey="Kuczmarski, Ta" uniqKey="Kuczmarski T">TA Kuczmarski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, M" uniqKey="Zhu M">M Zhu</name>
</author>
<author>
<name sortKey="Ghodsi, A" uniqKey="Ghodsi A">A Ghodsi</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<pmc-dir>properties manuscript</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-journal-id">101301086</journal-id>
<journal-id journal-id-type="pubmed-jr-id">33338</journal-id>
<journal-id journal-id-type="nlm-ta">ISME J</journal-id>
<journal-id journal-id-type="iso-abbrev">ISME J</journal-id>
<journal-title-group>
<journal-title>The ISME journal</journal-title>
</journal-title-group>
<issn pub-type="ppub">1751-7362</issn>
<issn pub-type="epub">1751-7370</issn>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19710709</article-id>
<article-id pub-id-type="pmc">2797552</article-id>
<article-id pub-id-type="doi">10.1038/ismej.2009.97</article-id>
<article-id pub-id-type="manuscript">NIHMS135997</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hamady</surname>
<given-names>Micah</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
<xref rid="FN2" ref-type="author-notes">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lozupone</surname>
<given-names>Catherine</given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
<xref ref-type="aff" rid="A3">3</xref>
<xref rid="FN2" ref-type="author-notes">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Knight</surname>
<given-names>Rob</given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
<xref ref-type="aff" rid="A4">4</xref>
</contrib>
</contrib-group>
<aff id="A1">
<label>1</label>
Department of Computer Science, University of Colorado, Boulder, CO 80309, USA</aff>
<aff id="A2">
<label>2</label>
Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA</aff>
<aff id="A3">
<label>3</label>
Center for Genome Sciences, Washington University School of Medicine, St. Louis, MO 63108, USA</aff>
<aff id="A4">
<label>4</label>
Howard Hughes Medical Institute</aff>
<author-notes>
<corresp id="FN1">Correspondence to
<email>rob.knight@colorado.edu</email>
</corresp>
<fn id="FN2" fn-type="equal">
<label>*</label>
<p>The first two authors contributed equally.</p>
</fn>
</author-notes>
<pub-date pub-type="nihms-submitted">
<day>18</day>
<month>8</month>
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>27</day>
<month>8</month>
<year>2009</year>
</pub-date>
<pub-date pub-type="ppub">
<month>1</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>01</day>
<month>7</month>
<year>2010</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<fpage>17</fpage>
<lpage>27</lpage>
<pmc-comment>elocation-id from pubmed: 10.1038/ismej.2009.97</pmc-comment>
<permissions>
<license>
<license-p>Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:
<ext-link ext-link-type="uri" xlink:href="http://www.nature.com/authors/editorial_policies/license.html#terms">http://www.nature.com/authors/editorial_policies/license.html#terms</ext-link>
</license-p>
</license>
</permissions>
<abstract>
<p id="P1">Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis (PCoA) results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We demonstrate the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, revealing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way towards a broad range of applications and demonstrate some of the new features of Fast UniFrac.</p>
</abstract>
<kwd-group>
<kwd>beta diversity</kwd>
<kwd>community ecology</kwd>
<kwd>multiplex pyrosequencing of 16S rDNA</kwd>
<kwd>PhyloChips</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="S1">
<title>Introduction</title>
<p id="P2">Understanding beta diversity is critical for studies of microbial ecology because of the enormous variation among microbial communities even when those communities are sampled from similar environment types (
<xref rid="R30" ref-type="bibr">Lozupone and Knight, 2007</xref>
). In contrast to alpha diversity, which measures how many kinds of organism are in a single community, beta diversity measures how community membership varies over time or space, and is especially important for finding trends in large numbers of samples (a problem that significance tests for differences between each pair of communities cannot address). For example, Human Microbiome Projects (
<xref rid="R43" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2007</xref>
) and related efforts to study microbial communities occupying various human body habitats are revealing a surprising amount of diversity among individuals in skin (
<xref rid="R9" ref-type="bibr">Fierer
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R13" ref-type="bibr">Grice
<italic>et al.</italic>
, 2008</xref>
), gut (
<xref rid="R42" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2009</xref>
), and mouth ecosystems (
<xref rid="R35" ref-type="bibr">Nasidze
<italic>et al.</italic>
, 2009</xref>
). Because all current methods of surveying microbial communities using culture-independent methods introduce inherent biases in DNA extraction and/or amplification of small subunit rRNA genes, patterns that relate different communities may be more meaningful than estimates of diversity or of taxon abundance within a single community (
<xref rid="R20" ref-type="bibr">Kanagawa, 2003</xref>
). Measures of beta diversity can be either taxon-based (using overlap in lists of species, genera, OTUs, etc.) or phylogenetic (using overlap on a phylogenetic tree). Phylogenetic beta diversity measures, such as UniFrac (
<xref rid="R27" ref-type="bibr">Lozupone
<italic>et al.</italic>
, 2006</xref>
;
<xref rid="R28" ref-type="bibr">Lozupone and Knight, 2005</xref>
), are especially important because, unlike taxon-based measures, they exploit the similarities and differences among species (
<xref rid="R12" ref-type="bibr">Graham and Fine, 2008</xref>
;
<xref rid="R31" ref-type="bibr">Lozupone and Knight, 2008</xref>
). This additional information makes phylogenetic beta diversity measures more effective at revealing ecological patterns than taxon-based methods (
<xref rid="R31" ref-type="bibr">Lozupone and Knight, 2008</xref>
).</p>
<p id="P3">Considerable insight has been gained from applying beta diversity methods to microbes in different environments. For example, to date >70 papers have used UniFrac to compare microbial assemblages. These include bacterial (
<xref rid="R16" ref-type="bibr">Hartman
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R18" ref-type="bibr">Hsu and Buckley, 2009</xref>
;
<xref rid="R38" ref-type="bibr">Rawls
<italic>et al.</italic>
, 2006</xref>
), archaeal (
<xref rid="R15" ref-type="bibr">Harrison
<italic>et al.</italic>
, 2009</xref>
), eukaryotic (
<xref rid="R1" ref-type="bibr">Alexander
<italic>et al.</italic>
, 2009</xref>
;
<xref rid="R37" ref-type="bibr">Porter
<italic>et al.</italic>
, 2008</xref>
), and viral (
<xref rid="R7" ref-type="bibr">Desnues
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R33" ref-type="bibr">Marhaver
<italic>et al.</italic>
, 2008</xref>
) assemblages important for understanding human health and disease (
<xref rid="R10" ref-type="bibr">Frank
<italic>et al.</italic>
, 2007</xref>
;
<xref rid="R26" ref-type="bibr">Li
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R36" ref-type="bibr">Osman
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R44" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2006</xref>
;
<xref rid="R45" ref-type="bibr">Wen
<italic>et al.</italic>
, 2008</xref>
), bioremediation (
<xref rid="R17" ref-type="bibr">Hiibel
<italic>et al.</italic>
, 2008</xref>
), and basic ecology and evolution (
<xref rid="R3" ref-type="bibr">Balakirev
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R4" ref-type="bibr">Bryant
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R11" ref-type="bibr">Fraune and Bosch, 2007</xref>
). Applications of UniFrac have focused both on 16S rRNA sequence sets, from Sanger sequencing and pyrosequencing (
<xref rid="R9" ref-type="bibr">Fierer
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R14" ref-type="bibr">Hamady
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R42" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2009</xref>
) and on sequences from genes with other functions (
<xref rid="R8" ref-type="bibr">Elifantz
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R18" ref-type="bibr">Hsu and Buckley, 2009</xref>
;
<xref rid="R22" ref-type="bibr">Lauber
<italic>et al.</italic>
, 2009</xref>
;
<xref rid="R29" ref-type="bibr">Lozupone
<italic>et al.</italic>
, 2008</xref>
).</p>
<p id="P4">Despite the clear advantages of phylogenetic beta diversity approaches, the challenges inherent in building and analyzing trees with thousands to millions of sequences have thus far limited the broad application of these techniques. For example, many recent pyrosequencing studies in a range of environments have used taxon-based methods to compare samples (
<xref rid="R19" ref-type="bibr">Huber
<italic>et al.</italic>
, 2007</xref>
;
<xref rid="R39" ref-type="bibr">Roesch
<italic>et al.</italic>
, 2007</xref>
;
<xref rid="R41" ref-type="bibr">Sogin
<italic>et al.</italic>
, 2006</xref>
), primarily because of these challenges. Similarly, to our knowledge, no PhyloChip studies have yet exploited phylogenetic beta diversity techniques, despite the potential of the PhyloChip to collect data from dozens or hundreds of samples in a cost-effective manner. Here we make these techniques available to the broader community by presenting Fast UniFrac, a much faster version of UniFrac, which allows analysis of much larger datasets. In addition to a stand-alone version, the online version includes more advanced visualizations to facilitate rapid identification of patterns in large and complex datasets. These visualizations include 3D views of any combination of the first 10 principal coordinates, and parallel coordinates plots that plot the position of each sample along each of the first 10 principal coordinates, showing which coordinates discriminate among groups of samples. Parallelization of the resampling techniques, such as jackknifing, makes it more feasible to test whether particular clusters are robust to sampling effort. Together with the ability to accept pyrosequencing and PhyloChip datasets as input, Fast UniFrac should greatly expand our insight into a wide range of microbial processes.</p>
</sec>
<sec sec-type="materials|methods" id="S2">
<title>Materials and Methods</title>
<sec id="S3">
<title>Performance enhancements</title>
<p id="P5">With the goal of supporting very large datasets, including pyrosequencing and PhyloChip datasets produced in association with the Human Microbiome Project, we have redesigned UniFrac so that calculations on the phylogenetic tree are performed using an array-based implementation instead of a tree-based one. In the original implementation of UniFrac, environment data are stored with the objects representing the nodes of a tree. To calculate the UniFrac value for a specific comparison, it is necessary to traverse the tree, assign the states (environments) for the internal nodes based on presence/absence (unweighted) or the sum of the counts (weighted) in the child nodes, and traverse the tree once again to perform the calculations. In the new implementation, we store the environment states in an array, and use accelerated vector operations in the numpy package to propagate states down the tree and to multiply the states by the branch lengths (
<xref rid="F1" ref-type="fig">Fig. 1</xref>
). In addition, we cache the tree structure implicitly using a nested list of arrays for speed. There are several advantages to this new approach: (i) environment states can be propagated using the cache, which is much faster than using custom tree objects; (ii) by using logical and numerical operations, the whole array of environments, or specific pairs of environments, can be isolated as array slices, saving the expensive traversal step; (iii) the tree does not need to be pruned for branches absent from the chosen pair of environments because the branch lengths for those branches are multiplied by 0 (the branches being absent from the environments compared) and do not contribute to the overall result; and (iv) because the array of counts of each sequence in each environment is contiguous, jackknifing can be performed rapidly. This re-conceptualization also leads to potential future improvements, such as using MPI or other parallelization toolkits and/or GPUs to accelerate the comparisons further. The array-based implementation also uses far less memory and storage space than the tree-based implementation, allowing the same hardware to process much larger datasets. Finally, parallelization of the Monte Carlo operations such as the P test (
<xref rid="R34" ref-type="bibr">Martin, 2002</xref>
) and sequence jackknifing greatly improves the performance of significance tests, and allows larger numbers of replicates so that P-values for rarer events can be estimated. These speed enhancements produce the same final result, but have allowed us to increase the default limits from 5,000 unique sequences, 50 samples, and 100 permutations in the original UniFrac web application to 100,000 unique sequences, 200 samples, and 1000 permutations in the Fast UniFrac web interface.</p>
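<p>The array-based calculation can be illustrated with a minimal sketch. The code below is not the PyCogent implementation; it assumes a toy tree whose nodes are numbered so that tips come first and every child has a lower index than its parent, and the function names propagate_counts and unweighted_unifrac are hypothetical. It shows how counts might be propagated from tips to internal nodes in a single array, and how the unweighted value for any pair of environments can then be read off with logical operations on array columns.</p>
<preformat><![CDATA[
```python
import numpy as np


def propagate_counts(tip_counts, children, n_nodes):
    """Build a (nodes x environments) count array from per-tip counts.

    Assumption: tips occupy rows 0..n_tips-1 and each internal node has a
    higher index than all of its children, so one ordered pass suffices.
    """
    n_tips, n_envs = tip_counts.shape
    counts = np.zeros((n_nodes, n_envs), dtype=float)
    counts[:n_tips] = tip_counts
    for node in sorted(children):
        # Sum the rows of the child nodes to obtain the parent's counts.
        counts[node] = counts[children[node]].sum(axis=0)
    return counts


def unweighted_unifrac(counts, branch_lengths, i, j):
    """Fraction of branch length leading to only one of environments i and j."""
    present_i = counts[:, i] > 0
    present_j = counts[:, j] > 0
    unique = np.logical_xor(present_i, present_j)
    either = np.logical_or(present_i, present_j)
    total = branch_lengths[either].sum()
    return branch_lengths[unique].sum() / total if total > 0 else 0.0


# Toy tree ((A,B)E,(C,D)F)root with nodes A=0, B=1, C=2, D=3, E=4, F=5, root=6.
children = {4: [0, 1], 5: [2, 3], 6: [4, 5]}
branch_lengths = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0])
# Rows are tips A-D; columns are three hypothetical environments.
tip_counts = np.array([[1, 0, 1],
                       [1, 0, 0],
                       [0, 1, 1],
                       [0, 1, 0]])
counts = propagate_counts(tip_counts, children, n_nodes=7)
print(unweighted_unifrac(counts, branch_lengths, 0, 1))
print(unweighted_unifrac(counts, branch_lengths, 0, 2))
```
]]></preformat>
<p>In this toy example, tip D is absent from both environments 0 and 2, so its branch enters neither the numerator nor the denominator of that comparison. No pruning step is needed, which corresponds to advantage (iii) above.</p>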
</sec>
<sec id="S4">
<title>New features</title>
<sec id="S5">
<title>BLAST-based phylogeny generation</title>
<p id="P6">The application of UniFrac to large sequence sets, such as those generated with pyrosequencing, is also limited by the computational power needed to make a
<italic>de novo</italic>
phylogenetic tree using standard methods, such as neighbor joining, likelihood, or parsimony methods. We show below that the analysis of such large sequence sets is possible by assigning them to their closest relative in a phylogeny of the Greengenes core set (
<xref rid="R6" ref-type="bibr">DeSantis
<italic>et al.</italic>
, 2006</xref>
) using BLAST's megablast protocol (
<xref rid="R2" ref-type="bibr">Altschul
<italic>et al.</italic>
, 1990</xref>
). The Greengenes core set reference tree is offered as a drop-down menu option during data upload, and a detailed protocol and Python script are provided in the Fast UniFrac tutorial for generating a BLAST-based sample mapping file that corresponds to the Greengenes core set or any other reference tree.</p>
</sec>
<sec id="S6">
<title>Visualization enhancements</title>
<p id="P7">As the size and complexity of microbial datasets rapidly increase, so does the difficulty associated with interpreting the results and identifying ecologically meaningful patterns. New ways of exploring and visualizing results are thus essential. Fast UniFrac introduces several powerful tools for visualizing the results of principal coordinates analysis (PCoA), including 3D views using the Java KiNG viewer (
<ext-link ext-link-type="uri" xlink:href="http://kinemage.biochem.duke.edu/software/king.php">http://kinemage.biochem.duke.edu/software/king.php</ext-link>
). These tools include (i) the ability to color large collections of samples using different user-defined subcategories (e.g. coloring environmental samples according to temperature or pH), (ii) automatic scaled/unscaled views which accentuate dimensions that explain more variance, (iii) the ability to interactively explore hundreds of points (and user-configurable labels) in three dimensions, (iv) parallel coordinates displays that allow the dimensions that separate particular groups of environments to be readily identified, and (v) scree plots that help researchers more easily discern the number of important dimensions and thus assist in inferring biological significance in complex datasets (
<xref rid="R48" ref-type="bibr">Zhu and Ghodsi, 2006</xref>
).</p>
</sec>
<sec id="S7">
<title>PhyloChip support</title>
<p id="P8">Another new feature is support for PhyloChip data (
<xref rid="R5" ref-type="bibr">DeSantis
<italic>et al.</italic>
, 2007</xref>
;
<xref rid="R47" ref-type="bibr">Wilson
<italic>et al.</italic>
, 2002</xref>
) using the UniFrac export option of the PhyloTrac software (
<ext-link ext-link-type="uri" xlink:href="http://phylotrac.org">http://phylotrac.org</ext-link>
). In the PhyloChip interface, a reference tree allows the comparison of multiple PhyloChip runs: all that is required is a combined mapping file containing abundance information from all of the PhyloChip samples, together with an additional mapping file relating each sample to study meta-data.</p>
</sec>
<sec id="S8">
<title>Usability</title>
<p id="P9">Finally, we added important usability enhancements that allow multiple user-defined category mappings to be uploaded, along with sample descriptions that permit easier and more rapid exploration of the dataset broken down by a range of different parameters or categories. For example, one might want to color a set of mammalian gut samples by diet, by species, by taxonomic order, by continent of origin, etc. to determine which factors were most important in structuring the communities. The ‘
<italic>category mapping</italic>
’ file can also be automatically generated in the Fast UniFrac web interface. When this option is selected, an example category mapping file is generated with a single real subcategory called
<italic>Envs</italic>
containing values identical to the
<italic>sample IDs</italic>
provided in the
<italic>sample ID mapping file</italic>
. In addition to the real subcategory, several placeholder subcategories are created that act as a template for users when the file is downloaded and modified for future runs. Error checking and error correction for problems with the input trees and other input data have been substantially expanded, and numerous other performance-related optimizations substantially accelerate the overall workflow.</p>
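<p>The auto-generated template described above can be pictured with a short sketch. The exact layout of the category mapping file is not specified here, so the tab-delimited format, the column names other than Envs, the "NA" filler value, and the function name make_category_template are all assumptions made for illustration only.</p>
<preformat><![CDATA[
```python
def make_category_template(sample_ids, n_placeholders=2):
    """Return template text with one real subcategory and empty placeholders.

    Assumed layout (hypothetical): a tab-delimited header row, then one row
    per sample. The Envs column repeats the sample ID, as described in the
    text; placeholder columns are filled with "NA" for the user to edit.
    """
    header = ["SampleID", "Envs"]
    header += ["Placeholder%d" % (k + 1) for k in range(n_placeholders)]
    rows = ["\t".join(header)]
    for sample_id in sample_ids:
        rows.append("\t".join([sample_id, sample_id] + ["NA"] * n_placeholders))
    return "\n".join(rows) + "\n"


template = make_category_template(["gutA", "handB"])
print(template)
```
]]></preformat>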
</sec>
<sec id="S9">
<title>Sources of data</title>
<p id="P10">Data for testing and validation of Fast UniFrac came from four main sources: (1) a large meta-analysis of Sanger-sequencing data from a wide range of different host-associated and free-living environments (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
), (2) an analysis of how gut bacterial populations change in obese humans on fat and carbohydrate restricted diets (
<xref rid="R25" ref-type="bibr">Ley
<italic>et al.</italic>
, 2006</xref>
), (3) pyrosequencing studies of the human hand (
<xref rid="R9" ref-type="bibr">Fierer
<italic>et al.</italic>
, 2008</xref>
), and of fecal microbiota of lean and obese twin pairs and their mothers (
<xref rid="R42" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2009</xref>
), and (4) a PhyloChip study of citrus pathogens (
<xref rid="R40" ref-type="bibr">Sagaram
<italic>et al.</italic>
, 2009</xref>
). These studies were chosen as they represent some of the largest datasets for their respective types of analyses. A reference tree was assembled from the Greengenes core set (
<xref rid="R6" ref-type="bibr">DeSantis
<italic>et al.</italic>
, 2006</xref>
): both this tree and the PhyloChip G2 reference tree are available from the Fast UniFrac web site.</p>
</sec>
<sec id="S10">
<title>Phylogenetic methods</title>
<p id="P11">The application of UniFrac to large datasets, such as those generated by pyrosequencing, has been limited by the ability to make
<italic>de novo</italic>
trees using standard tree building methods. Although programs such as ARB's parsimony insertion algorithm (
<xref rid="R32" ref-type="bibr">Ludwig
<italic>et al.</italic>
, 2004</xref>
) have been used to analyze datasets with almost 100,000 sequences (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
), this technique is very time consuming, and cannot be automated or enhanced by parallelization on high performance clusters for the larger datasets that pyrosequencing produces. We demonstrate that using BLAST's (
<xref rid="R2" ref-type="bibr">Altschul
<italic>et al.</italic>
, 1990</xref>
) megablast method to find the nearest neighbor of each short read in an existing library (in this case the Greengenes core set) recaptures the same patterns detected using the parsimony insertion method of ARB, and that these methods can be applied to pyrosequencing data with hundreds of thousands of sequences. The method of BLASTing sequence reads to an existing phylogeny can be extended to work with any gene and any existing phylogeny.</p>
</sec>
<sec id="S11">
<title>Megablast Protocol</title>
<p id="P12">The Greengenes core set was downloaded from (
<ext-link ext-link-type="uri" xlink:href="http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/11-Aug_2007">http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/11-Aug_2007</ext-link>
) and made into a BLAST database using formatdb. The Global Environment dataset (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
) (99,801 sequences), the human obesity dataset (
<xref rid="R25" ref-type="bibr">Ley
<italic>et al.</italic>
, 2006</xref>
) (18,348 sequences), and all unique pyrosequences from studies of the human hand and of the fecal microbiota of lean and obese twins (
<xref rid="R9" ref-type="bibr">Fierer
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R42" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2009</xref>
) (232,165 unique sequences from 680,000 initial reads) were then searched against the Greengenes core set using megablast. The hit tables were parsed to make sample ID mapping files, in which each sequence was mapped to its closest hit in the core set. Query sequences that had no hit below an e-value threshold of either 1e-50 or 1e-30 were excluded from the analysis: with the 1e-50 criterion, 255 sequences were excluded from the Global Environment dataset and 4,789 unique sequences from the human pyrosequencing datasets; with the 1e-30 criterion, no sequences were excluded from the human obesity dataset. A script for performing this analysis is available in the tutorial at the Fast UniFrac web site.</p>
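<p>The hit-table parsing step can be sketched as follows. This is not the script from the Fast UniFrac tutorial. It assumes the standard 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, e-value in column 11), and it assumes a hypothetical query naming convention in which the sample name precedes the first "." in each query ID. The three-column output (reference sequence, sample, count) is likewise an assumed layout for the sample ID mapping file.</p>
<preformat><![CDATA[
```python
from collections import Counter


def best_hits(hit_lines, max_evalue=1e-50):
    """Map each query to its lowest-e-value subject, dropping weak hits."""
    best = {}
    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # no acceptable hit from this row
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: hit[0] for query, hit in best.items()}


def sample_mapping(hits, sep="."):
    """Count, per (reference tip, sample), how many queries were assigned."""
    counts = Counter()
    for query, subject in hits.items():
        sample = query.split(sep)[0]  # hypothetical 'sample.sequence' IDs
        counts[(subject, sample)] += 1
    return sorted((subj, samp, n) for (subj, samp), n in counts.items())


# Made-up hit rows for illustration; IDs and scores are not real data.
hit_lines = [
    "hand1.s1\tref42\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-100\t380",
    "hand1.s1\tref7\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-80\t300",
    "hand1.s2\tref42\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-95\t360",
    "gut1.s1\tref9\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t90",
]
for row in sample_mapping(best_hits(hit_lines)):
    print("\t".join(str(value) for value in row))
```
]]></preformat>
<p>In this made-up example the query gut1.s1 has no hit at or below the 1e-50 threshold and is therefore excluded, mirroring the exclusion rule described above.</p>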
<p id="P13">A tree containing the same set of sequences as in the Greengenes core set FASTA file was obtained by downloading the most recent ARB database available at the Greengenes site (
<ext-link ext-link-type="uri" xlink:href="http://greengenes.lbl.gov/Download/Sequence_Data/arb_databases/greengenes236469.arb.gz">http://greengenes.lbl.gov/Download/Sequence_Data/arb_databases/greengenes236469.arb.gz</ext-link>
). The database is annotated with a “coreset” field, and searching for sequences with value “1” in that field produced a list approximating the core set. Because the overlap with the core set FASTA file was imperfect (both extra and missing sequences), the missing sequences in the core set FASTA file were added using ARB's parsimony insertion, and then extra sequences were marked and pruned from the tree. The resulting reference tree, which we call “Greengenes Core”, is available for download from the tutorial and as a drop-down menu option on the Fast UniFrac web site. In addition, to assess the impact that accounting for phylogenetic relationships, as opposed to shared “best hit” information alone, had on the results, we also performed analyses on the Greengenes tree represented as a “star phylogeny,” which was produced by attaching all sequences in the core set to a root node with a branch length of 1.</p>
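<p>A star phylogeny of this kind is simple to write down in Newick format. The sketch below is illustrative only: the function name star_newick is hypothetical, and it assumes that the sequence identifiers contain no characters that require quoting in Newick. On such a tree with equal branch lengths, unweighted UniFrac between two samples reduces to the fraction of the reference sequences hit by either sample that are hit by only one of them, so only shared “best hit” information is used.</p>
<preformat><![CDATA[
```python
def star_newick(tip_names, branch_length=1.0):
    """Return a Newick string attaching every tip directly to the root."""
    tips = ["%s:%s" % (name, branch_length) for name in tip_names]
    return "(" + ",".join(tips) + ");"


print(star_newick(["seqA", "seqB", "seqC"]))
```
]]></preformat>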
<p id="P14">Category mapping files were created and the data analyzed through the Fast UniFrac web interface. The category mapping allows the samples to be grouped by any number of criteria for coloring and dynamic 3D visualization of PCoA results using the Java KiNG viewer (
<ext-link ext-link-type="uri" xlink:href="http://kinemage.biochem.duke.edu/software/king.php">http://kinemage.biochem.duke.edu/software/king.php</ext-link>
). Sample and experiment descriptions were also added to this file; these are displayed alongside the samples throughout the upload and results pages of the interface, aiding interpretation of the results.</p>
</sec>
<sec id="S12">
<title>Global Environment ARB parsimony insertion protocol</title>
<p id="P15">Sequences were parsimony inserted into the Greengenes core set in ARB as previously described (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
). In this analysis, the sequence sets from each sample were dereplicated by the DivergentSet method (
<xref rid="R46" ref-type="bibr">Widmann
<italic>et al.</italic>
, 2006</xref>
) and only one divergent sequence from each sample was used. The environment file from the original analysis was edited so that the sample names conformed to the Fast UniFrac interface conventions (e.g. to remove underscores and other characters with special meanings in the Fast UniFrac web interface). The resulting file was analyzed using the Fast UniFrac web interface, using the same category mapping file as for the megablast to Greengenes dataset.</p>
</sec>
<sec id="S13">
<title>PhyloChip / PhyloTrac Protocol</title>
<p id="P16">PhyloTrac was downloaded from
<ext-link ext-link-type="uri" xlink:href="http://phylotrac.org">http://phylotrac.org</ext-link>
. The CEL data and PhyloTrac thresholds were obtained from a previously published study (
<xref rid="R40" ref-type="bibr">Sagaram
<italic>et al.</italic>
, 2009</xref>
), in which microbial communities from citrus trees infected with the Huanglongbing pathogen and from controls were assessed by PhyloChip, and were then reanalyzed. A Fast UniFrac sample ID mapping file (environment file) was exported from PhyloTrac and uploaded to Fast UniFrac. The G2 PhyloChip was selected as the reference tree, and the category mapping file was auto-generated. This file was then downloaded and modified to use the same categories as in the paper, using metadata kindly provided by the authors. Finally, the results were analyzed using the Fast UniFrac web interface.</p>
</sec>
<sec id="S14">
<title>Availability</title>
<p id="P17">The Fast UniFrac Python code is now available in the 1.3 release of the open source PyCogent package (
<xref rid="R21" ref-type="bibr">Knight
<italic>et al.</italic>
, 2007</xref>
), available at
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/pycogent">http://sourceforge.net/projects/pycogent</ext-link>
, and the web interface is available at
<ext-link ext-link-type="uri" xlink:href="http://www.bmf.colorado.edu/fastunifrac">http://www.bmf.colorado.edu/fastunifrac</ext-link>
.</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="S15">
<title>Results</title>
<sec id="S16" sec-type="methods">
<title>Comparing the ARB and BLAST protocols using the global environmental survey dataset</title>
<p id="P18">The ARB parsimony insertion protocol and the megablast protocol gave similar results for the global environment survey, both at the broad level and in detail (
<xref rid="F2" ref-type="fig">Fig. 2A,B,D</xref>
). The amount of variation explained by the principal axes is about the same (PC1 is 7.3% for ARB, 10.0% for BLAST) and the pairwise UniFrac distances between samples were highly correlated for the two protocols (
<xref rid="F2" ref-type="fig">Fig. 2D</xref>
). Perhaps more importantly, the overall clustering patterns are very similar and would yield the same ecological inferences. Samples from the vertebrate gut (blue) clearly separate from free-living environments (magenta, green) along PC1, with the termite gut (orange) and human mouth and skin (particularly from the vulva) (red) having intermediate values. Free-living assemblages separated into saline (magenta) vs. non-saline (green) environments along PC3, with mixed habitats (grey) such as estuaries intermediate between the two. In contrast, the results of using megablast to the Greengenes core set, but using a star phylogeny instead of the core set phylogeny, looked quite different. The amount of variation explained by PC1 is less (4.3%) and the clustering forms a star pattern with less clear separation between samples and environments than in the other two methods. The pairwise UniFrac distances between samples for the star phylogeny and the ARB parsimony insertion protocol were far less correlated (
<xref rid="F2" ref-type="fig">Fig. 2E</xref>
). Overall this shows that the ‘megablast to the Greengenes core set’ protocol is a good alternative to ARB parsimony insertion for making a tree because it produces essentially the same result in a dataset where accounting for phylogenetic relationships affects the results.</p>
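<p>A comparison of pairwise distances between two protocols can be sketched as below. The text does not state which correlation statistic was used, so the Pearson correlation here is an assumption, and the function names and the two small matrices are made up for illustration. Because the entries of a distance matrix are not independent, a coefficient computed this way describes agreement only; assessing its significance would require a permutation approach such as a Mantel test.</p>
<preformat><![CDATA[
```python
import numpy as np


def upper_triangle(dm):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    dm = np.asarray(dm, dtype=float)
    rows, cols = np.triu_indices(dm.shape[0], k=1)
    return dm[rows, cols]


def pairwise_distance_pearson(dm_a, dm_b):
    """Pearson correlation between matching sample pairs of two matrices."""
    return float(np.corrcoef(upper_triangle(dm_a), upper_triangle(dm_b))[0, 1])


# Made-up distance matrices for the same three samples under two protocols.
dm_protocol_a = [[0.0, 0.2, 0.8],
                 [0.2, 0.0, 0.6],
                 [0.8, 0.6, 0.0]]
dm_protocol_b = [[0.0, 0.2, 0.5],
                 [0.2, 0.0, 0.4],
                 [0.5, 0.4, 0.0]]
print(pairwise_distance_pearson(dm_protocol_a, dm_protocol_b))
```
]]></preformat>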
<p id="P19">Fifty of the 464 samples in the global environment dataset were also subsampled in order to provide a simpler example for the tutorial at the Fast UniFrac website and to test the robustness of the conclusions to the number of samples used. Despite using only ∼10% of the samples, the same major patterns emerged, with PC1 again separating the vertebrate gut from free-living samples and the termite gut intermediate. Salinity was again an important factor, with saline water separating from non-saline soils and sediment along PC2. This subset demonstrates the robustness of the global environment survey result, and also provides an example dataset for exploring the functionality of the web interface. Because PCoA results can be affected by the number of samples from different groups in the study, redoing the analysis with random subsets of samples is a good way to test the robustness of the results.</p>
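<p>Drawing a random subset of samples for such a robustness check takes only a few lines. In the sketch below the sample identifiers are placeholders rather than the real sample names, the function name subsample_ids is hypothetical, and the fixed seed is simply a convenience that makes a given subset reproducible.</p>
<preformat><![CDATA[
```python
import random


def subsample_ids(sample_ids, n, seed=0):
    """Draw n sample IDs without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(sorted(sample_ids), n)


# Placeholder IDs standing in for the 464 samples of the full dataset.
all_ids = ["sample%03d" % k for k in range(464)]
subset = subsample_ids(all_ids, 50, seed=1)
print(len(subset))
```
]]></preformat>
<p>Repeating the downstream analysis for several seeds gives a rough picture of how stable the major axes are to the choice of samples.</p>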
</sec>
<sec id="S17" sec-type="methods">
<title>Comparing the ARB and BLAST protocols using the human obesity dataset</title>
<p id="P20">The global environment dataset contained samples from extremely different environments. However, UniFrac is also useful for exploring closely related samples. We thus also tested an example dataset consisting of closely related microbial communities to illustrate that the resolution of the megablast protocol is sufficient for the dynamic monitoring of the same community over time. We repeated the UniFrac analysis reported in Figure 1a of (
<xref rid="R25" ref-type="bibr">Ley
<italic>et al.</italic>
, 2006</xref>
). Here, Ley et al. sequenced the bacteria in stool samples collected at 3-4 timepoints over the course of a year from 11 obese individuals who followed either a fat-restricted (FAT-R) (n=5) or carbohydrate-restricted (CARB-R) (n=6) diet. Hierarchical clustering based on UniFrac analysis of an ARB parsimony insertion tree showed that the bacterial lineages were remarkably constant within individuals over time, because samples from the same person generally clustered with each other rather than with samples from other people (
<xref rid="R25" ref-type="bibr">Ley
<italic>et al.</italic>
, 2006</xref>
). Repeating this analysis with the megablast-to-Greengenes protocol and Fast UniFrac as described above yielded trees that differed somewhat in the details of the topology, but in which the samples clustered equally well by individual (see
<xref rid="SD1" ref-type="supplementary-material">Supplementary information</xref>
). Thus the megablast protocol provides sufficient resolution for the analysis of similar as well as dissimilar sample types.</p>
</sec>
<sec id="S18" sec-type="methods">
<title>Combining the hand and gut pyrosequencing datasets</title>
<p id="P21">The combination of hand and gut datasets provides the largest combined pyrosequencing 16S rRNA dataset analyzed to date, encompassing 680,000 sequences. PCoA analysis of pairwise weighted UniFrac values shows that, as expected, the difference between the hand and gut samples accounts for the majority of the variation among these samples (63.2%) (
<xref rid="F3" ref-type="fig">Fig. 3</xref>
). Gut samples differentiate along PC2 (10.6%) and skin along PC3 (6.3%), forming two separate gradients, one within the hand samples and one within the gut samples, that are orthogonal to each other (
<xref rid="F3" ref-type="fig">Fig. 3</xref>
). The relative importance of the hand-gut differences is most easily viewed when the axes are scaled by the % of the variation explained in the 3D viewer (
<xref rid="F3" ref-type="fig">Fig. 3A</xref>
). However, the separation between the gut and hand samples along PC axes 2 and 3 can be most easily seen using an unscaled view (
<xref rid="F3" ref-type="fig">Fig. 3B,C</xref>
). The parallel coordinates plot, which is also accessed in the 3D viewer (
<xref rid="F3" ref-type="fig">Fig. 3D</xref>
), makes it easy to see along which of the first 10 PCoA axes the hand and gut samples vary, and the scree plot, which is displayed directly in the web interface, makes it easy to see the relative and cumulative importance of the different axes (
<xref rid="F3" ref-type="fig">Fig. 3E</xref>
). The major pattern, with orthogonal gradients in hand and gut, is visually immediately obvious but was unsuspected before the datasets were combined.</p>
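<p>The percentages of variation quoted for each axis, and the values shown in a scree plot, come from the eigenvalues of the PCoA. A minimal sketch of classical PCoA on a distance matrix is given below. It is not the Fast UniFrac code; the function name pcoa is hypothetical, and setting negative eigenvalues to zero before computing percentages is one common convention that is assumed here rather than taken from the text.</p>
<preformat><![CDATA[
```python
import numpy as np


def pcoa(dm):
    """Classical PCoA: return sample coordinates and percent variation per axis."""
    dm = np.asarray(dm, dtype=float)
    n = dm.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering.dot(dm ** 2).dot(centering)
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)  # assumed handling of negatives
    coords = eigvecs * np.sqrt(positive)
    percent = 100.0 * positive / positive.sum()
    return coords, percent


# Three made-up samples lying on a line at positions 0, 1 and 3.
toy_dm = [[0.0, 1.0, 3.0],
          [1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0]]
coords, percent = pcoa(toy_dm)
print(percent)             # values for a scree plot
print(np.cumsum(percent))  # cumulative variation explained
```
]]></preformat>
<p>For this toy matrix essentially all of the variation falls on the first axis, because the three points lie on a single line.</p>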
<p id="P22">The analysis of these pyrosequencing reads would have taken approximately two orders of magnitude longer to perform using the original version of UniFrac on a single CPU. To compare the performance of Fast UniFrac to the original implementation, we sampled 1,000-10,000 unique nodes from the reference tree from the hand/gut dataset (225 samples, ∼680,000 sequences) in steps of 1,000 (
<xref rid="F4" ref-type="fig">Fig. 4</xref>
). On average, each number of nodes corresponds to a much larger number of sequences because many sequences are abundant across samples. Both implementations were compared on the same set of trees: 10 trees were created for each sample size, and the average is displayed with 95% intervals on a log scale. In general, the new implementation is 10-100 times faster than the original implementation, and the large difference in performance between weighted and unweighted UniFrac in the original implementation is eliminated.</p>
</sec>
<sec id="S19" sec-type="methods">
<title>Analysis of PhyloChip data</title>
<p id="P23">The dataset of 24 PhyloChip samples came from a study in which leaf samples from citrus trees infected with the Huanglongbing pathogen, collected from several different groves, were analyzed using the PhyloChip (
<xref rid="R40" ref-type="bibr">Sagaram
<italic>et al.</italic>
, 2009</xref>
). The entire analysis of 24 PhyloChip samples took Fast UniFrac a matter of minutes after exporting the data from PhyloTrac (
<xref rid="F5" ref-type="fig">Fig. 5</xref>
). As in the original study, we found no significant clustering of the overall community by grove or disease status, although the clustering looks suggestive and larger sample sizes could make the patterns more conclusive. The clear arch effect, in which samples are spread along a curve, strongly suggests that there is a single underlying gradient that explains much of the variation in the community, and the scree plot shows that most (>80%) of the variance in the data is explained by the first three principal components. Additional collection of metadata about the individual plants may help explain the major unmeasured sources of variation in the dataset and allow more subtle patterns associated with infection to be detected. The ability to see results colored by different metadata categories in the context of the full dataset is extremely useful for exploratory analyses, and can direct additional sample collection efforts once the overall patterns are clear.</p>
</sec>
</sec>
<sec sec-type="discussion" id="S20">
<title>Discussion</title>
<p id="P24">Our results indicate that the performance increase achieved with Fast UniFrac, and the corresponding ability to perform analyses and meta-analyses of large numbers of samples using readily available techniques (e.g. BLAST and PhyloTrac), will greatly enhance a wide range of studies of microbial ecology. In general, the speedup of one to two orders of magnitude in processing time, the ability to rapidly color samples according to different criteria and to display more than the first three dimensions for rapid profiling, and the ability to reproduce previous results using a standardized pipeline based on familiar tools will allow many groups to integrate large pyrosequencing and/or PhyloChip studies, thus providing key cyberinfrastructure for Human Microbiome Projects and related efforts.</p>
<p id="P25">Some of the biological findings presented here are intriguing in their own right, although detailed follow-up is beyond the scope of the present paper. We note that the saline/non-saline split in environmental samples (
<xref rid="R30" ref-type="bibr">Lozupone and Knight, 2007</xref>
) and the even deeper split between environmental and host-associated samples (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
) have now been recaptured using a range of methodologies and appear to be robust. The levels of intra- and interpersonal variability observed within and between human body habitats (
<xref rid="R9" ref-type="bibr">Fierer
<italic>et al.</italic>
, 2008</xref>
;
<xref rid="R10" ref-type="bibr">Frank
<italic>et al.</italic>
, 2007</xref>
;
<xref rid="R23" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008a</xref>
;
<xref rid="R42" ref-type="bibr">Turnbaugh
<italic>et al.</italic>
, 2009</xref>
) suggest that large sample sizes, including time series analyses, will be especially critical for understanding whether or not observed community structures are significantly associated with physiologic or pathophysiologic states. Our re-analysis of the PhyloChip data associated with Huanglongbing pathogen-infected citrus (
<xref rid="R40" ref-type="bibr">Sagaram
<italic>et al.</italic>
, 2009</xref>
) reinforces this point: although we see intriguing differences in intrinsic variability of the leaf communities in different groves, much larger numbers of samples would be required to establish these patterns conclusively. However, the decreasing cost of the PhyloChip and, especially, of barcoded multiplex pyrosequencing (
<xref rid="R14" ref-type="bibr">Hamady
<italic>et al.</italic>
, 2008</xref>
) should provide the statistical power required to observe subtle biomarkers of disease.</p>
<p id="P26">We note that significance tests such as the P test (
<xref rid="R34" ref-type="bibr">Martin, 2002</xref>
) and the UniFrac significance tests become decreasingly useful as the depth of coverage and the number of samples increase. For example, essentially all pairs of pyrosequencing-derived samples we examined in this study are significantly different by the P test (data not shown), since statistical power increases with sampling effort. Performing many pairwise significance tests in studies with many samples, however, has limited meaning because (1) corrections for multiple comparisons, such as the Bonferroni correction, make it difficult to detect real differences because of a high Type II error rate (β errors or false negatives), (2) the number of randomizations needed to detect differences becomes prohibitively large, and (3) no information is gained on variation in the degree of difference between sample pairs, since significance is a function of both degree of difference and sampling effort. We recommend a shift in emphasis from testing whether each pair of samples is
<italic>significantly</italic>
different to using multivariate methods, such as PCoA and hierarchical clustering, to detect broad trends of similarities and differences that relate all samples (a broad suite of statistical techniques, such as the Mantel test, ANOSIM, and PERMANOVA, already exists to test for significant differences among categories). If samples really are drawn from a single distribution, as the P test and UniFrac significance test assume as their null hypothesis, then no large-scale trends will be observed. In contrast, if sample clustering does exist, the ability to relate large-scale differences among communities to specific biological observables, such as sample type, pH, salinity, or other variables, becomes essential. By allowing the UniFrac distance matrices to be exported for analysis in third-party packages such as R and PRIMER, and by allowing the same principal coordinates projection to be colored in many different ways according to different user-supplied categorical variables, Fast UniFrac facilitates insight into the specific variables associated with sample clustering. Similarly, the ability to perform lineage-specific analyses by including only a subset of the tree allows insight into the specific lineages responsible for associations with ecologically important variables.</p>
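<p>The scaling problem with pairwise tests noted in this paragraph can be made concrete with a small arithmetic sketch. The function below is hypothetical and is not part of Fast UniFrac. It assumes a family-wise alpha of 0.05, a plain Bonferroni correction, and the common convention that the smallest p-value attainable from N randomizations is 1/(N + 1).</p>

```python
from fractions import Fraction
from math import ceil, comb

def pairwise_test_burden(n_samples, alpha="0.05"):
    """Return (number of pairwise tests, Bonferroni per-test threshold,
    minimum randomizations needed to reach a p-value that small)."""
    alpha = Fraction(alpha)           # exact arithmetic avoids float rounding
    n_tests = comb(n_samples, 2)      # every unordered pair of samples
    per_test_alpha = alpha / n_tests  # Bonferroni: divide alpha by the test count
    # Assumption: with N randomizations the smallest attainable p-value is
    # 1 / (N + 1), so N must be at least 1 / per_test_alpha - 1.
    min_permutations = ceil(1 / per_test_alpha) - 1
    return n_tests, float(per_test_alpha), min_permutations

for n in (10, 24, 100, 500):
    tests, threshold, perms = pairwise_test_burden(n)
    print(f"{n:4d} samples: {tests:6d} tests, per-test alpha {threshold:.2e}, "
          f"at least {perms} randomizations per test")
```

<p>Because the number of pairs grows quadratically with the number of samples, both the per-test threshold and the randomizations required per test worsen quickly under these assumptions, which is consistent with the recommendation to favor multivariate summaries over exhaustive pairwise testing.</p>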
<p id="P27">In conclusion, we have shown that Fast UniFrac provides order-of-magnitude improvements in speed over the original version, together with many user interface enhancements and connections to other data sources that greatly increase the throughput of analyses. Contribution of the Fast UniFrac code to open-source efforts such as PyCogent (
<xref rid="R21" ref-type="bibr">Knight
<italic>et al.</italic>
, 2007</xref>
) and the Human Microbiome Project Data Analysis and Coordination Center (
<ext-link ext-link-type="uri" xlink:href="http://www.hmpdacc.org/">http://www.hmpdacc.org/</ext-link>
) will provide key cyberinfrastructure as the field moves beyond clone libraries to analyses of hundreds to thousands of PhyloChips or massively parallel sequencing efforts that yield millions of reads.</p>
</sec>
<sec sec-type="supplementary-material" id="S21">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="SD1">
<label>1</label>
<media xlink:href="NIHMS135997-supplement-1.doc" mimetype="application" mime-subtype="msword" xlink:type="simple" id="d37e763" position="anchor"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack id="S23">
<p>We thank Jeffrey I Gordon, Ruth Ley, Noah Fierer, Brian Muegge, Jesse Stombaugh, Daniel McDonald, and Christian Lauber for valuable feedback on the manuscript. This work was supported in part by NIH grants 1R01HG004872-01, 1U01HG004866-01, and P01DK078669, by the Crohn's and Colitis Foundation of America, and by a Bill and Melinda Gates Foundation Mal-ED Network Discovery Project.</p>
</ack>
<fn-group>
<fn id="FN3">
<p>
<bold>Subject Category:</bold>
Microbial population and community ecology</p>
</fn>
<fn id="FN4" fn-type="supplementary-material">
<p>
<bold>Supplementary Information:</bold>
Additional information, including example files and a tutorial, is available on the Fast UniFrac website mentioned above. Supplementary information is also available at the ISME Journal's website.</p>
</fn>
</fn-group>
<ref-list>
<ref id="R1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alexander</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stock</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Breiner</surname>
<given-names>HW</given-names>
</name>
<name>
<surname>Behnke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bunge</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yakimov</surname>
<given-names>MM</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Microbial eukaryotes in the hypersaline anoxic L'Atalante deep-sea basin</article-title>
<source>Environ Microbiol</source>
<volume>11</volume>
<fpage>360</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="pmid">18826436</pub-id>
</element-citation>
</ref>
<ref id="R2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<year>1990</year>
<article-title>Basic local alignment search tool</article-title>
<source>J Mol Biol</source>
<volume>215</volume>
<fpage>403</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="R3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Balakirev</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Pavlyuchkov</surname>
<given-names>VA</given-names>
</name>
<name>
<surname>Ayala</surname>
<given-names>FJ</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>DNA variation and symbiotic associations in phenotypically diverse sea urchin Strongylocentrotus intermedius</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<fpage>16218</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">18852450</pub-id>
</element-citation>
</ref>
<ref id="R4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bryant</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Lamanna</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Morlon</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kerkhoff</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Enquist</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>JL</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Colloquium paper: microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<issue>1</issue>
<fpage>11505</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">18695215</pub-id>
</element-citation>
</ref>
<ref id="R5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeSantis</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Moberg</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Zubieta</surname>
<given-names>IX</given-names>
</name>
<name>
<surname>Piceno</surname>
<given-names>YM</given-names>
</name>
<name>
<surname>Andersen</surname>
<given-names>GL</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment</article-title>
<source>Microb Ecol</source>
<volume>53</volume>
<fpage>371</fpage>
<lpage>83</lpage>
<pub-id pub-id-type="pmid">17334858</pub-id>
</element-citation>
</ref>
<ref id="R6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeSantis</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Larsen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rojas</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Keller</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB</article-title>
<source>Appl Environ Microbiol</source>
<volume>72</volume>
<fpage>5069</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="pmid">16820507</pub-id>
</element-citation>
</ref>
<ref id="R7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Desnues</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rodriguez-Brito</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rayhawk</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kelley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Biodiversity and biogeography of phages in modern stromatolites and thrombolites</article-title>
<source>Nature</source>
<volume>452</volume>
<fpage>340</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="pmid">18311127</pub-id>
</element-citation>
</ref>
<ref id="R8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elifantz</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Waidner</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Michelou</surname>
<given-names>VK</given-names>
</name>
<name>
<surname>Cottrell</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Kirchman</surname>
<given-names>DL</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Diversity and abundance of glycosyl hydrolase family 5 in the North Atlantic Ocean</article-title>
<source>FEMS Microbiol Ecol</source>
<volume>63</volume>
<fpage>316</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="pmid">18194344</pub-id>
</element-citation>
</ref>
<ref id="R9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fierer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lauber</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>The influence of sex, handedness, and washing on the diversity of hand surface bacteria</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<fpage>17994</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="pmid">19004758</pub-id>
</element-citation>
</ref>
<ref id="R10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frank</surname>
<given-names>DN</given-names>
</name>
<name>
<surname>St Amand</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Feldman</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Boedeker</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Harpaz</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pace</surname>
<given-names>NR</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>104</volume>
<fpage>13780</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="pmid">17699621</pub-id>
</element-citation>
</ref>
<ref id="R11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fraune</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bosch</surname>
<given-names>TC</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Long-term maintenance of species-specific bacterial microbiota in the basal metazoan Hydra</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>104</volume>
<fpage>13146</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="pmid">17664430</pub-id>
</element-citation>
</ref>
<ref id="R12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Graham</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Fine</surname>
<given-names>PV</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Phylogenetic beta diversity: linking ecological and evolutionary processes across space in time</article-title>
<source>Ecol Lett</source>
<volume>11</volume>
<fpage>1265</fpage>
<lpage>77</lpage>
<pub-id pub-id-type="pmid">19046358</pub-id>
</element-citation>
</ref>
<ref id="R13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grice</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Renaud</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Bouffard</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Blakesley</surname>
<given-names>RW</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>A diversity profile of the human skin microbiota</article-title>
<source>Genome Res</source>
<volume>18</volume>
<fpage>1043</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="pmid">18502944</pub-id>
</element-citation>
</ref>
<ref id="R14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Walker</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Gold</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex</article-title>
<source>Nat Methods</source>
<volume>5</volume>
<fpage>235</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">18264105</pub-id>
</element-citation>
</ref>
<ref id="R15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harrison</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Berelson</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Orphan</surname>
<given-names>VJ</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Variations in archaeal and bacterial diversity associated with the sulfate-methane transition zone in continental margin sediments (Santa Barbara Basin, California)</article-title>
<source>Appl Environ Microbiol</source>
<volume>75</volume>
<fpage>1487</fpage>
<lpage>99</lpage>
<pub-id pub-id-type="pmid">19139232</pub-id>
</element-citation>
</ref>
<ref id="R16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hartman</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Vilgalys</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bruland</surname>
<given-names>GL</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Environmental and anthropogenic controls over bacterial communities in wetland soils</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<fpage>17842</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">19004771</pub-id>
</element-citation>
</ref>
<ref id="R17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hiibel</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Pereyra</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Inman</surname>
<given-names>LY</given-names>
</name>
<name>
<surname>Tischer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reisman</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Reardon</surname>
<given-names>KF</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Microbial community analysis of two field-scale sulfate-reducing bioreactors treating mine drainage</article-title>
<source>Environ Microbiol</source>
<volume>10</volume>
<fpage>2087</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="pmid">18430021</pub-id>
</element-citation>
</ref>
<ref id="R18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hsu</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>DH</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Evidence for the functional significance of diazotroph community structure in soil</article-title>
<source>ISME J</source>
<volume>3</volume>
<fpage>124</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="pmid">18769458</pub-id>
</element-citation>
</ref>
<ref id="R19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huber</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Mark Welch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Huse</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Neal</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>Butterfield</surname>
<given-names>DA</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Microbial population structures in the deep marine biosphere</article-title>
<source>Science</source>
<volume>318</volume>
<fpage>97</fpage>
<lpage>100</lpage>
<pub-id pub-id-type="pmid">17916733</pub-id>
</element-citation>
</ref>
<ref id="R20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanagawa</surname>
<given-names>T</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Bias and artifacts in multitemplate polymerase chain reactions (PCR)</article-title>
<source>J Biosci Bioeng</source>
<volume>96</volume>
<fpage>317</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">16233530</pub-id>
</element-citation>
</ref>
<ref id="R21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Maxwell</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Birmingham</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Carnes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Caporaso</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Easton</surname>
<given-names>BC</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>PyCogent: a toolkit for making sense from sequence</article-title>
<source>Genome Biol</source>
<volume>8</volume>
<fpage>R171</fpage>
<pub-id pub-id-type="pmid">17708774</pub-id>
</element-citation>
</ref>
<ref id="R22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lauber</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Sinsabaugh</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Zak</surname>
<given-names>DR</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Laccase gene composition and relative abundance in oak forest soil is not affected by short-term nitrogen fertilization</article-title>
<source>Microb Ecol</source>
<volume>57</volume>
<fpage>50</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">18758844</pub-id>
</element-citation>
</ref>
<ref id="R23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lozupone</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ramey</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Bircher</surname>
<given-names>JS</given-names>
</name>
<etal></etal>
</person-group>
<year>2008a</year>
<article-title>Evolution of mammals and their gut microbes</article-title>
<source>Science</source>
<volume>320</volume>
<fpage>1647</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="pmid">18497261</pub-id>
</element-citation>
</ref>
<ref id="R24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Lozupone</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<year>2008b</year>
<article-title>Worlds within worlds: evolution of the vertebrate gut microbiota</article-title>
<source>Nat Rev Microbiol</source>
<volume>6</volume>
<fpage>776</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="pmid">18794915</pub-id>
</element-citation>
</ref>
<ref id="R25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Klein</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Microbial ecology: human gut microbes associated with obesity</article-title>
<source>Nature</source>
<volume>444</volume>
<fpage>1022</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="pmid">17183309</pub-id>
</element-citation>
</ref>
<ref id="R26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rantalainen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Symbiotic gut microbes modulate human metabolic phenotypes</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<fpage>2117</fpage>
<lpage>22</lpage>
<pub-id pub-id-type="pmid">18252821</pub-id>
</element-citation>
</ref>
<ref id="R27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>UniFrac--an online tool for comparing microbial community diversity in a phylogenetic context</article-title>
<source>BMC Bioinformatics</source>
<volume>7</volume>
<fpage>371</fpage>
<pub-id pub-id-type="pmid">16893466</pub-id>
</element-citation>
</ref>
<ref id="R28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>UniFrac: a new phylogenetic method for comparing microbial communities</article-title>
<source>Appl Environ Microbiol</source>
<volume>71</volume>
<fpage>8228</fpage>
<lpage>35</lpage>
<pub-id pub-id-type="pmid">16332807</pub-id>
</element-citation>
</ref>
<ref id="R29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cantarel</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Coutinho</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Henrissat</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The convergence of carbohydrate active gene repertoires in human gut microbes</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>105</volume>
<fpage>15076</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="pmid">18806222</pub-id>
</element-citation>
</ref>
<ref id="R30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Global patterns in bacterial diversity</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>104</volume>
<fpage>11436</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="pmid">17592124</pub-id>
</element-citation>
</ref>
<ref id="R31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lozupone</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Species divergence and the measurement of microbial diversity</article-title>
<source>FEMS Microbiol Rev</source>
<volume>32</volume>
<fpage>557</fpage>
<lpage>78</lpage>
<pub-id pub-id-type="pmid">18435746</pub-id>
</element-citation>
</ref>
<ref id="R32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ludwig</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Strunk</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Westram</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Meier</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yadhukumar</surname>
</name>
<etal></etal>
</person-group>
<year>2004</year>
<article-title>ARB: a software environment for sequence data</article-title>
<source>Nucleic Acids Res</source>
<volume>32</volume>
<fpage>1363</fpage>
<lpage>71</lpage>
<pub-id pub-id-type="pmid">14985472</pub-id>
</element-citation>
</ref>
<ref id="R33">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marhaver</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Viral communities associated with healthy and bleaching corals</article-title>
<source>Environ Microbiol</source>
<volume>10</volume>
<fpage>2277</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="pmid">18479440</pub-id>
</element-citation>
</ref>
<ref id="R34">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>AP</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Phylogenetic approaches for describing and comparing the diversity of microbial communities</article-title>
<source>Appl Environ Microbiol</source>
<volume>68</volume>
<fpage>3673</fpage>
<lpage>82</lpage>
<pub-id pub-id-type="pmid">12147459</pub-id>
</element-citation>
</ref>
<ref id="R35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nasidze</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Quinque</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Stoneking</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Global diversity in the human salivary microbiome</article-title>
<source>Genome Res</source>
</element-citation>
</ref>
<ref id="R36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Osman</surname>
<given-names>S</given-names>
</name>
<name>
<surname>La Duc</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Dekas</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Newcombe</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Venkateswaran</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Microbial burden and diversity of commercial airline cabin air during short and long durations of travel</article-title>
<source>ISME J</source>
<volume>2</volume>
<fpage>482</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="pmid">18256704</pub-id>
</element-citation>
</ref>
<ref id="R37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Porter</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Skillman</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Moncalvo</surname>
<given-names>JM</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Fruiting body and soil rDNA sampling detects complementary assemblage of Agaricomycotina (Basidiomycota, Fungi) in a hemlock-dominated forest plot in southern Ontario</article-title>
<source>Mol Ecol</source>
<volume>17</volume>
<fpage>3037</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="pmid">18494767</pub-id>
</element-citation>
</ref>
<ref id="R38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rawls</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Mahowald</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Reciprocal gut microbiota transplants from zebrafish and mice to germ-free recipients reveal host habitat selection</article-title>
<source>Cell</source>
<volume>127</volume>
<fpage>423</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="pmid">17055441</pub-id>
</element-citation>
</ref>
<ref id="R39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roesch</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>Fulthorpe</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Riva</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Casella</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hadwin</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>AD</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Pyrosequencing enumerates and contrasts soil microbial diversity</article-title>
<source>ISME J</source>
<volume>1</volume>
<fpage>283</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="pmid">18043639</pub-id>
</element-citation>
</ref>
<ref id="R40">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sagaram</surname>
<given-names>US</given-names>
</name>
<name>
<surname>DeAngelis</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Trivedi</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Andersen</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Bacterial diversity analysis of Huanglongbing pathogen-infected citrus, using PhyloChip arrays and 16S rRNA gene clone library sequencing</article-title>
<source>Appl Environ Microbiol</source>
<volume>75</volume>
<fpage>1566</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="pmid">19151177</pub-id>
</element-citation>
</ref>
<ref id="R41">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sogin</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Mark Welch</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Huse</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Neal</surname>
<given-names>PR</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Microbial diversity in the deep sea and the underexplored “rare biosphere”</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>103</volume>
<fpage>12115</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">16880384</pub-id>
</element-citation>
</ref>
<ref id="R42">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yatsunenko</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Cantarel</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Duncan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>A core gut microbiome in obese and lean twins</article-title>
<source>Nature</source>
<volume>457</volume>
<fpage>480</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="pmid">19043404</pub-id>
</element-citation>
</ref>
<ref id="R43">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>The human microbiome project</article-title>
<source>Nature</source>
<volume>449</volume>
<fpage>804</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="pmid">17943116</pub-id>
</element-citation>
</ref>
<ref id="R44">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Mahowald</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Magrini</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>An obesity-associated gut microbiome with increased capacity for energy harvest</article-title>
<source>Nature</source>
<volume>444</volume>
<fpage>1027</fpage>
<lpage>31</lpage>
<pub-id pub-id-type="pmid">17183312</pub-id>
</element-citation>
</ref>
<ref id="R45">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Volchkov</surname>
<given-names>PY</given-names>
</name>
<name>
<surname>Stranges</surname>
<given-names>PB</given-names>
</name>
<name>
<surname>Avanesyan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stonebraker</surname>
<given-names>AC</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Innate immunity and intestinal microbiota in the development of Type 1 diabetes</article-title>
<source>Nature</source>
<volume>455</volume>
<fpage>1109</fpage>
<lpage>13</lpage>
<pub-id pub-id-type="pmid">18806780</pub-id>
</element-citation>
</ref>
<ref id="R46">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Widmann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hamady</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>DivergentSet, a tool for picking non-redundant sequences from large sequence collections</article-title>
<source>Mol Cell Proteomics</source>
<volume>5</volume>
<fpage>1520</fpage>
<lpage>32</lpage>
<pub-id pub-id-type="pmid">16769708</pub-id>
</element-citation>
</ref>
<ref id="R47">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilson</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Radosevich</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>DeSantis</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Viswanathan</surname>
<given-names>VS</given-names>
</name>
<name>
<surname>Kuczmarski</surname>
<given-names>TA</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>High-density microarray of small-subunit ribosomal DNA probes</article-title>
<source>Appl Environ Microbiol</source>
<volume>68</volume>
<fpage>2535</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="pmid">11976131</pub-id>
</element-citation>
</ref>
<ref id="R48">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ghodsi</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Automatic dimensionality selection from the scree plot via the use of profile likelihood</article-title>
<source>Computational Statistics and Data Analysis</source>
<volume>51</volume>
<fpage>918</fpage>
<lpage>930</lpage>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>Difference in procedure between the original UniFrac and the new Fast UniFrac (for clarity, only the unweighted UniFrac algorithm is shown here, but similar principles apply to weighted UniFrac). In the original procedure, (A) environments are stored as sets in a tree object, (B) the tree is pruned to include only the branches leading to the selected environments, (C) the sets of environments are compared using set algorithms and states are assigned to each internal node, and (D) the result is calculated by another tree traversal. In the new procedure, (E) the environments are stored as an array of tip × environment counts, (F) selected environments are chosen by slicing this array, (G) internal states are calculated using array operations on slices of the array, and (H) the products of the incidence array and the branch lengths of nodes leading to either or both of the environments are summed, allowing calculation of the UniFrac value. The array-based approach allows substantial gains in efficiency.</p>
</caption>
<graphic xlink:href="nihms135997f1"></graphic>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>Global Environmental Survey dataset (
<xref rid="R24" ref-type="bibr">Ley
<italic>et al.</italic>
, 2008b</xref>
) analyzed using PCoA of unweighted pairwise UniFrac distances with trees generated using 1) megablast mapping to the Greengenes core set tree (A), 2) an ARB parsimony insertion tree (B) and 3) megablast mapping to the Greengenes core set represented as a star phylogeny (i.e. a phylogeny in which all taxa are treated as equally related, ignoring the actual phylogenetic information) (C). All plots show the first three principal axes as visualized in the 3D viewer. Scatterplots of the pairwise UniFrac distances (D, E), as well as the PCoA analysis, show that megablast mapping to the Greengenes core set produced results similar to those from ARB parsimony insertion, but only when the phylogenetic relationships in the Greengenes core set are considered.</p>
</caption>
<graphic xlink:href="nihms135997f2"></graphic>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>Principal coordinates analysis of weighted UniFrac values between hand (blue) and gut (red) pyrosequencing datasets with the axes scaled by the percentage of the variance that they contain (A) or unscaled (B, C). Panel B plots PC1 vs PC2 and Panel C plots PC1 vs PC3. A parallel coordinates plot (D) shows along which of the first 10 PC axes the hand and gut samples differ: in this display, the position of each sample along each of the first 10 axes is plotted (for example, the hand samples score high on PC1 and the gut samples score low, so on the first line, for PC1, the hand samples have high values and the gut samples have low values). A scree plot (E) allows for easy visualization of the percentage of the variance explained by the first 10 PC axes, both individually (red) and cumulatively (blue).</p>
</caption>
<graphic xlink:href="nihms135997f3"></graphic>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>Performance of Fast UniFrac versus the original implementation on sample sizes ranging from 1000 to 10,000 sequences. The Fast UniFrac implementation is consistently about 2 orders of magnitude faster, and largely eliminates the difference in time to calculate weighted and unweighted UniFrac metrics.</p>
</caption>
<graphic xlink:href="nihms135997f4"></graphic>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>Example PhyloChip analysis performed using PhyloTrac and Fast UniFrac. (A) Exporting the environment file from PhyloTrac, (B) uploading to Fast UniFrac, (C) viewing weighted Fast UniFrac PCoA results in the web interface directly (in this display, each point is a sample, and we see a 2D projection of the first two principal coordinates obtained by PCoA; the relatively smooth curve suggests that there is a gradient connecting the samples), (D) viewing unweighted Fast UniFrac ordination results in the linked 3D viewer: again, each point is a sample and its position is calculated by PCoA of the UniFrac distances, but in this case three dimensions are shown, and (E) a scree plot showing how much of the variation is explained singly or cumulatively by each of the first 10 principal coordinates, allowing the user to see that, for example, the first three principal coordinates together explain over 80% of the variance in the samples. As reported in the original study, no clear patterns are readily seen using ordination, but the example demonstrates the speed and ease with which this sort of analysis can now be performed.</p>
</caption>
<graphic xlink:href="nihms135997f5"></graphic>
</fig>
</floats-group>
</pmc>
</record>
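
The array-based procedure described in the Figure 1 caption (panels E to H) can be illustrated with a minimal pure-Python sketch. This is not the Fast UniFrac source code: the function name `unweighted_unifrac`, the postorder node layout, and the toy tree and counts are all assumptions made for illustration, and plain lists stand in for the arrays that a numerical library would provide.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from the Fast UniFrac implementation.

def unweighted_unifrac(branch_lengths, children, tip_counts, env_a, env_b):
    """Unweighted UniFrac between two environment columns.

    branch_lengths[i]: length of the branch above node i
    children[i]:       list of child node indices (empty for tips)
    tip_counts[i]:     per-environment sequence counts for tip i
    Nodes must be listed in postorder (children before parents).
    """
    n_nodes = len(branch_lengths)
    in_a = [False] * n_nodes  # node has descendants in environment A
    in_b = [False] * n_nodes  # node has descendants in environment B
    for node in range(n_nodes):
        kids = children[node]
        if kids:
            # Internal state: present if present in any child.
            in_a[node] = any(in_a[k] for k in kids)
            in_b[node] = any(in_b[k] for k in kids)
        else:
            # Tip state: read the two selected columns of the count table.
            counts = tip_counts[node]
            in_a[node] = counts[env_a] > 0
            in_b[node] = counts[env_b] > 0
    # Branch length leading to exactly one environment ...
    unique = sum(l for l, a, b in zip(branch_lengths, in_a, in_b) if a != b)
    # ... divided by branch length leading to either or both.
    covered = sum(l for l, a, b in zip(branch_lengths, in_a, in_b) if a or b)
    return unique / covered if covered else 0.0


# Toy tree ((t0:1,t1:1):1,(t2:1,t3:1):1) in postorder; the root has length 0.
BRANCH_LENGTHS = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
CHILDREN = [[], [], [0, 1], [], [], [3, 4], [2, 5]]
# Tip x environment counts (two environments), keyed by tip node index.
TIP_COUNTS = {0: [3, 0], 1: [1, 2], 3: [0, 5], 4: [0, 0]}

if __name__ == "__main__":
    print(unweighted_unifrac(BRANCH_LENGTHS, CHILDREN, TIP_COUNTS, 0, 1))
```

Selecting a pair of environments here is only a matter of choosing two column indices, which mirrors the slicing step (F); no tree pruning or set comparison is needed.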

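
The scree plots described in the captions of Figures 3 and 5 report the percentage of variance explained by each principal coordinate, individually and cumulatively. A minimal sketch of that arithmetic follows, assuming the PCoA eigenvalues are already available; the function name `scree_table`, the example eigenvalues, and the choice to drop non-positive eigenvalues are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only; the eigenvalues below are made up.

def scree_table(eigenvalues, n_axes=10):
    """Return (axis, percent, cumulative percent) for the leading axes.

    Assumption: non-positive eigenvalues, which PCoA can produce for
    non-Euclidean distances, are dropped before computing percentages.
    """
    kept = sorted((ev for ev in eigenvalues if ev > 0), reverse=True)
    total = sum(kept)
    rows = []
    cumulative = 0.0
    for axis, ev in enumerate(kept[:n_axes], start=1):
        percent = 100.0 * ev / total
        cumulative += percent
        rows.append((axis, percent, cumulative))
    return rows


if __name__ == "__main__":
    for axis, percent, cumulative in scree_table([5.0, 3.0, 1.5, 0.5, -0.2]):
        print(f"PC{axis}: {percent:.1f}% (cumulative {cumulative:.1f}%)")
```

Reading the cumulative column is how one would check a statement such as "the first three principal coordinates together explain over 80% of the variance".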
To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000679 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000679 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2797552
   |texte=   Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:19710709" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024