Cyberinfrastructure Exploration Server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000030 (Pmc/Corpus); previous: 0000299; next: 0000310


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences</title>
<author>
<name sortKey="Ogilvie, Lesley A" sort="Ogilvie, Lesley A" uniqKey="Ogilvie L" first="Lesley A." last="Ogilvie">Lesley A. Ogilvie</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bowler, Lucas D" sort="Bowler, Lucas D" uniqKey="Bowler L" first="Lucas D." last="Bowler">Lucas D. Bowler</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Caplin, Jonathan" sort="Caplin, Jonathan" uniqKey="Caplin J" first="Jonathan" last="Caplin">Jonathan Caplin</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dedi, Cinzia" sort="Dedi, Cinzia" uniqKey="Dedi C" first="Cinzia" last="Dedi">Cinzia Dedi</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Diston, David" sort="Diston, David" uniqKey="Diston D" first="David" last="Diston">David Diston</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a4">Present address: Mikrobiologische and Biotechnologische Risiken Bundesamt für Gesundheit BAG, 3003 Bern, Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheek, Elizabeth" sort="Cheek, Elizabeth" uniqKey="Cheek E" first="Elizabeth" last="Cheek">Elizabeth Cheek</name>
<affiliation>
<nlm:aff id="a3">
<institution>School of Computing, Engineering and Mathematics, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Huw" sort="Taylor, Huw" uniqKey="Taylor H" first="Huw" last="Taylor">Huw Taylor</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ebdon, James E" sort="Ebdon, James E" uniqKey="Ebdon J" first="James E." last="Ebdon">James E. Ebdon</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jones, Brian V" sort="Jones, Brian V" uniqKey="Jones B" first="Brian V." last="Jones">Brian V. Jones</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24036533</idno>
<idno type="pmc">3778543</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3778543</idno>
<idno type="RBID">PMC:3778543</idno>
<idno type="doi">10.1038/ncomms3420</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000030</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences</title>
<author>
<name sortKey="Ogilvie, Lesley A" sort="Ogilvie, Lesley A" uniqKey="Ogilvie L" first="Lesley A." last="Ogilvie">Lesley A. Ogilvie</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bowler, Lucas D" sort="Bowler, Lucas D" uniqKey="Bowler L" first="Lucas D." last="Bowler">Lucas D. Bowler</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Caplin, Jonathan" sort="Caplin, Jonathan" uniqKey="Caplin J" first="Jonathan" last="Caplin">Jonathan Caplin</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dedi, Cinzia" sort="Dedi, Cinzia" uniqKey="Dedi C" first="Cinzia" last="Dedi">Cinzia Dedi</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Diston, David" sort="Diston, David" uniqKey="Diston D" first="David" last="Diston">David Diston</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a4">Present address: Mikrobiologische and Biotechnologische Risiken Bundesamt für Gesundheit BAG, 3003 Bern, Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheek, Elizabeth" sort="Cheek, Elizabeth" uniqKey="Cheek E" first="Elizabeth" last="Cheek">Elizabeth Cheek</name>
<affiliation>
<nlm:aff id="a3">
<institution>School of Computing, Engineering and Mathematics, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Huw" sort="Taylor, Huw" uniqKey="Taylor H" first="Huw" last="Taylor">Huw Taylor</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ebdon, James E" sort="Ebdon, James E" uniqKey="Ebdon J" first="James E." last="Ebdon">James E. Ebdon</name>
<affiliation>
<nlm:aff id="a2">
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jones, Brian V" sort="Jones, Brian V" uniqKey="Jones B" first="Brian V." last="Jones">Brian V. Jones</name>
<affiliation>
<nlm:aff id="a1">
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nature Communications</title>
<idno type="eISSN">2041-1723</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access subliminal, phylogenetically targeted, phage sequences present within. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut-specific
<italic>Bacteroidales</italic>
-like phage, poorly represented in existing virus-like particle-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative ‘viral-enterotypes’ among this fraction of the human gut virome.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Suttle, C A" uniqKey="Suttle C">C. A. Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wommack, K E" uniqKey="Wommack K">K. E. Wommack</name>
</author>
<author>
<name sortKey="Colwell, R R" uniqKey="Colwell R">R. R. Colwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reyes, A" uniqKey="Reyes A">A. Reyes</name>
</author>
<author>
<name sortKey="Semenkovich, N P" uniqKey="Semenkovich N">N. P. Semenkovich</name>
</author>
<author>
<name sortKey="Whiteson, K" uniqKey="Whiteson K">K. Whiteson</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F. Rohwer</name>
</author>
<author>
<name sortKey="Gordon, J I" uniqKey="Gordon J">J. I. Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fuhrman, J A" uniqKey="Fuhrman J">J. A. Fuhrman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brussow, H" uniqKey="Brussow H">H. Brüssow</name>
</author>
<author>
<name sortKey="Canchaya, C" uniqKey="Canchaya C">C. Canchaya</name>
</author>
<author>
<name sortKey="Hardt, W D" uniqKey="Hardt W">W.-D. Hardt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M. Breitbart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Minot, S" uniqKey="Minot S">S. Minot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stern, A" uniqKey="Stern A">A. Stern</name>
</author>
<author>
<name sortKey="Mick, E" uniqKey="Mick E">E. Mick</name>
</author>
<author>
<name sortKey="Tirosh, I" uniqKey="Tirosh I">I. Tirosh</name>
</author>
<author>
<name sortKey="Sagy, O" uniqKey="Sagy O">O. Sagy</name>
</author>
<author>
<name sortKey="Sorek, R" uniqKey="Sorek R">R. Sorek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Williamson, S J" uniqKey="Williamson S">S. J. Williamson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angly, F E" uniqKey="Angly F">F. E. Angly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reyes, A" uniqKey="Reyes A">A. Reyes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Caporaso, J G" uniqKey="Caporaso J">J. G. Caporaso</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R. Knight</name>
</author>
<author>
<name sortKey="Kelley, S T" uniqKey="Kelley S">S. T. Kelley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ogilvie, L A" uniqKey="Ogilvie L">L. A. Ogilvie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lepage, P" uniqKey="Lepage P">P. Lepage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, B V" uniqKey="Jones B">B. V. Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gorski, A" uniqKey="Gorski A">A. Gorski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Colomer Lluch, M" uniqKey="Colomer Lluch M">M. Colomer-Lluch</name>
</author>
<author>
<name sortKey="Jofre, J" uniqKey="Jofre J">J. Jofre</name>
</author>
<author>
<name sortKey="Muniesa, M" uniqKey="Muniesa M">M. Muniesa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waldor, M K" uniqKey="Waldor M">M. K. Waldor</name>
</author>
<author>
<name sortKey="Mekalanos, J J" uniqKey="Mekalanos J">J. J. Mekalanos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F. Rohwer</name>
</author>
<author>
<name sortKey="Prangishvili, D" uniqKey="Prangishvili D">D. Prangishvili</name>
</author>
<author>
<name sortKey="Lindell, D" uniqKey="Lindell D">D. Lindell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thurber, R V" uniqKey="Thurber R">R. V. Thurber</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M. Haynes</name>
</author>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M. Breitbart</name>
</author>
<author>
<name sortKey="Wegley, L" uniqKey="Wegley L">L. Wegley</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F. Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J. Qin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pride, D T" uniqKey="Pride D">D. T. Pride</name>
</author>
<author>
<name sortKey="Meinersmann, R J" uniqKey="Meinersmann R">R. J. Meinersmann</name>
</author>
<author>
<name sortKey="Wassenaar, T M" uniqKey="Wassenaar T">T. M. Wassenaar</name>
</author>
<author>
<name sortKey="Blaser, M J" uniqKey="Blaser M">M. J. Blaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pride, D T" uniqKey="Pride D">D. T. Pride</name>
</author>
<author>
<name sortKey="Wassenaar, T M" uniqKey="Wassenaar T">T. M. Wassenaar</name>
</author>
<author>
<name sortKey="Ghose, C" uniqKey="Ghose C">C. Ghose</name>
</author>
<author>
<name sortKey="Blaser, M J" uniqKey="Blaser M">M. J. Blaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deschavanne, P" uniqKey="Deschavanne P">P. Deschavanne</name>
</author>
<author>
<name sortKey="Dubow, M S" uniqKey="Dubow M">M. S. DuBow</name>
</author>
<author>
<name sortKey="Regeard, C" uniqKey="Regeard C">C. Regeard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marchler Bauer, A" uniqKey="Marchler Bauer A">A. Marchler-Bauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tatusov, R L" uniqKey="Tatusov R">R. L. Tatusov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leplae, R" uniqKey="Leplae R">R. Leplae</name>
</author>
<author>
<name sortKey="Hebrant, A" uniqKey="Hebrant A">A. Hebrant</name>
</author>
<author>
<name sortKey="Wodak, S J" uniqKey="Wodak S">S. J. Wodak</name>
</author>
<author>
<name sortKey="Toussaint, A" uniqKey="Toussaint A">A. Toussaint</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kurokawa, K" uniqKey="Kurokawa K">K. Kurokawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, J" uniqKey="Xu J">J. Xu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Murphy, K C" uniqKey="Murphy K">K. C. Murphy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kruger, D H" uniqKey="Kruger D">D. H. Kruger</name>
</author>
<author>
<name sortKey="Bickle, T A" uniqKey="Bickle T">T. A. Bickle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groth, A C" uniqKey="Groth A">A. C. Groth</name>
</author>
<author>
<name sortKey="Calos, M P" uniqKey="Calos M">M. P. Calos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B. Liu</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M. Pop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lund, F" uniqKey="Lund F">F. Lund</name>
</author>
<author>
<name sortKey="Tybring, L" uniqKey="Tybring L">L. Tybring</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wootton, M" uniqKey="Wootton M">M. Wootton</name>
</author>
<author>
<name sortKey="Walsh, T R" uniqKey="Walsh T">T. R. Walsh</name>
</author>
<author>
<name sortKey="Macfarlane, L" uniqKey="Macfarlane L">L. Macfarlane</name>
</author>
<author>
<name sortKey="Howe, R A" uniqKey="Howe R">R. A. Howe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M. Arumugam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dick, G J" uniqKey="Dick G">G. J. Dick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Duhaime, M B" uniqKey="Duhaime M">M. B. Duhaime</name>
</author>
<author>
<name sortKey="Wichels, A" uniqKey="Wichels A">A. Wichels</name>
</author>
<author>
<name sortKey="Waldmann, J" uniqKey="Waldmann J">J. Waldmann</name>
</author>
<author>
<name sortKey="Teeling, H" uniqKey="Teeling H">H. Teeling</name>
</author>
<author>
<name sortKey="Glockner, F O" uniqKey="Glockner F">F. O. Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saeed, I" uniqKey="Saeed I">I. Saeed</name>
</author>
<author>
<name sortKey="Tang, S L" uniqKey="Tang S">S.-L. Tang</name>
</author>
<author>
<name sortKey="Halgamuge, S K" uniqKey="Halgamuge S">S. K. Halgamuge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teeling, H" uniqKey="Teeling H">H. Teeling</name>
</author>
<author>
<name sortKey="Meyerdierks, A" uniqKey="Meyerdierks A">A. Meyerdierks</name>
</author>
<author>
<name sortKey="Bauer, M" uniqKey="Bauer M">M. Bauer</name>
</author>
<author>
<name sortKey="Amann, R" uniqKey="Amann R">R. Amann</name>
</author>
<author>
<name sortKey="Glockner, F O" uniqKey="Glockner F">F. O. Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghai, R" uniqKey="Ghai R">R. Ghai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pignatelli, M" uniqKey="Pignatelli M">M. Pignatelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S. Kim</name>
</author>
<author>
<name sortKey="Rahman, M" uniqKey="Rahman M">M. Rahman</name>
</author>
<author>
<name sortKey="Seol, S Y" uniqKey="Seol S">S. Y. Seol</name>
</author>
<author>
<name sortKey="Yoon, S S" uniqKey="Yoon S">S. S. Yoon</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J. Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ebdon, J" uniqKey="Ebdon J">J. Ebdon</name>
</author>
<author>
<name sortKey="Muniesa, M" uniqKey="Muniesa M">M. Muniesa</name>
</author>
<author>
<name sortKey="Taylor, H" uniqKey="Taylor H">H. Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gill, S R" uniqKey="Gill S">S. R. Gill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teeling, H" uniqKey="Teeling H">H. Teeling</name>
</author>
<author>
<name sortKey="Waldmann, J" uniqKey="Waldmann J">J. Waldmann</name>
</author>
<author>
<name sortKey="Lombardot, T" uniqKey="Lombardot T">T. Lombardot</name>
</author>
<author>
<name sortKey="Bauer, M" uniqKey="Bauer M">M. Bauer</name>
</author>
<author>
<name sortKey="Glockner, F O" uniqKey="Glockner F">F. O. Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aziz, R K" uniqKey="Aziz R">R. K. Aziz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, B V" uniqKey="Jones B">B. V. Jones</name>
</author>
<author>
<name sortKey="Begley, M" uniqKey="Begley M">M. Begley</name>
</author>
<author>
<name sortKey="Hill, C" uniqKey="Hill C">C. Hill</name>
</author>
<author>
<name sortKey="Gahan, C G M" uniqKey="Gahan C">C. G. M. Gahan</name>
</author>
<author>
<name sortKey="Marchesi, J R" uniqKey="Marchesi J">J. R. Marchesi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, B V" uniqKey="Jones B">B. V. Jones</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F. Sun</name>
</author>
<author>
<name sortKey="Marchesi, J R" uniqKey="Marchesi J">J. R. Marchesi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J. Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, D H" uniqKey="Huson D">D. H. Huson</name>
</author>
<author>
<name sortKey="Scornavacca, C" uniqKey="Scornavacca C">C. Scornavacca</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S. Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schirle, M" uniqKey="Schirle M">M. Schirle</name>
</author>
<author>
<name sortKey="Heurtier, M" uniqKey="Heurtier M">M. Heurtier</name>
</author>
<author>
<name sortKey="Kuster, B" uniqKey="Kuster B">B. Kuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schevchenko, A" uniqKey="Schevchenko A">A. Schevchenko</name>
</author>
<author>
<name sortKey="Tomas, H" uniqKey="Tomas H">H. Tomas</name>
</author>
<author>
<name sortKey="Havli, J" uniqKey="Havli J">J. Havli</name>
</author>
<author>
<name sortKey="Olsen, J V" uniqKey="Olsen J">J. V. Olsen</name>
</author>
<author>
<name sortKey="Mann, M" uniqKey="Mann M">M. Mann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thompson, J D" uniqKey="Thompson J">J. D. Thompson</name>
</author>
<author>
<name sortKey="Higgins, D G" uniqKey="Higgins D">D. G. Higgins</name>
</author>
<author>
<name sortKey="Gibson, T J" uniqKey="Gibson T">T. J. Gibson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clarke, K R" uniqKey="Clarke K">K. R. Clarke</name>
</author>
<author>
<name sortKey="Gorley, R N" uniqKey="Gorley R">R. N. Gorley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ultsch, A" uniqKey="Ultsch A">A. Ultsch</name>
</author>
<author>
<name sortKey="Moerchen, F" uniqKey="Moerchen F">F. Moerchen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, B V" uniqKey="Jones B">B.V. Jones</name>
</author>
<author>
<name sortKey="Marchesi, J R" uniqKey="Marchesi J">J. R. Marchesi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hawkins, S A" uniqKey="Hawkins S">S. A. Hawkins</name>
</author>
<author>
<name sortKey="Layton, A C" uniqKey="Layton A">A. C. Layton</name>
</author>
<author>
<name sortKey="Ripp, S" uniqKey="Ripp S">S. Ripp</name>
</author>
<author>
<name sortKey="Williams, D" uniqKey="Williams D">D. Williams</name>
</author>
<author>
<name sortKey="Sayler, G S" uniqKey="Sayler G">G. S. Sayler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Puig, M" uniqKey="Puig M">M. Puig</name>
</author>
<author>
<name sortKey="Jofre, J" uniqKey="Jofre J">J. Jofre</name>
</author>
<author>
<name sortKey="Girones, R" uniqKey="Girones R">R. Girones</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nat Commun</journal-id>
<journal-id journal-id-type="iso-abbrev">Nat Commun</journal-id>
<journal-title-group>
<journal-title>Nature Communications</journal-title>
</journal-title-group>
<issn pub-type="epub">2041-1723</issn>
<publisher>
<publisher-name>Nature Pub. Group</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24036533</article-id>
<article-id pub-id-type="pmc">3778543</article-id>
<article-id pub-id-type="pii">ncomms3420</article-id>
<article-id pub-id-type="doi">10.1038/ncomms3420</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ogilvie</surname>
<given-names>Lesley A.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bowler</surname>
<given-names>Lucas D.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Caplin</surname>
<given-names>Jonathan</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dedi</surname>
<given-names>Cinzia</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Diston</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cheek</surname>
<given-names>Elizabeth</given-names>
</name>
<xref ref-type="aff" rid="a3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Taylor</surname>
<given-names>Huw</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ebdon</surname>
<given-names>James E.</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jones</surname>
<given-names>Brian V.</given-names>
</name>
<xref ref-type="corresp" rid="c1">a</xref>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<aff id="a1">
<label>1</label>
<institution>Centre for Biomedical and Health Science Research, School of Pharmacy and Biomolecular Sciences, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</aff>
<aff id="a2">
<label>2</label>
<institution>School of Environment and Technology, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</aff>
<aff id="a3">
<label>3</label>
<institution>School of Computing, Engineering and Mathematics, University of Brighton</institution>
, Brighton BN2 4GJ,
<country>UK</country>
</aff>
<aff id="a4">
<label>4</label>
Present address: Mikrobiologische and Biotechnologische Risiken Bundesamt für Gesundheit BAG, 3003 Bern, Switzerland</aff>
</contrib-group>
<author-notes>
<corresp id="c1">
<label>a</label>
<email>B.V.Jones@Brighton.ac.uk</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>09</month>
<year>2013</year>
</pub-date>
<volume>4</volume>
<elocation-id>2420</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>04</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>08</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<pmc-comment>author-paid</pmc-comment>
<license-p>This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. To view a copy of this licence visit http://creativecommons.org/licenses/by/3.0/.</license-p>
</license>
</permissions>
<abstract>
<p>Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access subliminal, phylogenetically targeted, phage sequences present within. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut-specific
<italic>Bacteroidales</italic>
-like phage, poorly represented in existing virus-like particle-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative ‘viral-enterotypes’ among this fraction of the human gut virome.</p>
</abstract>
<abstract abstract-type="web-summary">
<p>
<inline-graphic id="i1" xlink:href="ncomms3420-i1.jpg"></inline-graphic>
Bacteriophages have a significant impact on microbial ecosystems, but additional tools are needed to assess viral communities. Ogilvie
<italic>et al.</italic>
present a new strategy to extract viral sequences from metagenomic data sets and offer new insights into their function in the gut ecosystem.</p>
</abstract>
</article-meta>
</front>
<body>
<p>Viruses are the most abundant infectious agents on the planet, and collectively constitute a highly diverse and largely unexplored gene-space, which accounts for much of the ‘biological dark matter’ in Earth’s biosphere
<xref ref-type="bibr" rid="b1">1</xref>
<xref ref-type="bibr" rid="b2">2</xref>
<xref ref-type="bibr" rid="b3">3</xref>
. Bacterial viruses (bacteriophage or phage) are considered the most numerous viral entities, and through their effects on host bacteria, phage can influence processes ranging from global geochemical cycles to bacterial virulence and pathogenesis
<xref ref-type="bibr" rid="b1">1</xref>
<xref ref-type="bibr" rid="b2">2</xref>
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b4">4</xref>
<xref ref-type="bibr" rid="b5">5</xref>
. The study of this expansive family of viruses continues to underpin many fundamental insights into microbial physiology and evolution, with the interplay of bacteria and phage now studied at scales ranging from the individual components of single-phage species, to community-level surveys of viral assemblages and their impacts on host microbial ecosystems.</p>
<p>The development of metagenomic tools for analysis of phage populations constitutes a major advance in this regard, which is poised to deliver unprecedented insight into the prokaryotic virosphere. This powerful culture-independent approach overcomes many limitations of traditional methods for phage isolation and characterization, ultimately promising almost unrestricted access to the genetic content of host microbiomes and their attendant viral collectives
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b6">6</xref>
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b8">8</xref>
<xref ref-type="bibr" rid="b9">9</xref>
<xref ref-type="bibr" rid="b10">10</xref>
<xref ref-type="bibr" rid="b11">11</xref>
. Application of these techniques to the study of microbial viromes has already provided major insights into a number of phage communities, including those associated with microbial ecosystems that develop in or on the human body
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b12">12</xref>
.</p>
<p>In particular, the retinue of phage associated with the human gut microbiome is now increasingly recognized as an important facet of this ecosystem, which may significantly influence its impact on human health
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b5">5</xref>
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b14">14</xref>
<xref ref-type="bibr" rid="b15">15</xref>
<xref ref-type="bibr" rid="b16">16</xref>
. Gut-associated phage have already been shown to encode genes that confer production of toxins, virulence factors or antibiotic resistance upon host bacteria
<xref ref-type="bibr" rid="b5">5</xref>
<xref ref-type="bibr" rid="b17">17</xref>
<xref ref-type="bibr" rid="b18">18</xref>
, and have the potential to modulate community structure and metabolic output through elimination of host species or introduction of new traits
<xref ref-type="bibr" rid="b1">1</xref>
<xref ref-type="bibr" rid="b16">16</xref>
<xref ref-type="bibr" rid="b19">19</xref>
. Furthermore, virome composition also appears to be altered in disease states, which has given rise to the hypothesis that the human gut virome may have a role in the pathogenesis of disorders associated with perturbation of the gut ecosystem
<xref ref-type="bibr" rid="b14">14</xref>
. Phage also hold considerable biotechnological and pharmaceutical potential, with the gut virome now a viable target for bio-prospecting and the development of novel therapeutic or diagnostic tools
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b13">13</xref>
.</p>
<p>However, current strategies for generating viral metagenomes are not without limitations, and are typically based on analysis of nucleic acids derived from purified virus-like particles (VLPs)
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b20">20</xref>
. As such, these approaches are targeted towards analysis of free-phage particles present at the time of sampling, which restricts access to the quiescent virome fraction and obscures host-range information
<xref ref-type="bibr" rid="b8">8</xref>
. VLP-based approaches will also poorly represent phage not efficiently recovered during virion purification stages, and typically rely on subsequent amplification of extracted viral DNA before sequencing, which can also exclude some phage types
<xref ref-type="bibr" rid="b3">3</xref>
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b20">20</xref>
. Although these caveats do not undermine the overall utility of the VLP approach (which retains a clear advantage in accessing actively replicating phage), much scope remains to develop complementary strategies to access and analyse microbial viromes.</p>
<p>In this context, it is notable that conventional metagenomic data sets, derived from total community DNA, have been found to contain significant fractions of phage sequence data, and in the case of the gut microbiome, this has been estimated to be up to 17% of microbial DNA recovered from stool samples
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b21">21</xref>
. Owing to the focus on acquisition of chromosomal sequences and an independence from VLP extracts, these data sets are likely to capture prophage not readily accessed by VLP-based surveys
<xref ref-type="bibr" rid="b8">8</xref>
, and will by default also contain much genetic material from phage host species or closely related organisms. The latter should facilitate inference of host-range and permit a more in-depth analysis of the local ecological landscape populated by recovered phage, and together with the former stands to provide an alternative and novel perspective on the gut virome. Therefore, whole-community metagenomes may constitute valuable resources for the analysis of phage communities and, in conjunction with VLP-derived data sets, provide a more complete understanding of phage associated with the human gut and other ecosystems
<xref ref-type="bibr" rid="b8">8</xref>
.</p>
<p>Nevertheless, the resolution and host-range affiliation of phage fragments present in conventional metagenomes remain challenging, with particular problems arising from the paucity of well-characterized phage reference genomes with established host ranges, a lack of universally conserved and robust phylogenetic anchors in phage genomes (akin to bacterial 16S rRNA genes), the mosaic nature of phage genomes, and the fragmentary nature of metagenomic data sets
<xref ref-type="bibr" rid="b8">8</xref>
<xref ref-type="bibr" rid="b13">13</xref>
. These factors, in conjunction with the potential value of standard metagenomes for virome analysis, highlight the need to develop robust approaches for phage-oriented dissection of these repositories, and host-range affiliation of recovered phage sequences.</p>
<p>Here we demonstrate the application of a genome signature-based approach for retrieval of subliminal, phylogenetically targeted phage sequences present within conventional gut microbial metagenomes. Application of this strategy permitted the identification of a subset of gut-specific
<italic>Bacteroidales</italic>
-like phage sequences poorly represented in existing VLP-derived viral metagenomes. These phage sequences were shown to encode functions of direct relevance to human health, and provided new insights into the structure and composition of the human gut virome.</p>
<sec disp-level="1" sec-type="results">
<title>Results</title>
<sec disp-level="2">
<title>Genome signature-based recovery of ‘
<italic>Bacteroidales</italic>
-like’ phage</title>
<p>Members of the
<italic>Bacteroidales</italic>
, and in particular the genus
<italic>Bacteroides,</italic>
are abundant and important constituents of the human gut microbiome for which few complete phage genomes are available, with this region of the gut virome believed to remain largely uncharted
<xref ref-type="bibr" rid="b13">13</xref>
. To more fully explore this novel phage gene-space, we utilized
<italic>Bacteroidales</italic>
phage sequences as ‘drivers’ to interrogate 139 human gut metagenomes based on tetranucleotide usage profiles (TUPs) and functional profiles of contigs (
<xref ref-type="table" rid="t1">Table 1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Figs S1–S3</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
).</p>
<p>This strategy takes advantage of similarities in global nucleotide usage patterns, or the genome signature, arising between phage infecting the same or related host bacterial species
<xref ref-type="bibr" rid="b22">22</xref>
<xref ref-type="bibr" rid="b23">23</xref>
<xref ref-type="bibr" rid="b24">24</xref>
. We exploit this phenomenon to identify contigs related to
<italic>Bacteroidales</italic>
phage driver sequences in assembled gut metagenomes, and then use function-based binning to resolve phage fragments recovered in this process (
<xref ref-type="fig" rid="f1">Fig. 1</xref>
). We refer to this strategy as phage genome signature-based recovery (PGSR), and denote sequences obtained in this way with the PGSR prefix.</p>
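<p>The signature comparison underlying this strategy can be illustrated with a minimal sketch. It assumes the simplest possible form of a tetranucleotide usage profile (raw relative frequencies of the 256 tetranucleotides) and Euclidean distance as the similarity measure; both are simplifying assumptions made for illustration and are not necessarily the normalization or metric used in this study. All function names are hypothetical.</p>

```python
from itertools import product
from math import sqrt

BASES = "ACGT"
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]  # all 256 words


def tetranucleotide_profile(seq):
    """Return relative frequencies of all 256 tetranucleotides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[w] / total for w in TETRAMERS]


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_contigs(driver, contigs):
    """Sort contig names by distance of their profile to the driver profile."""
    ref = tetranucleotide_profile(driver)
    scored = [(euclidean_distance(ref, tetranucleotide_profile(s)), name)
              for name, s in contigs.items()]
    return [name for _, name in sorted(scored)]
```

<p>In this sketch, contigs ranked closest to a driver would be the candidates passed on to function-based binning; where to place the distance cut-off is left open here.</p>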
<p>Interrogation of all large contigs (10 kb and over) from human gut metagenomes (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
) recovered 408 metagenomic fragments with TUPs similar to
<italic>Bacteroidales</italic>
phage drivers. Eighty-five fragments were categorized as phage based on functional profiling, and the remainder were classified as non-phage (presumed chromosomal,
<italic>n</italic>
=320), or could not be categorized (
<italic>n</italic>
=3) (
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 1</xref>
). The proportion of sequences categorized as phage within the total pool of 408 sequences recovered by PGSR (20.83%; 85/408) is congruent with recent studies estimating that up to 17% of total metagenomic DNA derived from stool samples may be viral in origin
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b21">21</xref>
. Of the PGSR sequences classified as phage, sizes ranged from 10 to 63.7 kb, with 16 sequences over 30 kb in length (
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 1</xref>
). This size range is consistent with that of available
<italic>Bacteroides</italic>
phage genomes used as drivers, and phage types known to be prominent within the human gut virome (particularly members of the
<italic>Siphoviridae</italic>
family)
<xref ref-type="bibr" rid="b11">11</xref>
, pointing to the recovery of near full-length or complete phage genomes.</p>
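<p>The proportions quoted above follow directly from the reported counts, as the short check below shows. The dictionary keys are labels chosen for this illustration only.</p>

```python
# Counts reported in the text for the contigs recovered by PGSR.
counts = {"phage": 85, "non_phage": 320, "uncategorized": 3}

total = sum(counts.values())
phage_percent = round(100 * counts["phage"] / total, 2)

print(total)          # 408
print(phage_percent)  # 20.83
```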
</sec>
<sec disp-level="2">
<title>Recovery of contiguous phage genome fragments</title>
<p>Owing to the dominance of chromosomal sequences in the metagenomic data sets examined, and the corollary that many PGSR phage fragments could therefore be chimeras corresponding to chromosome–prophage junctions, we also assessed the fidelity of the PGSR approach in this regard. Initially, 20 PGSR phage sequences were randomly selected and annotated, and each open reading frame (ORF) was evaluated in terms of its association with phage genomes (
<xref ref-type="fig" rid="f2">Fig. 2a</xref>
). The majority of sequences examined were shown to encode a clear and consistent phage-related signal across their entire length, with gene architectures and organization commensurate with driver phage genomes (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S3</xref>
). A notable potential exception was sequence no. 9, which exhibited a terminal region devoid of phage-related ORFs, indicating the possible presence of terminal chromosomal sequences (
<xref ref-type="fig" rid="f2">Fig. 2a</xref>
).</p>
<p>In an extension of this analysis, all protein encoding genes from all PGSR phage and PGSR non-phage contigs were used to search an extensive collection of phage and chromosomal sequences (
<xref ref-type="fig" rid="f2">Fig. 2b</xref>
). Results of these searches were used to calculate the relative abundance of homologous ORFs from PGSR sequences in phage genomes and chromosomes (
<xref ref-type="fig" rid="f2">Fig. 2b</xref>
). This demonstrated that the vast majority of genes from PGSR phage sequences were well represented in other phage genomes and phage data sets, but exhibited significantly lower relative abundance in chromosomal sequences analysed (
<xref ref-type="fig" rid="f2">Fig. 2b</xref>
). For PGSR non-phage sequences, which are presumed to be chromosomal in origin, the converse was true with high levels of representation in chromosomal sequences but a low relative abundance in phage sequences (
<xref ref-type="fig" rid="f2">Fig. 2b</xref>
). Taken together, these analyses demonstrate that contiguous phage sequences had been captured with high fidelity, and little or no chromosomal contamination was evident in the PGSR phage collection.</p>
</sec>
<sec disp-level="2">
<title>Comparative analysis of phage sequence recovery strategies</title>
<p>In order to ascertain if the PGSR approach offers advantages over existing strategies for prophage-oriented analysis of metagenomic data sets, we assessed the ability of conventional alignment-driven approaches to also recover the PGSR phage sequences identified here. Although surveys of the same data sets using the same driver sequences with alignment-driven methods (Blastn and tBlastn) recovered a range of sequences not identified by the PGSR approach, alignment-based searches failed to detect the majority of phage sequences identified by the PGSR approach (
<xref ref-type="fig" rid="f3">Fig. 3</xref>
).</p>
<p>In combination, all nucleotide-level searches with phage driver sequences identified 32.94% of PGSR phage sequences, with the majority of hits showing only low coverage of drivers, making a close relationship and a common host-range (that is, predicted bacterial host species) less likely to be a consistent feature of sequences recovered this way (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S2</xref>
). Gene-centric surveys utilizing translated capsid and terminase ORFs from drivers identified only 22.35% of PGSR phage sequences (
<xref ref-type="fig" rid="f3">Fig. 3</xref>
), and most hits exhibited relatively low levels of identity to driver sequence ORFs, again indicating the recovery of a more loosely related collection of contigs, with associated problems for host-range prediction (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S2</xref>
).</p>
<p>Alternatively, Stern
<italic>et al.</italic>
<xref ref-type="bibr" rid="b8">8</xref>
have recently described an elegant strategy utilizing CRISPR spacer regions to identify phage sequences in metagenomic data sets, and also facilitate host-range prediction. This strategy has been applied to the same gut metagenomic data sets used here, but only 16.47% of the 85 PGSR phage were represented among the 991 phage sequences recovered using CRISPR spacers (
<xref ref-type="fig" rid="f3">Fig. 3</xref>
). Collectively, these comparisons show that the PGSR approach can identify phage or prophage sequences within metagenomes that are not readily detected by other approaches, and can complement existing strategies to access viral metagenomes.</p>
</sec>
<sec disp-level="2">
<title>Inference of host phylogeny</title>
<p>A major benefit of the PGSR approach should be an inherent inference of host-range for retrieved phage contigs, based on that of driver sequences. In order to confirm the integrity of this host-range affiliation, we explored the relationship of PGSR sequences with a broad cross section of chromosomal sequences and phage genomes. Initially, PGSR sequences were compared with a collection of 324 chromosomes from gut-associated bacteria, 647 complete phage genomes and 188 large contigs from gut virome assemblies, based on TUPs. Relationships were visualized by construction of phylograms, which showed a clear association of chromosomal sequences congruent with membership of major bacterial divisions in the gut microbiome (Bacteroidetes, Firmicutes, Actinobacteria and Proteobacteria) (
<xref ref-type="fig" rid="f4">Fig. 4a</xref>
).</p>
<p>The majority of both PGSR phage and non-phage sequences were localized to four distinct regions of phylograms, designated Clusters I–IV (
<xref ref-type="fig" rid="f4">Fig. 4a</xref>
). Most of these clusters were dominated by chromosomal sequences from gut-associated
<italic>Bacteroides spp</italic>
., and other closely related members of the
<italic>Bacteroidales,</italic>
with clusters I, II and III collectively accounting for 90.69% of all PGSR sequences, and 95% of all
<italic>Bacteroidales</italic>
chromosomes used (
<xref ref-type="fig" rid="f4">Fig. 4a</xref>
). A distinct clustering of PGSR phage was also observed in phylograms constructed from TUPs of complete phage genomes and gut virome contigs (
<xref ref-type="fig" rid="f4">Fig. 4b</xref>
), and with the exception of a single sequence, PGSR phage were most closely related to each other and confined to a distinct clade (
<xref ref-type="fig" rid="f4">Fig. 4b</xref>
). The affiliation of PGSR sequences with the
<italic>Bacteroidales</italic>
was also retained when comparisons were expanded to encompass a broader collection of bacterial chromosomes (
<italic>n</italic>
=1,700) from a wider range of habitats, and TUP-based affiliations examined using Emergent Self Organizing Maps (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S4</xref>
).</p>
<p>To confirm the TUP-based phylogenetic inference for PGSR sequences, and the implied host-range for PGSR phage, alignment-based searches of 1,821 bacterial and archaeal chromosomes at both the nucleotide (Blastn) and ORF (tBlastn) level were also conducted. In both searches, PGSR phage sequences that could be classified based on homology to chromosome sequences (minimum 75% identity, E-value of 1e
<sup>−5</sup>
or lower and over a minimum of 1 kb of query sequence for nucleotide alignments) were almost exclusively associated with members of the genus
<italic>Bacteroides</italic>
and mapped to all regions of phylograms populated by PGSR phage (
<xref ref-type="fig" rid="f4">Fig. 4a</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
). Furthermore, TUP-based host-range predictions were also supported by phylogenetic affiliations of contigs undertaken by Stern
<italic>et al.</italic>
<xref ref-type="bibr" rid="b8">8</xref>
, in CRISPR-based surveys of the MetaHIT data set
<xref ref-type="bibr" rid="b21">21</xref>
. In cases where PGSR phage contigs were identified and affiliated independently by Stern
<italic>et al.</italic>
<xref ref-type="bibr" rid="b8">8</xref>
, host-range associations were comparable, and in most cases identical to, those assigned in the present study (
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
).</p>
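<p>The nucleotide-level thresholds described above (minimum 75% identity, an E-value of 1e-5 or lower, and at least 1 kb of aligned query) can be applied to parsed alignment hits as sketched below. The record layout, the function names and the rule of assigning each contig to the subject of its longest passing hit are assumptions made for illustration; the text does not state how competing hits were resolved.</p>

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # PGSR contig identifier
    subject: str     # chromosome the contig aligned to
    identity: float  # percent identity of the alignment
    length: int      # aligned length in bp
    evalue: float


def passes(hit, min_identity=75.0, max_evalue=1e-5, min_length=1000):
    """True if a hit meets the identity, E-value and length thresholds."""
    return (hit.identity >= min_identity
            and max_evalue >= hit.evalue
            and hit.length >= min_length)


def best_host(hits):
    """Map each query to the subject of its longest passing hit (assumed rule)."""
    best = {}
    for hit in filter(passes, hits):
        current = best.get(hit.query)
        if current is None or hit.length > current.length:
            best[hit.query] = hit
    return {q: h.subject for q, h in best.items()}
```

<p>Contigs with no passing hit are simply absent from the returned mapping, which corresponds to sequences that could not be classified by alignment.</p>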
<p>Of the classifiable PGSR phage sequences not affiliated with
<italic>Bacteroides</italic>
<italic>spp.</italic>
by alignments (nt alignment;
<italic>n</italic>
=5, 10%), the majority were associated with the genus
<italic>Alistipes</italic>
(
<italic>n</italic>
=4), also a member of the gut-associated
<italic>Bacteroidales</italic>
, and terminase genes from
<italic>Bacteroidales</italic>
phage drivers have also previously been shown to be closely related to those associated with
<italic>Alistipes sp</italic>
.
<xref ref-type="bibr" rid="b13">13</xref>
(
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S1</xref>
). Conversely, only a small number of PGSR phage sequences (
<italic>n</italic>
=3; 3.5%), and several PGSR non-phage sequences (
<italic>n</italic>
=11; 3.43%) were affiliated with non-
<italic>Bacteroidales</italic>
species in alignments (
<xref ref-type="fig" rid="f4">Fig. 4c</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
). Overall, these analyses indicate that the PGSR approach is able to acquire phylogenetically targeted and closely related phage sequences from metagenomic data sets, and provide a strong indication of host-range taxonomy.</p>
</sec>
<sec disp-level="2">
<title>Habitat affiliation of
<italic>Bacteroidales</italic>
-like PGSR phage</title>
<p>In order to determine whether the
<italic>Bacteroidales</italic>
-like PGSR phage captured here are already well represented in existing gut viral metagenomes
<xref ref-type="bibr" rid="b11">11</xref>
, pyrosequencing reads from gut viromes were mapped to the PGSR phage sequence set with high stringency (minimum 90% identity over 90% of sequence read). The proportion of reads recruited was then used to estimate levels of PGSR phage representation in viral data sets. Sequences mapping to PGSR phage contigs were found to be poorly represented in these data sets, when compared with
<italic>Bacteroidales</italic>
-like phage contigs assembled from the same gut virome reads (also identified by applying the PGSR approach to virome assemblies) (
<xref ref-type="fig" rid="f5">Fig. 5a</xref>
). Given that the original analysis of these viromes also indicated phage associated with the
<italic>Bacteroidales</italic>
to be well represented
<xref ref-type="bibr" rid="b11">11</xref>
, this supports a specific under-representation of PGSR phage homologues in these data sets, rather than a paucity of
<italic>Bacteroidales</italic>
-like phage in general.</p>
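<p>The read-recruitment criterion used here (minimum 90% identity over 90% of the read) can be expressed as a simple filter. The sketch below assumes each alignment is available as a tuple of read identifier, read length, aligned length and percent identity; that layout and the function names are hypothetical.</p>

```python
def recruited(read_length, aligned_length, identity,
              min_identity=90.0, min_read_fraction=0.90):
    """True if an alignment meets the identity and read-coverage thresholds."""
    covered = aligned_length / read_length
    return identity >= min_identity and covered >= min_read_fraction


def percent_recruited(alignments, total_reads):
    """Percentage of reads with at least one qualifying alignment."""
    hits = {read_id for read_id, rlen, alen, ident in alignments
            if recruited(rlen, alen, ident)}
    return 100.0 * len(hits) / total_reads
```

<p>Collecting read identifiers in a set means a read that aligns to several contigs is counted once. Other stringency settings can be explored by passing different values for <italic>min_identity</italic> and <italic>min_read_fraction</italic> to the first function.</p>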
<p>To explore the distribution of PGSR phage in other habitats, we next investigated their representation in a range of additional viromes and metagenomes (
<xref ref-type="fig" rid="f5">Fig. 5b,c</xref>
). Using 13 viral metagenomes derived from gut and non-gut environments (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
), we again mapped pyrosequencing reads to PGSR sequences, this time using a low-stringency set of criteria (minimum 75% identity over 25% of sequence read) to provide the most conservative estimates of phage distribution. To further expand the range of habitats and ecosystems evaluated, the presence of sequences homologous to PGSR phage was also assessed in 12 conventional metagenomes and 2 virome assemblies (
<xref ref-type="fig" rid="f5">Fig. 5b,c</xref>
;
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
). For these assembled data sets, the results of Blast searches were used to classify each phage sequence based on the hit rate in gut and non-gut metagenomes (also using relaxed search criteria to afford conservative estimates of phage habitat affiliation). These surveys indicated a clear association of PGSR phage and virome contigs with the human gut microbiome, and a comparative rarity of homologous sequences in non-gut data sets (
<xref ref-type="fig" rid="f5">Fig. 5b,c</xref>
).</p>
</sec>
<sec disp-level="2">
<title>Functions and lifestyle of
<italic>Bacteroidales</italic>
-like PGSR phage</title>
<p>To examine the activities encoded by these novel
<italic>Bacteroidales</italic>
-like PGSR phage sequences, and compare their functional profiles with other phage and chromosomal sequence collections, we next used predicted ORFs from all PGSR contigs to search the Conserved Domain Database (CDD)
<xref ref-type="bibr" rid="b25">25</xref>
, the Clusters of Orthologous Groups database (COG)
<xref ref-type="bibr" rid="b26">26</xref>
, and the ACLAME (A CLAssification of Mobile genetic Elements) database of MGE-encoded genes
<xref ref-type="bibr" rid="b27">27</xref>
(
<xref ref-type="fig" rid="f6">Fig. 6</xref>
). Collectively, these search results further supported the provenance and classification of PGSR sequences as phage or non-phage, and the fidelity of the PGSR approach for recovery of phage genome fragments from conventional metagenomes (
<xref ref-type="fig" rid="f6">Fig. 6</xref>
).</p>
<p>COG and CDD functional profiles showed striking differences between PGSR phage and non-phage, with PGSR phage profiles congruent with a viral lifestyle and enriched in genes involved in capsid structure, host lysis, genome packaging, transcription, as well as replication and recombination (
<italic>P</italic>
≤0.004,
<italic>χ</italic>
<sup>2</sup>
-test;
<xref ref-type="fig" rid="f6">Fig. 6a,b</xref>
). As expected for viral genomes, COG profiles from PGSR phage sequences also showed a general lack of functions associated with energy production, nutrient metabolism and transport (amino acids, lipids and carbohydrates), cell wall and membrane biogenesis, and ribosome production and translation (
<italic>P</italic>
≤0.01,
<italic>χ</italic>
<sup>2</sup>
-test;
<xref ref-type="fig" rid="f6">Fig. 6a</xref>
).</p>
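<p>The enrichment comparisons above rely on the <italic>χ</italic><sup>2</sup>-test. As an illustration of the calculation, the sketch below computes the Pearson statistic and its P value for a single 2×2 table, for example ORFs assigned or not assigned to one functional category in two sequence sets. The table layout is an assumption, since the exact contingency tables are not described in this section, and no continuity correction is applied.</p>

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value
```

<p>The closed-form P value holds only for one degree of freedom; tables with more categories would need the general chi-square distribution.</p>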
<p>Although some differences were observed between individual phage sequence sets (Marine phage, NCBI phage and gut virome contigs), overall, the functional profile of PGSR phage was comparable to the other phage sequence collections analysed, while the PGSR non-phage functional profile was similar to that obtained from
<italic>Bacteroidales</italic>
chromosomes (
<xref ref-type="fig" rid="f6">Fig. 6a,b</xref>
). However, despite the similarities in functional profiles between phage sequence sets, surveys of the ACLAME database of MGE-encoded genes indicated marked differences in the prevailing lifestyle of human gut-associated phage, as compared with other phage sequence collections (
<xref ref-type="fig" rid="f6">Fig. 6c</xref>
). Assignable sequences in the ACLAME database from PGSR phage and gut virome contigs were predominantly associated with prophage, in stark contrast to other phage sequence collections (
<italic>P</italic>
≤0.001,
<italic>χ</italic>
<sup>2</sup>
-test;
<xref ref-type="fig" rid="f6">Fig. 6c</xref>
). In keeping with these observations, 23.5% of PGSR phage contigs were identified as encoding integrases or site-specific recombinases based on CDD searches. The dominant conserved domain model among these proteins was the DNA_BRE_C superfamily (cd00379), which includes phage Lambda integrase and phage P1 Cre recombinase.</p>
<p>To further explore the functional profile of PGSR
<italic>Bacteroidales</italic>
-like phage, we used mass spectrometry to generate a shotgun metaproteome from a human faecal microbiome, and used the derived 177,729 mass spectra to search custom databases of all putative proteins encoded by PGSR
<italic>Bacteroidales-</italic>
like sequences (phage and non-phage), and all contigs from VLP-derived human gut viral metagenome assemblies
<xref ref-type="bibr" rid="b11">11</xref>
. Proteins from all data sets were identified in the metaproteome, but as expected, proteins derived from PGSR non-phage sequences (presumed to be chromosomal in origin) constituted the majority of matches (
<xref ref-type="fig" rid="f7">Fig. 7a</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S3</xref>
).</p>
<p>Phage-associated proteins detected represented just three COG classes (cell cycle control; replication, recombination and repair; general function prediction) (
<xref ref-type="fig" rid="f7">Fig. 7a</xref>
). This is in contrast to 13 COG classes represented by metaproteome hits from non-phage PGSR fragments, which included many proteins with activities linked to carbohydrate metabolism, a major activity of gut microbes and in particular
<italic>Bacteroides</italic>
<italic>spp.</italic>
<xref ref-type="bibr" rid="b21">21</xref>
<xref ref-type="bibr" rid="b28">28</xref>
<xref ref-type="bibr" rid="b29">29</xref>
(
<xref ref-type="fig" rid="f7">Fig. 7a</xref>
). When relative abundance of homologous ORFs was assessed in a broader range of phage genomes and chromosomes, a distinct functional separation was also apparent between phage and non-phage sequences (
<xref ref-type="fig" rid="f7">Fig. 7b</xref>
). Phage-associated metaproteome hits showed a high relative abundance in phage genomes and other phage sequences, but were poorly represented in chromosomal sequences, with the converse true for PGSR non-phage proteins (
<xref ref-type="fig" rid="f7">Fig. 7b</xref>
).</p>
<p>The predicted activities of viral-encoded proteins detected in the metaproteome were also congruent with a lysogenic viral lifestyle, and associated with stability and maintenance of phage genomes in host bacteria (DNA methylases, partitioning proteins, site-specific recombinases/integrases;
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S3</xref>
). DNA methylases are frequently deployed by phage for protection from host defence systems by preventing degradation from host endonucleases through DNA methylation, and may also be involved in stable lysogeny
<xref ref-type="bibr" rid="b30">30</xref>
<xref ref-type="bibr" rid="b31">31</xref>
. Site-specific recombinases/integrases and partitioning systems are also features of temperate phage and associated with the lysogenic cycle
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b32">32</xref>
. Overall, the results of these surveys fit well with recent studies of the gut virome indicating a dominance of temperate phage
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
, and show that predominantly lysogenic phage (most likely in the form of prophage) have been accessed by the PGSR approach.</p>
</sec>
<sec disp-level="2">
<title>
<italic>Bacteroidales</italic>
-like PGSR phage encode functional β-lactamases</title>
<p>Functional profiling of PGSR phage sequences also indicated that these encode activities of direct relevance to human health, in the form of antibiotic resistance genes. In total, 12 PGSR phage sequences were found collectively to encode five putative β-lactamase variants exhibiting high levels of identity to each other (designated type 1–5;
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S4</xref>
). These sequences were most closely related to predicted metallo-β-lactamases from
<italic>Bacteroides sp.</italic>
D22,
<italic>Bacteroides sp.</italic>
1_1_30 and
<italic>Bacteroides stercoris,</italic>
but showed no significant homology to entries in the Antibiotic Resistance Genes Database
<xref ref-type="bibr" rid="b33">33</xref>
(minimum 20% identity, E-value of 1e
<sup>−2</sup>
or lower).</p>
<p>To confirm the functionality of these putative resistance determinants, corresponding regions of PGSR phage were amplified from total gut metagenomic DNA, cloned and expressed in
<italic>E. coli.</italic>
Transformants were then tested for their susceptibility to a range of β-lactam antibiotics. Only type-2 PGSR phage-encoded β-lactamases were successfully amplified and cloned, but these were capable of conferring resistance against mecillinam (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S5</xref>
), a member of the amidinopenicillin family with high affinity for Gram-negative penicillin-binding protein 2, but little activity against Gram-positive bacteria
<xref ref-type="bibr" rid="b34">34</xref>
. This antibiotic is not widely used in many European countries or the USA, but has been identified as potentially useful in the treatment of multi-drug resistant infections caused by Gram-negative species
<xref ref-type="bibr" rid="b35">35</xref>
. As such, identification of viable mecillinam resistance genes circulating among lysogenic
<italic>Bacteroides</italic>
phage in the gut mobile metagenome is of particular significance, and highlights the potential for dissemination and spread of these resistance determinants via horizontal gene transfer.</p>
</sec>
<sec disp-level="2">
<title>Inter-individual variation in
<italic>Bacteroidales</italic>
-like phage carriage</title>
<p>To assess inter-individual variation in carriage of PGSR phage and related sequences, we calculated the relative abundance of sequences homologous to PGSR phage in individual gut metagenomes (minimum 80% identity over 50% of the subject sequence, E-value of 1e
<sup>−5</sup>
or lower). This indicated that such sequences are broadly distributed among the gut microbiomes examined (
<xref ref-type="fig" rid="f8">Fig. 8a</xref>
), with the incidence of PGSR homologues ranging from 51.8 to 82.73% of metagenomes for the five most broadly represented PGSR phage (encompassing both Japanese and European individuals) (
<xref ref-type="fig" rid="f8">Fig. 8a</xref>
). Notably, these apparently broadly distributed virotypes included sequences with homology to PGSR phage harbouring type-2 β-lactamases with proven function.</p>
<p>Heat maps of relative abundance data also suggested the existence of several distinct patterns of
<italic>Bacteroidales</italic>
-like phage carriage shared by multiple individuals (
<xref ref-type="fig" rid="f8">Fig. 8a</xref>
). To investigate this further, we employed a heuristic hierarchical ranking approach to progressively group individual microbiomes based on phage relative abundance profiles. This simple strategy revealed four distinct variants of
<italic>Bacteroidales</italic>
-like phage relative abundance profiles across individual metagenomes, designated ‘viral-enterotypes’ A–D (
<xref ref-type="fig" rid="f8">Fig. 8b</xref>
). The validity of these putative phage-oriented microbiome groupings was subsequently confirmed using unsupervised ordination by non-metric multi-dimensional scaling (MDS) and analysis of similarities (ANOSIM) (
<italic>P</italic>
=0.002;
<xref ref-type="fig" rid="f8">Fig. 8c,d</xref>
). However, much overlap was evident between individual groups in all analyses, and not all groups were significantly or clearly separated (
<xref ref-type="fig" rid="f8">Fig. 8c,d</xref>
). These observations are reminiscent of the enterotypes model recently reported by Arumugam
<italic>et al.</italic>
<xref ref-type="bibr" rid="b36">36</xref>
in which members of the
<italic>Bacteroidales</italic>
also featured as drivers of the observed enterotypes
<xref ref-type="bibr" rid="b36">36</xref>
.</p>
</sec>
</sec>
<sec disp-level="1" sec-type="discussion">
<title>Discussion</title>
<sec disp-level="2">
<title></title>
<p>Bacteriophage genomes are believed to coevolve with, or adapt to, long-term bacterial hosts, leading to the development of nucleotide usage patterns that resemble those of the host chromosome
<xref ref-type="bibr" rid="b22">22</xref>
<xref ref-type="bibr" rid="b23">23</xref>
<xref ref-type="bibr" rid="b24">24</xref>
<xref ref-type="bibr" rid="b37">37</xref>
. Here we show that global TUPs, in conjunction with functional profiling, can be employed for the direct phage-oriented dissection of conventional metagenomes, permitting the resolution and host-range affiliation of subliminal virome fractions contained within. A major advantage of the use of genome signatures in this application is the gene-independent, alignment-free nature of this approach. As nucleotide signatures are generally pervasive across genomes
<xref ref-type="bibr" rid="b23">23</xref>
<xref ref-type="bibr" rid="b37">37</xref>
, the requirement for the presence of conserved genes or motifs typically used for identification and classification of sequences is circumvented.</p>
<p>As such, genome signatures are well suited to analysis of sequence types lacking robust and universally conserved phylogenetic anchors, and fragmentary data sets where conventional gene-centric alignment-driven methods often perform poorly
<xref ref-type="bibr" rid="b37">37</xref>
<xref ref-type="bibr" rid="b38">38</xref>
<xref ref-type="bibr" rid="b39">39</xref>
<xref ref-type="bibr" rid="b40">40</xref>
<xref ref-type="bibr" rid="b41">41</xref>
<xref ref-type="bibr" rid="b42">42</xref>
. Metagenomes, and phage (or other MGE sequences) captured within, constitute prime examples of such data sets and sequence types, with the PGSR approach shown to resolve phage sequences not readily detected by conventional alignment-driven approaches, even when used in conjunction with phage-related sequence motifs or genes.</p>
<p>However, this method does not overcome all disadvantages of metagenomic approaches for viral discovery. For example, the focus on acquisition and analysis of chromosomal DNA in conventional metagenomic data sets will exclude RNA phage, and there remains a need for continued culture-based isolation of phage to provide well-characterized driver sequences. Despite these caveats, the PGSR approach can recover many additional phage sequences from few initial driver sequences, access phage not well represented in VLP-based censuses, and potentially be used to mine metagenomes for other MGE and semi-conserved sequences.</p>
<p>Furthermore, the use of well-characterized phage sequences with known host ranges as drivers in the PGSR approach permits recovery of contigs with a common taxonomic imprint, automatically providing an indication of host phylogeny. A high level of congruence between TUP-inferred phage–host associations and established host ranges for cultivable bacteria and their phage has previously been demonstrated
<xref ref-type="bibr" rid="b23">23</xref>
, and also indicated to hold true for viral sequences represented in metagenomic data sets
<xref ref-type="bibr" rid="b37">37</xref>
. Importantly, previous genome signature-based analyses of whole-community shotgun metagenomes have shown that the shared selective pressures placed upon microbes occupying a given habitat do not obscure the taxonomic imprint rooted in TUPs: even when the community is subject to strong and constant environmental stress, genus-level resolution of metagenomic fragments remains feasible
<xref ref-type="bibr" rid="b37">37</xref>
. These observations are exemplified by the clear and consistent association of PGSR acquired contigs with
<italic>Bacteroides</italic>
spp. and members of the wider
<italic>Bacteroidales</italic>
in the present study.</p>
<p>Conversely, a small number of PGSR phage sequences (
<italic>n</italic>
=3) were affiliated with non-
<italic>Bacteroidales</italic>
species in alignment-driven surveys, and mapped to regions of phylograms closely related to members of the
<italic>Clostridiales</italic>,
but also populated by a mixture of
<italic>Bacteroidales</italic>
-affiliated and unaffiliated sequences. This variegated phylogenetic signal could be the result of convergent evolutionary processes that generate similar TUPs in unrelated organisms or phage genomes, obscuring the taxonomic imprint and leading to spurious host-range affiliations
<xref ref-type="bibr" rid="b22">22</xref>
<xref ref-type="bibr" rid="b23">23</xref>
. There is also the possibility that these sequences represent examples of viruses with very broad host-ranges
<xref ref-type="bibr" rid="b43">43</xref>
, or those in the process of adapting to new host species. Alternatively, the acquisition of new genetic material by horizontal gene transfer in phage is also well documented, and could account for the discordant alignment-based affiliations of the PGSR sequences in question. These issues are not unique to genome signature-based approaches and are also important considerations in gene-centric taxonomy
<xref ref-type="bibr" rid="b22">22</xref>
<xref ref-type="bibr" rid="b23">23</xref>
, constituting a potential limitation in both strategies.</p>
<p>The utilization of standard metagenomes in the PGSR approach should also provide access to fractions of bacteriophage communities that may be poorly represented by other methods. In light of the reported dominance of temperate phage in the human gut ecosystem
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b11">11</xref>
, it would be expected that greater access to quiescent phage will be important in further exploration of this viral community and will yield much insight into its structure and function. As such, it is notable that the PGSR phage captured here were indicated to be predominantly prophage, and not well represented in existing VLP-derived gut viral data sets, supporting the identification and analysis of phage sequences not readily accessed by other approaches. However, variation in the geographic origins of the metagenomes and viromes utilized for these analyses cannot be excluded as a possible factor in the low level of PGSR phage representation in VLP-based data sets: the gut metagenomes from which PGSR phage were retrieved were European in origin, whereas the viral data sets were generated from American individuals
<xref ref-type="bibr" rid="b11">11</xref>
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b21">21</xref>
. Alternatively, phage sequences recovered here may mostly represent inactivated prophage, which no longer contribute to the active, extrinsic VLP pool sampled in other studies.</p>
<p>Subsequent analyses showed not only that PGSR phage encode functions directly relevant to human health (reinforcing the role of phage in spread of antibiotic resistance determinants), but also that they are potentially specific to the human gut habitat, which is relevant to biotechnological applications of phage such as microbial source tracking
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b44">44</xref>
. In addition, the possible existence of ‘viral-enterotypes’ in this region of the gut virome was also revealed when individual gut metagenomes were compared. The phage-oriented grouping of microbiomes is reminiscent of the enterotypes model recently reported by Arumugam
<italic>et al.</italic>
<xref ref-type="bibr" rid="b36">36</xref>
, where individuals were grouped based on similarities in microbiome composition. Notably, two of the three microbial enterotypes presented by Arumugam
<italic>et al.</italic>
<xref ref-type="bibr" rid="b36">36</xref>
were driven by members of the
<italic>Bacteroidales</italic>
(
<italic>Bacteroides</italic>
and
<italic>Prevotella</italic>
), and it seems logical that examination of gut-specific temperate phage associated with these genera should generate concordant findings.</p>
<p>However, the
<italic>Bacteroidales</italic>
-like phage-oriented microbiome groupings observed here appear less well-defined and may be indicative of inter-individual gradients in phage population structure rather than entirely discrete groupings (as has also been posited for microbial enterotypes). Moreover, the grouping of individuals based on virome structure is inconsistent with other recent studies of the gut virome, where no such associations were observed
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b8">8</xref>
<xref ref-type="bibr" rid="b11">11</xref>
. These discrepancies may be due to the phylogenetically targeted analysis afforded by the PGSR approach coupled with the nature of the data sets from which PGSR phage are derived. In conjunction, these attributes should provide access to a closely related population of predominantly lysogenic phage (as prophage), expected to represent a more stable region of the phage ecological landscape in the gut microbiome.</p>
<p>Collectively, these factors could permit resolution of inter-individual similarities in gut virome structure obscured in studies focused on the virome as a whole, or the free, replicating virome fraction accessed through VLP libraries. Nevertheless, the data sets utilized here present only a ‘snapshot’ of the gut microbiome and do not capture the temporal dynamics of phage–host interactions. Much scope also remains to refine criteria and strategies used to identify and explore these putative viral-enterotypes. Although our observations provide the first indication that such groupings may exist in the gut virome, it is clear that further work will be required to confirm or refute the potential existence of viral-enterotypes within the
<italic>Bacteroidales</italic>
phage gene-space, and their significance, if any, for ecosystem function and development.</p>
<p>Overall, in this study we have validated a new strategy for analysing and understanding the composition of metagenomic data sets, as well as exploring and interpreting microbial viromes. This simple and accessible approach augments existing strategies, and can be applied retrospectively to available metagenomes to rapidly expand our knowledge of phage communities. Here we have employed the PGSR method to dissect human metagenomes with phylogenetic precision, and provide further insight into the structure and function of the human gut virome.</p>
</sec>
</sec>
<sec disp-level="1" sec-type="methods">
<title>Methods</title>
<sec disp-level="2">
<title>Phage genome signature-based dissection of gut metagenomes</title>
<p>To identify potential
<italic>Bacteroidales</italic>
-like phage sequences in human gut metagenomes, contigs from each data set were subject to genome signature comparisons with driver phage sequences, and subsequent binning based on encoded functions as outlined in
<xref ref-type="fig" rid="f1">Fig. 1</xref>
. Correlations between global usage patterns of all 256 possible tetranucleotide sequences in driver phage sequences (
<xref ref-type="table" rid="t1">Table 1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S1</xref>
), and all large contigs from human gut metagenomes
<xref ref-type="bibr" rid="b21">21</xref>
<xref ref-type="bibr" rid="b28">28</xref>
<xref ref-type="bibr" rid="b45">45</xref>
(
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
), were calculated according to the method of Teeling
<italic>et al.</italic>
<xref ref-type="bibr" rid="b46">46</xref>
, using the standalone TETRA 1.0 program. To ensure unambiguous tetranucleotide profiles were generated and recovered phage sequences could be distinguished, all metagenome contigs utilized were 10 kb or over in length
<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b46">46</xref>
. All sequences were extended by their reverse complement, and the divergence between observed and expected frequencies for each tetranucleotide was converted to
<italic>Z</italic>
-scores, which were compared pairwise between sequences to generate a Pearson’s similarity matrix of tetranucleotide usage correlation scores
<xref ref-type="bibr" rid="b46">46</xref>
. Metagenomic sequences exhibiting tetranucleotide correlation values of 0.6 or over
<xref ref-type="bibr" rid="b13">13</xref>
to any phage driver sequence were retained, and protein-encoding genes were predicted using the RAST server, accessed through the myRAST interface
<xref ref-type="bibr" rid="b47">47</xref>
. For each metagenomic sequence, functional profiles were subsequently obtained by searches against the CDD
<xref ref-type="bibr" rid="b25">25</xref>
(1e
<sup>−2</sup>
or lower), using amino-acid sequences from predicted ORFs, and used to categorize each retrieved metagenomic contig as phage, non-phage or unclassified (UC) based on the following criteria: (i) phage: contains at least one unambiguous phage-related gene (for example, capsid, terminase, tail fibre, or annotated as phage related) and/or at least one phage-related ORF also present in one or more driver sequences; (ii) non-phage: absence of phage-related ORFs and/or dominated by ORFs encoding functions commonly associated with chromosomal sequences; and (iii) UC: no ORFs with functions that provide clear indication of putative sequence type.</p>
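<p>As a rough illustration of the signature comparison described above, the following Python sketch computes tetranucleotide Z-scores for a sequence extended by its reverse complement and correlates two profiles. It is not the TETRA 1.0 implementation: the expected counts come from a maximal-order Markov approximation, which is our assumption about the approach of Teeling et al., and all function names are hypothetical.</p>
<preformat>
```python
from itertools import product
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 words


def extend_with_reverse_complement(seq):
    """Append the reverse complement; 'N' stops words spanning the junction."""
    seq = seq.upper()
    return seq + "N" + seq.translate(COMPLEMENT)[::-1]


def count_words(seq, k):
    """Count overlapping k-mers made only of A, C, G and T."""
    counts = {}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word).issubset("ACGT"):
            counts[word] = counts.get(word, 0) + 1
    return counts


def tetra_z_scores(seq):
    """Z-score of observed vs expected count for each tetranucleotide.

    Assumption: expected counts and variances follow a maximal-order
    Markov model built from di- and trinucleotide counts.
    """
    ext = extend_with_reverse_complement(seq)
    n2, n3, n4 = (count_words(ext, k) for k in (2, 3, 4))
    scores = []
    for w in TETRAMERS:
        observed = n4.get(w, 0)
        left, right, mid = n3.get(w[:3], 0), n3.get(w[1:], 0), n2.get(w[1:3], 0)
        if mid == 0:
            scores.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / (mid * mid)
        scores.append((observed - expected) / sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0
```
</preformat>
<p>Under this sketch, a contig would be retained when the correlation between its profile and that of any driver sequence is 0.6 or over.</p>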
</sec>
<sec disp-level="2">
<title>Annotation of PGSR phage sequences and designation of ORFs</title>
<p>Randomly selected PGSR phage sequences (
<italic>n</italic>
=20;
<xref ref-type="fig" rid="f2">Fig. 2a</xref>
) were annotated in Geneious 5.6.5 based on ORF predictions as described above. Amino-acid sequences for each ORF were used to search custom databases representing a broad collection of phage sequences using tBlastn (711 phage genomes and all contigs assembled from human gut viral metagenomes
<xref ref-type="bibr" rid="b11">11</xref>
), as well as the CDD
<xref ref-type="bibr" rid="b25">25</xref>
. Valid hits to other phage sequences (1e
<sup>−3</sup>
or lower), or the presence of conserved domains (1e
<sup>−2</sup>
or lower) with phage-related functions, were used to identify phage-related ORFs in each sequence (
<xref ref-type="fig" rid="f2">Fig. 2a</xref>
).</p>
</sec>
<sec disp-level="2">
<title>Calculation of ORF relative abundance</title>
<p>The relative abundance of ORFs in an extensive collection of chromosomal sequences (1,821 bacterial and archaeal chromosomes and all PGSR non-phage) as well as all phage sequences (711 phage genomes, viral metagenome assemblies and PGSR phage) was calculated as described previously
<xref ref-type="bibr" rid="b48">48</xref>
<xref ref-type="bibr" rid="b49">49</xref>
. Briefly, translated amino-acid sequences for each ORF were used to search data sets using tBlastn, and valid hits (minimum 35% identity over 30 aa or more, 1e
<sup>−5</sup>
or lower) used to calculate the relative abundance of each ORF in different data sets, expressed as hits per Mb (
<xref ref-type="fig" rid="f2">Fig. 2b</xref>
). Significant differences between relative abundances were assessed using the
<italic>χ</italic>
<sup>2</sup>
-test. Data sets and sequences utilized are described in
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
, Supplementary Data 3–6.</p>
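<p>The relative abundance calculation can be sketched as follows, assuming each search hit is available as a record with percentage identity, alignment length (aa) and e-value. The field names and the function are hypothetical and are not taken from the original analysis.</p>
<preformat>
```python
def hits_per_mb(hits, dataset_size_bp, min_identity=35.0, min_length=30,
                max_evalue=1e-5):
    """Relative abundance of an ORF in a data set, as valid hits per Mb.

    hits: list of dicts with hypothetical keys 'identity' (percent),
          'length' (aligned amino acids) and 'evalue'.
    dataset_size_bp: total size of the searched data set in base pairs.
    """
    valid = [
        h for h in hits
        if h["identity"] >= min_identity
        and h["length"] >= min_length
        and max_evalue >= h["evalue"]
    ]
    return len(valid) / (dataset_size_bp / 1_000_000)
```
</preformat>
<p>For example, two valid hits in a 4 Mb data set correspond to 0.5 hits per Mb.</p>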
</sec>
<sec disp-level="2">
<title>Alignment-driven survey of PGSR phage–host phylogeny</title>
<p>To compare the PGSR approach with conventional alignment-driven methods, for recovery of sequences closely related to driver phage, all large metagenome contigs (10 kb and over) were also searched using a variety of blast algorithms (Blastn, megablast, discontiguous megablast, tBlastn), with phage driver sequences as queries for nucleotide-level searches, and driver encoded capsid and terminase amino-acid sequences as queries for ORF level searches (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
). Blast searches were run with default parameters in all cases and implemented in Geneious 5.6.5 (Biomatters Ltd). All hits generating
<italic>e</italic>
-values of 1e
<sup>−3</sup>
or lower in each search were considered valid and the resulting search results were made non-redundant, with only the best hit (based on bit score) for each subject sequence retained. The resulting data were then used to calculate the number of sequences recovered, average % identity, and average % query coverage, as well as to identify the proportion of PGSR phage sequences identified in each blast search.</p>
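<p>The reduction of search results to a non-redundant set can be illustrated with a short sketch, assuming hits are held as records carrying a subject identifier, bit score and e-value. The field and function names are hypothetical.</p>
<preformat>
```python
def best_hit_per_subject(hits, max_evalue=1e-3):
    """Keep only the highest-scoring valid hit for each subject sequence.

    hits: list of dicts with hypothetical keys 'subject', 'bitscore',
          'evalue' and 'identity'.
    """
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue  # not a valid hit under the e-value threshold
        current = best.get(hit["subject"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["subject"]] = hit
    return best


def mean_identity(best_hits):
    """Average percentage identity across the retained hits."""
    values = [h["identity"] for h in best_hits.values()]
    return sum(values) / len(values) if values else 0.0
```
</preformat>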
</sec>
<sec disp-level="2">
<title>Clustering of sequences based on tetranucleotide usage</title>
<p>To test the phylogenetic inference afforded by the PGSR approach, PGSR sequences were compared with a selection of gut-associated chromosomal sequences (
<italic>n</italic>
=324) representing all major phylogenetic groups in the gut microbiome, and a large collection of phage genome sequences (
<italic>n</italic>
=647), as well as all large contigs from an independent assembly of 12 human gut viromes originally generated by Reyes
<italic>et al.</italic>
<xref ref-type="bibr" rid="b11">11</xref>
(
<italic>n</italic>
=188;
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
, Supplementary Data 3–6). All sequences utilized in this analysis were 10 kb or over in length. TUPs were calculated from all sequences as described above, using TETRA 1.0 (ref.
<xref ref-type="bibr" rid="b46">46</xref>
). For calculation of TUPs from draft chromosomes, contigs were first concatenated before analysis using TETRA
<xref ref-type="bibr" rid="b13">13</xref>
. Pearson’s dissimilarity matrices generated from TUPs were subsequently used to construct phylograms with the neighbor-joining algorithm in PHYLIP 3.69 (ref.
<xref ref-type="bibr" rid="b50">50</xref>
). Bootstrap analysis was performed based on methods described previously
<xref ref-type="bibr" rid="b22">22</xref>
, and conducted by sampling with replacement for each of the 256 TUPs, to produce 200 bootstrap replicates that were used to resolve the most probable topologies for each phylogram in Geneious 5.6.5. The final phylograms were visualized and annotated using Dendroscope 3.0.1 (ref.
<xref ref-type="bibr" rid="b51">51</xref>
).</p>
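<p>The bootstrap step can be sketched as below. This is an illustration only: it assumes that the Pearson dissimilarity is defined as one minus the correlation coefficient, and it stops at the resampled distance matrices, which would then be passed to a neighbor-joining implementation such as that in PHYLIP. The function names and the fixed random seed are hypothetical.</p>
<preformat>
```python
import random
from math import sqrt


def pearson_dissimilarity(x, y):
    """Assumed definition: 1 - Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # undefined correlation, treated here as maximally dissimilar
    return 1.0 - cov / (sx * sy)


def bootstrap_dissimilarity_matrices(profiles, n_replicates=200, seed=0):
    """Resample tetranucleotide positions with replacement.

    profiles: dict mapping sequence name to its list of Z-scores
              (256 values in the analysis described above).
    Returns the ordered names and one square matrix per replicate.
    """
    rng = random.Random(seed)
    names = sorted(profiles)
    n_features = len(profiles[names[0]])
    matrices = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n_features) for _ in range(n_features)]
        resampled = {name: [profiles[name][i] for i in idx] for name in names}
        matrices.append([
            [pearson_dissimilarity(resampled[a], resampled[b]) for b in names]
            for a in names
        ])
    return names, matrices
```
</preformat>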
</sec>
<sec disp-level="2">
<title>Alignment-based affiliation of PGSR sequences</title>
<p>Alignments of PGSR phage nucleotide sequences and translated ORF sequences were conducted using Blastn and tBlastn, respectively, implemented in Geneious 5.6.5 and run with default parameters. PGSR sequences were compared with custom blast databases of 1,821 bacterial and archaeal chromosomal sequences from the NCBI and Human Microbiome Project (see
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
, Supplementary Data 3,4 for details and source of sequences). Only hits with 75% identity or over, and
<italic>e</italic>
-values of 1e
<sup>−5</sup>
or lower were considered valid. For nucleotide-level searches, alignments were also required to cover a minimum of 1 kb of PGSR query sequence to be considered valid. Top hits for each query (by bit score) were then used to affiliate each PGSR phage sequence or ORF with a bacterial genus (
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
) or order (
<xref ref-type="fig" rid="f4">Fig. 4c</xref>
). For taxonomic affiliation, ORF homologies were utilized only where no valid nucleotide-level alignments were generated (
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
). Where only ORF-based affiliation was considered, a minimum of two ORFs within a PGSR phage sequence were required to produce valid hits to bacterial species derived from the same order (
<xref ref-type="fig" rid="f4">Fig. 4c</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
). PGSR phage sequences were also compared with all phage-like sequences from the MetaHIT
<xref ref-type="bibr" rid="b21">21</xref>
data set independently identified by Stern
<italic>et al.</italic>
<xref ref-type="bibr" rid="b8">8</xref>
, and the host ranges they inferred for those sequences based on Blastn alignments or CRISPR spacer analysis (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S2</xref>
, Supplementary Data 2).</p>
</sec>
<sec disp-level="2">
<title>Representation of PGSR phage sequences in human gut viromes</title>
<p>To assess the level of representation of PGSR phage sequences in existing human gut viral metagenomes, pooled pyrosequencing reads from 12 human gut viromes
<xref ref-type="bibr" rid="b11">11</xref>
were mapped against PGSR phage sequences. Pyrosequencing reads were obtained from the NCBI short read archive and processed using CAMERA
<xref ref-type="bibr" rid="b52">52</xref>
workflows as previously described by Ogilvie
<italic>et al.</italic>
<xref ref-type="bibr" rid="b13">13</xref>
. Briefly, low-quality reads and duplicates were removed using the 454 QC and 454 duplicate clustering workflows, respectively, with default parameters. The resulting collection of high-quality reads was mapped against PGSR phage sequences and other phage sequence collections using the Geneious 5.6.5 map to reference tool with the following criteria: a minimum of 90% identity over 90% of the read length, and a maximum of 10% mismatches per read with no gaps permitted. Each read was only permitted to map to a single reference sequence per data set. For each reference data set, the total number of reads mapped to all sequences within the reference set was then normalized by the total size of the reference sequence data set in question, to provide reads mapped/Mb reference data. Significant differences in the proportion of reads mapping to distinct reference sequence sets were identified using the
<italic>χ</italic>
<sup>2</sup>
-test.</p>
</sec>
<sec disp-level="2">
<title>Habitat affiliation of PGSR phage sequences</title>
<p>To investigate the representation of PGSR phage sequences in other habitats, both viral metagenomes and conventional metagenomic data sets were surveyed (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
). For viral metagenomes, individual pyrosequencing reads were again mapped against PGSR phage and other reference data sets as described above, but using relaxed criteria to afford conservative estimates of phage distribution: 70% identity over 25% of the read length, with a maximum of 10% mismatches and 10% gaps permitted per read. The percentage of reads from each virome mapping to a reference data set was normalized by reference data set size, as described above. In addition, assemblies of 12 conventional metagenomic data sets representing non-gut (terrestrial, freshwater and marine) and gut habitats, as well as 2 assembled viral metagenomes (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
), were also analysed for sequences with homology to PGSR and other phage. In this latter analysis, phage sequences were used to search each data set using Blast, and the number of valid hits from gut and non-gut metagenomes (minimum of 75% identity over 100 nt or more,
<italic>e</italic>
-value of 1e
<sup>−5</sup>
or lower) calculated, normalized by collective size of associated metagenomes, and used to affiliate each phage sequence to one of four categories based on relative representation in gut and non-gut data sets.</p>
</sec>
<sec disp-level="2">
<title>Functional profiling</title>
<p>For analysis of functions encoded by PGSR phage and non-phage sequences, all protein encoding genes in both sequence sets were annotated using the RAST server as described above, and amino-acid sequences from each group of sequences used to search the CDD
<xref ref-type="bibr" rid="b25">25</xref>
, the COG
<xref ref-type="bibr" rid="b26">26</xref>
, and the ACLAME databases
<xref ref-type="bibr" rid="b27">27</xref>
. Hits generating
<italic>e</italic>
-values of 1e
<sup>−2</sup>
or lower were considered valid in searches of CDD and ACLAME databases, and 1e
<sup>−3</sup>
or lower in COG searches. Valid hits were then used to compare functional profiles of PGSR sequences with other sequence sets. Comparisons were made at the Class level for COG searches, and element type (plasmid, virus and prophage) for ACLAME searches. For CDD searches, conserved domains detected in phage ORFs were binned into broad groups related to aspects of phage structure and replication (
<xref ref-type="fig" rid="f6">Fig. 6b</xref>
). Conserved domains not detected in phage sequences were categorized as non-phage. Significant differences between functional profiles for PGSR phage and non-phage sequence sets (both PGSR phage and all non-phage;
<xref ref-type="fig" rid="f6">Fig. 6</xref>
) were assessed using the
<italic>χ</italic>
<sup>2</sup>
-test.</p>
</sec>
<sec disp-level="2">
<title>Analysis of shotgun metaproteomes from human faecal microbes</title>
<p>Microbial cells recovered by Nycodenz extraction from stool samples (see
<italic>Recovery of bacterial cells from stool</italic>
) were suspended in 6 M guanidine isothiocyanate/10 mM dithiothreitol/50 mM Tris pH 6.8 and processed for 4 × 30 s in a Fastprep FP120 cell disrupter (Thermo Fisher Scientific) to lyse cells and denature proteins. The guanidine isothiocyanate concentration was diluted to 1 M with 50 mM Tris (pH 6.8) and the complex sample fractionated by SDS–PAGE (12.5% gel). Protein bands were visualized by staining with colloidal Coomassie and post-separation each gel lane was divided into 28 equally sized slices (essentially as described by Schirle
<italic>et al.</italic>
<xref ref-type="bibr" rid="b53">53</xref>
) and subjected to trypsin in-gel digestion according to the method of Shevchenko
<italic>et al.</italic>
<xref ref-type="bibr" rid="b54">54</xref>
. The supernatant from the digested samples was removed and acidified to 0.1% TFA, dried down and reconstituted in 0.1% TFA before LC-MS/MS analysis. Tryptic peptides were fractionated on a 250 × 0.075 mm
<sup>2</sup>
reverse phase column (Acclaim PepMap100, C18, Dionex) using an Ultimate U3000 nano-LC system (Dionex) and a 2-h linear gradient from 95% solvent A (0.1% formic acid in water) and 5% B (0.1% formic acid in 95% acetonitrile) to 50% B at a flow rate of 250 nl min
<sup>−1</sup>
. Eluting peptides were directly analysed by tandem mass spectrometry using a LTQ Orbitrap XL hybrid FTMS (ThermoScientific). Derived MS/MS data (using a combined data set comprising total spectra derived from each of the 28 samples per cell pellet) were searched against databases generated from translated amino-acid sequences from all ORFs predicted in recovered PGSR contigs (
<italic>n</italic>
=2,918 ORFs for PGSR phage;
<italic>n</italic>
=6,168 ORFs for PGSR non-phage), and all contigs from human gut VLP viral metagenome assemblies
<xref ref-type="bibr" rid="b11">11</xref>
(
<italic>n</italic>
=16,055 ORFs). Searches were conducted using Sequest version SRF v5 as implemented in Bioworks v3.3.1 (Thermo Fisher Scientific), assuming carboxyamidomethylation (Cys), deamidation (Asn) and oxidation (Met) as variable modifications, and using a peptide tolerance of 10 p.p.m. and a fragment ion tolerance of 0.8 Da. Filtering criteria used for positive protein identifications were Xcorr values greater than 1.5 for +1 spectra, 2 for +2 spectra and 2.5 for +3 spectra and a delta correlation (DCn) cutoff of 0.1, with a minimum of two tryptic peptides required per protein.</p>
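<p>The filtering criteria for positive protein identifications can be expressed as a short sketch. It assumes that peptide-spectrum matches are available as records with charge, Xcorr, DCn, peptide and protein fields, that the DCn cutoff is inclusive, and that the two-peptide minimum refers to distinct peptide sequences. These are our assumptions, and the names are hypothetical.</p>
<preformat>
```python
# Minimum Xcorr (exclusive) for each precursor charge state, as stated above.
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}


def identify_proteins(psms, min_dcn=0.1, min_peptides=2):
    """Return proteins supported by enough peptides passing the filters.

    psms: list of dicts with hypothetical keys 'charge', 'xcorr', 'dcn',
          'peptide' and 'protein'.
    """
    peptides_by_protein = {}
    for psm in psms:
        threshold = XCORR_MIN.get(psm["charge"])
        if threshold is None:
            continue  # charge states not listed in the text are ignored here
        if psm["xcorr"] > threshold and psm["dcn"] >= min_dcn:
            peptides_by_protein.setdefault(psm["protein"], set()).add(psm["peptide"])
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }
```
</preformat>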
</sec>
<sec disp-level="2">
<title>Functionality of PGSR phage-encoded β-lactamases</title>
<p>Nucleotide sequences of PGSR phage encoding putative β-lactamase genes (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S4</xref>
) were aligned using ClustalW
<xref ref-type="bibr" rid="b55">55</xref>
, and regions of homology flanking β-lactamase ORFs in all sequences were identified. Primers targeting these flanking regions were designed using Primer3 (
<ext-link ext-link-type="uri" xlink:href="http://frodo.wi.mit.edu">http://frodo.wi.mit.edu</ext-link>
). The resulting primers (BLF 5′-TTACGGGAGGTATGGACTGC-3′; BLR 5′-TGGTTAAGCCCCTTGAACTG-3′) were used to amplify PGSR phage β-lactamase genes from total gut metagenomic DNA (see
<italic>Extraction of metagenomic DNA</italic>
). PCR amplicons were subsequently purified using the QIAquick Gel Extraction Kit (Qiagen Inc, UK), cloned into pPCR Script-Cam (Agilent, UK), and constructs transformed into
<italic>E. coli</italic>
XL10 gold. Resultant transformants were tested for their ability to grow in the presence of a range of β-lactam antibiotics (mecillinam 10 μg; ampicillin 25 μg, amoxicillin 25 μg, ceftazidime 30 μg) by disc diffusion assays conducted according to BSAC guidelines (
<ext-link ext-link-type="uri" xlink:href="http://bsac.org.uk/susceptibility/">http://bsac.org.uk/susceptibility/</ext-link>
). Presence of PGSR phage-derived β-lactamases in transformants conferring resistance was confirmed by direct sequencing of cloned amplicons using standard M13 primers, at GATC Sequencing Services, UK.</p>
</sec>
<sec disp-level="2">
<title>Inter-individual variation in
<italic>Bacteroidales</italic>
-like phage carriage</title>
<p>The representation of sequences homologous to PGSR phage in gut metagenome assemblies was estimated by calculating relative abundance, based on Blast searches, as described previously by Jones
<italic>et al.</italic>
<xref ref-type="bibr" rid="b48">48</xref>
<xref ref-type="bibr" rid="b49">49</xref>
. PGSR phage sequences were used to search complete gut metagenomes (assembled data sets containing all contigs regardless of length) using Blastn for contigs with high levels of similarity. Hits exhibiting a minimum of 80% identity over at least 50% of the subject sequence, and an
<italic>e</italic>
-value of 1e
<sup>−5</sup>
or lower were considered valid, and used to calculate relative abundance (expressed as hits per Mb DNA). Subject sequence coverage thresholds were selected to minimize contribution from sequences with only limited regions of homology to PGSR phage, which are unlikely to be closely related. For the purposes of this analysis, PGSR phage contigs designated as part of the same scaffold (
<italic>n</italic>
=12) were treated as single phage sequences and a combined relative abundance was calculated. To explore the potential existence of viral-enterotypes in gut microbiomes, individuals were progressively grouped according to relative abundance profiles of PGSR phage homologues, using a simple hierarchical heuristic. Starting with a randomly selected individual metagenome, individuals exhibiting similar profiles (regardless of levels of relative abundance) were assigned as ‘viral-enterotype A’, and the remainder of individuals assigned to subsequent groups in the same way until no further groupings could be made (UC). This process was repeated a second time to refine initial groupings beginning with the first individual in ‘group A’ and progressing to group D. PGSR sequences generating hits in 40% or more of human gut metagenomes, representing the most broadly distributed phage (
<italic>n</italic>
=10), were treated as noise, and not considered during the heuristic ranking process. The existence of putative viral-enterotypes was also explored using non-metric MDS of a Bray–Curtis similarity matrix of relative abundance (hits per Mb DNA) of all PGSR sequences within each individual (including those PGSR phage sequences with homologues in 40% or more individual metagenomes and excluded from the heuristic ranking). Putative viral-enterotype groupings (A, B, C, D and UC) generated from the hierarchical heuristic model were superimposed onto the MDS configuration of similarities plot and ANOSIM analysis was conducted to test strength and significance of groupings (
<italic>P</italic>
&lt;0.05;
<italic>R</italic>
statistic indicates increasing separation of groups as values approach 1). MDS and ANOSIM analyses were conducted using Primer v6 software
<xref ref-type="bibr" rid="b56">56</xref>
. Hierarchical heuristic ranking was carried out in Microsoft Excel.</p>
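<p>The quantities used in this analysis can be illustrated with a minimal Python sketch. It is not the authors' workflow (ranking was done in Microsoft Excel and MDS/ANOSIM in Primer v6). The function names, the dictionary layout (individual mapped to a list of per-phage abundances) and the percentage form of the Bray–Curtis similarity are assumptions made for illustration only.</p>
<preformat>
```python
from typing import Dict, List


def hits_per_mb(hit_count: int, metagenome_bp: int) -> float:
    """Relative abundance: valid hits per megabase of metagenomic DNA."""
    return hit_count / (metagenome_bp / 1_000_000)


def broadly_distributed(profiles: Dict[str, List[float]],
                        threshold: float = 0.40) -> List[int]:
    """Indices of phage with hits in `threshold` or more of the metagenomes.

    `profiles` maps each individual to its per-phage relative abundances
    (hypothetical layout; every list must have the same length).
    """
    n_individuals = len(profiles)
    n_phage = len(next(iter(profiles.values())))
    flagged = []
    for j in range(n_phage):
        present = sum(1 for row in profiles.values() if row[j] > 0)
        if present / n_individuals >= threshold:
            flagged.append(j)
    return flagged


def bray_curtis_similarity(a: List[float], b: List[float]) -> float:
    """Bray-Curtis similarity in percent: 100 * (1 - sum|a-b| / sum(a+b))."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("Bray-Curtis is undefined for two empty profiles")
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 100.0 * (1.0 - diff / total)
```
</preformat>
<p>In this sketch, phage flagged by <monospace>broadly_distributed</monospace> would be left out of the heuristic grouping but retained when building the pairwise similarity matrix, mirroring the distinction drawn in the text.</p>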
</sec>
<sec disp-level="2">
<title>Construction of Emergent Self-Organizing Maps (ESOM)</title>
<p>For broader analysis of PGSR sequence taxonomy based on tetranucleotide usage profiles (TUPs), sequences were compared with an extended collection of bacterial chromosomes (
<italic>n</italic>
=1,700) from a wide range of habitats, as well as all phage sequences used to construct phylograms (647 phage genomes and 188 large contigs from gut viromes) (
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3–6</xref>
). Relationships between sequences in this data set based on TUPs were visualized by the construction of emergent self-organizing maps using the Databionics ESOM analyser
<xref ref-type="bibr" rid="b57">57</xref>
(
<ext-link ext-link-type="uri" xlink:href="http://databionic-esom.sourceforge.net">http://databionic-esom.sourceforge.net</ext-link>
). Tetranucleotide frequencies transformed by
<italic>Z</italic>
-score were used with the online training algorithm over 20 training epochs, with permutation of data on each training run. Maps were generated using the correlation data distance in toroidal 2D (borderless) form and the following default training parameters: standard bestmatch (bm) search method, a local bm search radius of 8, Gaussian weight initialization and neighbourhood kernel function, linear cooling strategy for training (radius of 24 to 1), and linear strategy for learning rate (0.5–0.1). Maps were visualized using the UMatrix background with 128 colors and height cutoff (clip) of 65%.</p>
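<p>As a rough illustration of the input to such a map, the sketch below counts overlapping tetranucleotides, standardizes the 256 frequencies, and computes a correlation distance between two profiles. It rests on assumptions: the Z-score here is a plain standardization across the 256 features of one sequence, whereas tools such as TETRA derive Z-scores from expected frequencies under a Markov model. The function names are hypothetical and this is not the Databionics ESOM code.</p>
<preformat>
```python
from itertools import product
from math import sqrt
from typing import List

# All 256 tetranucleotides in a fixed order, plus a lookup from k-mer to index.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_frequencies(seq: str) -> List[float]:
    """Relative frequency of each overlapping tetranucleotide (forward strand only)."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])  # windows with ambiguous bases are skipped
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no valid tetranucleotides in sequence")
    return [c / total for c in counts]


def z_transform(values: List[float]) -> List[float]:
    """Standardize a profile to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("constant profile cannot be standardized")
    return [(v - mean) / sd for v in values]


def correlation_distance(u: List[float], v: List[float]) -> float:
    """Correlation distance: 1 minus the Pearson correlation of two profiles."""
    zu, zv = z_transform(u), z_transform(v)
    r = sum(a * b for a, b in zip(zu, zv)) / len(zu)
    return 1.0 - r
```
</preformat>
<p>Each sequence would contribute one 256-element vector from <monospace>z_transform(tetra_frequencies(seq))</monospace>; map training itself (epochs, search radius, cooling and learning-rate schedules) is left to the ESOM software.</p>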
</sec>
<sec disp-level="2">
<title>Recovery of bacterial cells from stool</title>
<p>Microbial cells were extracted from faecal material obtained from a healthy 26-year-old male volunteer (sample collection was approved by the Clinical Research Ethics Committee of the Cork Teaching Hospitals) as described previously
<xref ref-type="bibr" rid="b58">58</xref>
. In summary, 10 g of stool sample was thoroughly homogenized in 20 ml phosphate buffered saline (PBS), centrifuged at 1,000 
<italic>g</italic>
for 5 min at 4 °C to pellet debris and the resulting supernatant removed to a fresh sterile tube. The faecal pellet was then washed gently three times with a single 5 ml PBS aliquot and pooled with the recovered supernatant. To separate bacterial cells from faeces, 15 ml aliquots of resulting homogenized faecal slurry were layered onto a 9.75 ml cushion of Nycodenz solution (Axis-Shield, Oslo, Norway) at a density of 1.3 g ml
<sup>−1</sup>
Tris EDTA solution (TE buffer; 10 mM Tris, 1 mM EDTA, pH 8). Bacterial cells were harvested by centrifugation at 10,000 
<italic>g</italic>
for 6 min at 4 °C and pooled, and stored as 10% glycerol stocks in 1 ml volumes at −80 °C until required.</p>
</sec>
<sec disp-level="2">
<title>Extraction of metagenomic DNA</title>
<p>Stocks of Nycodenz recovered cells (see
<italic>Recovery of bacterial cells from stool</italic>
) were thawed slowly on ice and 1 ml aliquots were centrifuged at 17,000 
<italic>g</italic>
for 1 min and then washed 3 × in PBS. To lyse cells, pellets were resuspended in 900 μl of TE buffer pH 8, 500 μl lysozyme (Sigma, UK; 50 mg ml
<sup>−1</sup>
TE, pH 8), 100 μl Mutanolysin (Sigma, UK; 1 mg ml
<sup>−1</sup>
) and incubated at 37 °C for 1 h with occasional inversion. To further enhance lysis, 200 μl Proteinase K (Sigma, UK; >800 units per ml) was added to the bacterial cells and incubated at 55 °C for 1 h. Supernatant was discarded and 800 μl of 2.5%
<italic>N</italic>
-Lauryl Sarcosine solution (Sigma, UK) was added to the cells and incubated for a further 15 min at 68 °C. Following lysis, proteins were precipitated by addition of 500 μl saturated ammonium acetate solution (Sigma, UK) for 1 h at room temperature. To extract DNA, an equal volume of chloroform (Thermo Fisher Scientific, UK) was added and the mixture was centrifuged at 12,000 
<italic>g</italic>
for 3 min; the resulting extract was removed to a fresh tube and the chloroform extraction was repeated. The resulting DNA was precipitated with ice-cold ethanol (absolute; Thermo Fisher Scientific), dissolved in sterile nuclease-free water (Cambio, UK), and stored at −20 °C until use.</p>
</sec>
</sec>
<sec disp-level="1">
<title>Author contributions</title>
<p>B.V.J. and L.A.O. conceived the study. All authors contributed to study design. B.V.J., L.A.O., L.D.B., C.D., E.C. and J.C. conducted the study and analysed the data. B.V.J. and L.A.O. wrote the manuscript and all authors edited the manuscript.</p>
</sec>
<sec disp-level="1">
<title>Additional information</title>
<p>
<bold>How to cite this article:</bold>
Ogilvie, L. A.
<italic>et al.</italic>
Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences.
<italic>Nat. Commun.</italic>
4:2420 doi: 10.1038/ncomms3420 (2013).</p>
</sec>
<sec sec-type="supplementary-material" id="S1">
<title>Supplementary Material</title>
<supplementary-material id="d33e18" content-type="local-data">
<caption>
<title>Supplementary Figures, Tables and References</title>
<p>Supplementary Figures S1-S5, Supplementary Tables S1-S4 and Supplementary References</p>
</caption>
<media xlink:href="ncomms3420-s1.pdf"></media>
</supplementary-material>
<supplementary-material id="d33e24" content-type="local-data">
<caption>
<title>Supplementary Data 1</title>
<p>Sequences recovered from gut metagenomes using the PGSR and categorised by functional profiling.</p>
</caption>
<media xlink:href="ncomms3420-s2.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e30" content-type="local-data">
<caption>
<title>Supplementary Data 2</title>
<p>Confirmation of TUP-based PGSR phage-host affiliation by alignments to bacterial chromosomes.</p>
</caption>
<media xlink:href="ncomms3420-s3.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e36" content-type="local-data">
<caption>
<title>Supplementary Data 3</title>
<p>Gut associated chromosomal sequences.</p>
</caption>
<media xlink:href="ncomms3420-s4.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e42" content-type="local-data">
<caption>
<title>Supplementary Data 4</title>
<p>All chromosomal sequences utilised.</p>
</caption>
<media xlink:href="ncomms3420-s5.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e48" content-type="local-data">
<caption>
<title>Supplementary Data 5</title>
<p>All phage genome sequences.</p>
</caption>
<media xlink:href="ncomms3420-s6.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e54" content-type="local-data">
<caption>
<title>Supplementary Data 6</title>
<p>Select sub-set of phage genome sequences.</p>
</caption>
<media xlink:href="ncomms3420-s7.xlsx"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>Dr L.A.O. is supported by funding from the Medical Research Council (Grant ID number G0901553 awarded to Dr B.V.J.). Research in the laboratory of Dr B.V.J. is also supported by funding from the Healthcare Infection Society, The Society for Applied Microbiology and The University of Brighton. We also thank Margaret Daniels, Heather Catty, Rowena Berterelli and Joe Hawthorn for technical assistance, and Dr Caroline Jones for constructive comments and criticism.</p>
</ack>
<ref-list>
<ref id="b1">
<mixed-citation publication-type="journal">
<name>
<surname>Suttle</surname>
<given-names>C. A.</given-names>
</name>
<article-title>Viruses in the sea</article-title>
.
<source>Nature</source>
<volume>437</volume>
,
<fpage>356</fpage>
<lpage>361</lpage>
(
<year>2005</year>
).
<pub-id pub-id-type="pmid">16163346</pub-id>
</mixed-citation>
</ref>
<ref id="b2">
<mixed-citation publication-type="journal">
<name>
<surname>Wommack</surname>
<given-names>K. E.</given-names>
</name>
&amp;
<name>
<surname>Colwell</surname>
<given-names>R. R.</given-names>
</name>
<article-title>Virioplankton: viruses in aquatic ecosystems</article-title>
.
<source>Microbiol. Mol. Biol. Rev.</source>
<volume>64</volume>
,
<fpage>69</fpage>
<lpage>114</lpage>
(
<year>2000</year>
).
<pub-id pub-id-type="pmid">10704475</pub-id>
</mixed-citation>
</ref>
<ref id="b3">
<mixed-citation publication-type="journal">
<name>
<surname>Reyes</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Semenkovich</surname>
<given-names>N. P.</given-names>
</name>
,
<name>
<surname>Whiteson</surname>
<given-names>K.</given-names>
</name>
,
<name>
<surname>Rohwer</surname>
<given-names>F.</given-names>
</name>
&amp;
<name>
<surname>Gordon</surname>
<given-names>J. I.</given-names>
</name>
<article-title>Going viral: next generation sequencing applied to phage populations in the human gut</article-title>
.
<source>Nat. Rev. Microbiol.</source>
<volume>10</volume>
,
<fpage>607</fpage>
<lpage>617</lpage>
(
<year>2012</year>
).
<pub-id pub-id-type="pmid">22864264</pub-id>
</mixed-citation>
</ref>
<ref id="b4">
<mixed-citation publication-type="journal">
<name>
<surname>Fuhrman</surname>
<given-names>J. A.</given-names>
</name>
<article-title>Marine viruses and their biogeochemical and ecological effects</article-title>
.
<source>Nature</source>
<volume>399</volume>
,
<fpage>541</fpage>
<lpage>548</lpage>
(
<year>1999</year>
).
<pub-id pub-id-type="pmid">10376593</pub-id>
</mixed-citation>
</ref>
<ref id="b5">
<mixed-citation publication-type="journal">
<name>
<surname>Brüssow</surname>
<given-names>H.</given-names>
</name>
,
<name>
<surname>Canchaya</surname>
<given-names>C.</given-names>
</name>
&amp;
<name>
<surname>Hardt</surname>
<given-names>W.-D.</given-names>
</name>
<article-title>Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion</article-title>
.
<source>Microbiol. Mol. Biol. Rev.</source>
<volume>68</volume>
,
<fpage>560</fpage>
<lpage>602</lpage>
(
<year>2004</year>
).
<pub-id pub-id-type="pmid">15353570</pub-id>
</mixed-citation>
</ref>
<ref id="b6">
<mixed-citation publication-type="journal">
<name>
<surname>Breitbart</surname>
<given-names>M.</given-names>
</name>
<italic>et al.</italic>
<article-title>Metagenomic analyses of an uncultured viral community from human feces</article-title>
.
<source>J. Bacteriol.</source>
<volume>185</volume>
,
<fpage>6220</fpage>
<lpage>6223</lpage>
(
<year>2003</year>
).
<pub-id pub-id-type="pmid">14526037</pub-id>
</mixed-citation>
</ref>
<ref id="b7">
<mixed-citation publication-type="journal">
<name>
<surname>Minot</surname>
<given-names>S.</given-names>
</name>
<italic>et al.</italic>
<article-title>The human gut virome: inter-individual variation and dynamic response to diet</article-title>
.
<source>Genome Res.</source>
<volume>21</volume>
,
<fpage>1616</fpage>
<lpage>1625</lpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21880779</pub-id>
</mixed-citation>
</ref>
<ref id="b8">
<mixed-citation publication-type="journal">
<name>
<surname>Stern</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Mick</surname>
<given-names>E.</given-names>
</name>
,
<name>
<surname>Tirosh</surname>
<given-names>I.</given-names>
</name>
,
<name>
<surname>Sagy</surname>
<given-names>O.</given-names>
</name>
&amp;
<name>
<surname>Sorek</surname>
<given-names>R.</given-names>
</name>
<article-title>CRISPR targeting reveals a reservoir of common phages associated with the human gut microbiome</article-title>
.
<source>Genome Res.</source>
<volume>22</volume>
,
<fpage>1985</fpage>
<lpage>1994</lpage>
(
<year>2012</year>
).
<pub-id pub-id-type="pmid">22732228</pub-id>
</mixed-citation>
</ref>
<ref id="b9">
<mixed-citation publication-type="journal">
<name>
<surname>Williamson</surname>
<given-names>S. J.</given-names>
</name>
<italic>et al.</italic>
<article-title>The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples</article-title>
.
<source>PLoS One</source>
<volume>3</volume>
,
<fpage>e1456</fpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18213365</pub-id>
</mixed-citation>
</ref>
<ref id="b10">
<mixed-citation publication-type="journal">
<name>
<surname>Angly</surname>
<given-names>F. E.</given-names>
</name>
<italic>et al.</italic>
<article-title>The marine viromes of four oceanic regions</article-title>
.
<source>PLoS Biol.</source>
<volume>4</volume>
,
<fpage>e368</fpage>
(
<year>2006</year>
).
<pub-id pub-id-type="pmid">17090214</pub-id>
</mixed-citation>
</ref>
<ref id="b11">
<mixed-citation publication-type="journal">
<name>
<surname>Reyes</surname>
<given-names>A.</given-names>
</name>
<italic>et al.</italic>
<article-title>Viruses in the faecal microbiota of monozygotic twins and their mothers</article-title>
.
<source>Nature</source>
<volume>466</volume>
,
<fpage>334</fpage>
<lpage>338</lpage>
(
<year>2010</year>
).
<pub-id pub-id-type="pmid">20631792</pub-id>
</mixed-citation>
</ref>
<ref id="b12">
<mixed-citation publication-type="journal">
<name>
<surname>Caporaso</surname>
<given-names>J. G.</given-names>
</name>
,
<name>
<surname>Knight</surname>
<given-names>R.</given-names>
</name>
&amp;
<name>
<surname>Kelley</surname>
<given-names>S. T.</given-names>
</name>
<article-title>Host-associated and free-living phage communities differ profoundly in phylogenetic composition</article-title>
.
<source>PLoS One</source>
<volume>6</volume>
,
<fpage>e16900</fpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21383980</pub-id>
</mixed-citation>
</ref>
<ref id="b13">
<mixed-citation publication-type="journal">
<name>
<surname>Ogilvie</surname>
<given-names>L. A.</given-names>
</name>
<italic>et al.</italic>
<article-title>Comparative (meta)genomic analysis and ecological profiling of human gut-specific bacteriophage ϕB124-14</article-title>
.
<source>PLoS One</source>
<volume>7</volume>
,
<fpage>e35053</fpage>
(
<year>2012</year>
).
<pub-id pub-id-type="pmid">22558115</pub-id>
</mixed-citation>
</ref>
<ref id="b14">
<mixed-citation publication-type="journal">
<name>
<surname>Lepage</surname>
<given-names>P.</given-names>
</name>
<italic>et al.</italic>
<article-title>Dysbiosis in inflammatory bowel disease: a role for bacteriophages?</article-title>
<source>Gut</source>
<volume>57</volume>
,
<fpage>424</fpage>
<lpage>425</lpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18268057</pub-id>
</mixed-citation>
</ref>
<ref id="b15">
<mixed-citation publication-type="journal">
<name>
<surname>Jones</surname>
<given-names>B. V.</given-names>
</name>
<article-title>The human gut mobile metagenome: a metazoan perspective</article-title>
.
<source>Gut Microbes</source>
<volume>1</volume>
,
<fpage>415</fpage>
<lpage>431</lpage>
(
<year>2010</year>
).</mixed-citation>
</ref>
<ref id="b16">
<mixed-citation publication-type="journal">
<name>
<surname>Gorski</surname>
<given-names>A.</given-names>
</name>
<italic>et al.</italic>
<article-title>New insights into the possible role of bacteriophages in host defense and disease</article-title>
.
<source>Med. Immunol.</source>
<volume>2</volume>
,
<fpage>2</fpage>
(
<year>2003</year>
).
<pub-id pub-id-type="pmid">12625836</pub-id>
</mixed-citation>
</ref>
<ref id="b17">
<mixed-citation publication-type="journal">
<name>
<surname>Colomer-Lluch</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Jofre</surname>
<given-names>J.</given-names>
</name>
&amp;
<name>
<surname>Muniesa</surname>
<given-names>M.</given-names>
</name>
<article-title>Antibiotic resistance genes in the bacteriophage DNA fraction of environmental samples</article-title>
.
<source>PLoS One</source>
<volume>6</volume>
,
<fpage>e17549</fpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21390233</pub-id>
</mixed-citation>
</ref>
<ref id="b18">
<mixed-citation publication-type="journal">
<name>
<surname>Waldor</surname>
<given-names>M. K.</given-names>
</name>
&amp;
<name>
<surname>Mekalanos</surname>
<given-names>J. J.</given-names>
</name>
<article-title>Lysogenic conversion by a filamentous phage encoding cholera toxin</article-title>
.
<source>Science</source>
<volume>272</volume>
,
<fpage>1910</fpage>
<lpage>1914</lpage>
(
<year>1996</year>
).
<pub-id pub-id-type="pmid">8658163</pub-id>
</mixed-citation>
</ref>
<ref id="b19">
<mixed-citation publication-type="journal">
<name>
<surname>Rohwer</surname>
<given-names>F.</given-names>
</name>
,
<name>
<surname>Prangishvili</surname>
<given-names>D.</given-names>
</name>
&amp;
<name>
<surname>Lindell</surname>
<given-names>D.</given-names>
</name>
<article-title>Roles of viruses in the environment</article-title>
.
<source>Environ. Microbiol.</source>
<volume>11</volume>
,
<fpage>2771</fpage>
<lpage>2774</lpage>
(
<year>2009</year>
).
<pub-id pub-id-type="pmid">19878268</pub-id>
</mixed-citation>
</ref>
<ref id="b20">
<mixed-citation publication-type="journal">
<name>
<surname>Thurber</surname>
<given-names>R. V.</given-names>
</name>
,
<name>
<surname>Haynes</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Breitbart</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Wegley</surname>
<given-names>L.</given-names>
</name>
&amp;
<name>
<surname>Rohwer</surname>
<given-names>F.</given-names>
</name>
<article-title>Laboratory procedures to generate viral metagenomes</article-title>
.
<source>Nat. Protoc.</source>
<volume>4</volume>
,
<fpage>470</fpage>
<lpage>483</lpage>
(
<year>2009</year>
).
<pub-id pub-id-type="pmid">19300441</pub-id>
</mixed-citation>
</ref>
<ref id="b21">
<mixed-citation publication-type="journal">
<name>
<surname>Qin</surname>
<given-names>J.</given-names>
</name>
<italic>et al.</italic>
<article-title>A human gut microbial gene catalogue established by metagenomic sequencing</article-title>
.
<source>Nature</source>
<volume>464</volume>
,
<fpage>59</fpage>
<lpage>65</lpage>
(
<year>2010</year>
).
<pub-id pub-id-type="pmid">20203603</pub-id>
</mixed-citation>
</ref>
<ref id="b22">
<mixed-citation publication-type="journal">
<name>
<surname>Pride</surname>
<given-names>D. T.</given-names>
</name>
,
<name>
<surname>Meinersmann</surname>
<given-names>R. J.</given-names>
</name>
,
<name>
<surname>Wassenaar</surname>
<given-names>T. M.</given-names>
</name>
&amp;
<name>
<surname>Blaser</surname>
<given-names>M. J.</given-names>
</name>
<article-title>Evolutionary implications of microbial genome tetranucleotide frequency biases</article-title>
.
<source>Genome Res.</source>
<volume>13</volume>
,
<fpage>145</fpage>
<lpage>158</lpage>
(
<year>2003</year>
).
<pub-id pub-id-type="pmid">12566393</pub-id>
</mixed-citation>
</ref>
<ref id="b23">
<mixed-citation publication-type="journal">
<name>
<surname>Pride</surname>
<given-names>D. T.</given-names>
</name>
,
<name>
<surname>Wassenaar</surname>
<given-names>T. M.</given-names>
</name>
,
<name>
<surname>Ghose</surname>
<given-names>C.</given-names>
</name>
&amp;
<name>
<surname>Blaser</surname>
<given-names>M. J.</given-names>
</name>
<article-title>Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses</article-title>
.
<source>BMC Genomics</source>
<volume>7</volume>
,
<fpage>8</fpage>
(
<year>2006</year>
).
<pub-id pub-id-type="pmid">16417644</pub-id>
</mixed-citation>
</ref>
<ref id="b24">
<mixed-citation publication-type="journal">
<name>
<surname>Deschavanne</surname>
<given-names>P.</given-names>
</name>
,
<name>
<surname>DuBow</surname>
<given-names>M. S.</given-names>
</name>
&amp;
<name>
<surname>Regeard</surname>
<given-names>C.</given-names>
</name>
<article-title>The use of genomic signature distance between bacteriophages and their hosts displays evolutionary relationships and phage growth cycle determination</article-title>
.
<source>Virol. J.</source>
<volume>7</volume>
,
<fpage>163</fpage>
(
<year>2010</year>
).
<pub-id pub-id-type="pmid">20637121</pub-id>
</mixed-citation>
</ref>
<ref id="b25">
<mixed-citation publication-type="journal">
<name>
<surname>Marchler-Bauer</surname>
<given-names>A.</given-names>
</name>
<italic>et al.</italic>
<article-title>CDD: a Conserved Domain Database for the functional annotation of proteins</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>39</volume>
,
<fpage>D225</fpage>
<lpage>D229</lpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21109532</pub-id>
</mixed-citation>
</ref>
<ref id="b26">
<mixed-citation publication-type="journal">
<name>
<surname>Tatusov</surname>
<given-names>R. L.</given-names>
</name>
<italic>et al.</italic>
<article-title>The COG database: an updated version includes eukaryotes</article-title>
.
<source>BMC Bioinformatics</source>
<volume>4</volume>
,
<fpage>41</fpage>
(
<year>2003</year>
).
<pub-id pub-id-type="pmid">12969510</pub-id>
</mixed-citation>
</ref>
<ref id="b27">
<mixed-citation publication-type="journal">
<name>
<surname>Leplae</surname>
<given-names>R.</given-names>
</name>
,
<name>
<surname>Hebrant</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Wodak</surname>
<given-names>S. J.</given-names>
</name>
&amp;
<name>
<surname>Toussaint</surname>
<given-names>A.</given-names>
</name>
<article-title>ACLAME: a classification of Mobile genetic Elements</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>32</volume>
,
<fpage>D45</fpage>
<lpage>D49</lpage>
(
<year>2004</year>
).
<pub-id pub-id-type="pmid">14681355</pub-id>
</mixed-citation>
</ref>
<ref id="b28">
<mixed-citation publication-type="journal">
<name>
<surname>Kurokawa</surname>
<given-names>K.</given-names>
</name>
<italic>et al.</italic>
<article-title>Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes</article-title>
.
<source>DNA Res.</source>
<volume>14</volume>
,
<fpage>169</fpage>
<lpage>181</lpage>
(
<year>2007</year>
).
<pub-id pub-id-type="pmid">17916580</pub-id>
</mixed-citation>
</ref>
<ref id="b29">
<mixed-citation publication-type="journal">
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Evolution of symbiotic bacteria in the distal human intestine</article-title>
.
<source>PLoS Biol.</source>
<volume>5</volume>
,
<fpage>e156</fpage>
(
<year>2007</year>
).
<pub-id pub-id-type="pmid">17579514</pub-id>
</mixed-citation>
</ref>
<ref id="b30">
<mixed-citation publication-type="journal">
<name>
<surname>Murphy</surname>
<given-names>K. C.</given-names>
</name>
<italic>et al.</italic>
<article-title>Dam methyltransferase is required for stable lysogeny of the Shiga toxin (Stx2)-encoding bacteriophage 933W of enterohemorrhagic
<italic>Escherichia coli</italic>
O157:H7</article-title>
.
<source>J. Bacteriol.</source>
<volume>190</volume>
,
<fpage>438</fpage>
<lpage>441</lpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">17981979</pub-id>
</mixed-citation>
</ref>
<ref id="b31">
<mixed-citation publication-type="journal">
<name>
<surname>Kruger</surname>
<given-names>D. H.</given-names>
</name>
&amp;
<name>
<surname>Bickle</surname>
<given-names>T. A.</given-names>
</name>
<article-title>Bacteriophage survival: multiple mechanisms for avoiding the deoxyribonucleic acid restriction systems of their hosts</article-title>
.
<source>Microbiol. Rev.</source>
<volume>47</volume>
,
<fpage>345</fpage>
<lpage>360</lpage>
(
<year>1983</year>
).
<pub-id pub-id-type="pmid">6314109</pub-id>
</mixed-citation>
</ref>
<ref id="b32">
<mixed-citation publication-type="journal">
<name>
<surname>Groth</surname>
<given-names>A. C.</given-names>
</name>
&amp;
<name>
<surname>Calos</surname>
<given-names>M. P.</given-names>
</name>
<article-title>Phage integrases: biology and applications</article-title>
.
<source>J. Mol. Biol.</source>
<volume>335</volume>
,
<fpage>667</fpage>
<lpage>678</lpage>
(
<year>2004</year>
).
<pub-id pub-id-type="pmid">14687564</pub-id>
</mixed-citation>
</ref>
<ref id="b33">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
&amp;
<name>
<surname>Pop</surname>
<given-names>M.</given-names>
</name>
<article-title>ARDB-Antibiotic resistance genes database</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>37</volume>
,
<fpage>D443</fpage>
<lpage>D447</lpage>
(
<year>2009</year>
).
<pub-id pub-id-type="pmid">18832362</pub-id>
</mixed-citation>
</ref>
<ref id="b34">
<mixed-citation publication-type="journal">
<name>
<surname>Lund</surname>
<given-names>F.</given-names>
</name>
&amp;
<name>
<surname>Tybring</surname>
<given-names>L.</given-names>
</name>
<article-title>6-Amidinopenicillanic acids--a new group of antibiotics</article-title>
.
<source>Nat. New Biol.</source>
<volume>236</volume>
,
<fpage>135</fpage>
<lpage>137</lpage>
(
<year>1972</year>
).</mixed-citation>
</ref>
<ref id="b35">
<mixed-citation publication-type="journal">
<name>
<surname>Wootton</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Walsh</surname>
<given-names>T. R.</given-names>
</name>
,
<name>
<surname>Macfarlane</surname>
<given-names>L.</given-names>
</name>
&amp;
<name>
<surname>Howe</surname>
<given-names>R. A.</given-names>
</name>
<article-title>Activity of mecillinam against
<italic>Escherichia coli</italic>
resistant to third-generation cephalosporins</article-title>
.
<source>J. Antimicrob. Chemother.</source>
<volume>65</volume>
,
<fpage>79</fpage>
<lpage>81</lpage>
(
<year>2010</year>
).
<pub-id pub-id-type="pmid">19915068</pub-id>
</mixed-citation>
</ref>
<ref id="b36">
<mixed-citation publication-type="journal">
<name>
<surname>Arumugam</surname>
<given-names>M.</given-names>
</name>
<italic>et al.</italic>
<article-title>Enterotypes of the human gut microbiome</article-title>
.
<source>Nature</source>
<volume>473</volume>
,
<fpage>174</fpage>
<lpage>180</lpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21508958</pub-id>
</mixed-citation>
</ref>
<ref id="b37">
<mixed-citation publication-type="journal">
<name>
<surname>Dick</surname>
<given-names>G. J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Community-wide analysis of microbial genome sequence signatures</article-title>
.
<source>Genome Biol.</source>
<volume>10</volume>
,
<fpage>R85</fpage>
(
<year>2009</year>
).
<pub-id pub-id-type="pmid">19698104</pub-id>
</mixed-citation>
</ref>
<ref id="b38">
<mixed-citation publication-type="journal">
<name>
<surname>Duhaime</surname>
<given-names>M. B.</given-names>
</name>
,
<name>
<surname>Wichels</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Waldmann</surname>
<given-names>J.</given-names>
</name>
,
<name>
<surname>Teeling</surname>
<given-names>H.</given-names>
</name>
&amp;
<name>
<surname>Glöckner</surname>
<given-names>F. O.</given-names>
</name>
<article-title>Ecogenomics and genome landscapes of marine
<italic>Pseudoalteromonas</italic>
phage H105/1</article-title>
.
<source>ISME J.</source>
<volume>5</volume>
,
<fpage>107</fpage>
<lpage>112</lpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">20613791</pub-id>
</mixed-citation>
</ref>
<ref id="b39">
<mixed-citation publication-type="journal">
<name>
<surname>Saeed</surname>
<given-names>I.</given-names>
</name>
,
<name>
<surname>Tang</surname>
<given-names>S.-L.</given-names>
</name>
&amp;
<name>
<surname>Halgamuge</surname>
<given-names>S. K.</given-names>
</name>
<article-title>Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>40</volume>
,
<fpage>e34</fpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">22180538</pub-id>
</mixed-citation>
</ref>
<ref id="b40">
<mixed-citation publication-type="journal">
<name>
<surname>Teeling</surname>
<given-names>H.</given-names>
</name>
,
<name>
<surname>Meyerdierks</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Bauer</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Amann</surname>
<given-names>R.</given-names>
</name>
&amp;
<name>
<surname>Glöckner</surname>
<given-names>F. O.</given-names>
</name>
<article-title>Application of tetranucleotide frequencies for the assignment of genomic fragments</article-title>
.
<source>Environ. Microbiol.</source>
<volume>6</volume>
,
<fpage>938</fpage>
<lpage>947</lpage>
(
<year>2004</year>
).
<pub-id pub-id-type="pmid">15305919</pub-id>
</mixed-citation>
</ref>
<ref id="b41">
<mixed-citation publication-type="journal">
<name>
<surname>Ghai</surname>
<given-names>R.</given-names>
</name>
<italic>et al.</italic>
<article-title>New abundant microbial groups in aquatic hypersaline environments</article-title>
.
<source>Sci. Rep.</source>
<volume>1</volume>
,
<fpage>135</fpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">22355652</pub-id>
</mixed-citation>
</ref>
<ref id="b42">
<mixed-citation publication-type="journal">
<name>
<surname>Pignatelli</surname>
<given-names>M.</given-names>
</name>
<italic>et al.</italic>
<article-title>Metagenomics reveals our incomplete knowledge of global diversity</article-title>
.
<source>Bioinformatics</source>
<volume>24</volume>
,
<fpage>2124</fpage>
<lpage>2125</lpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18625611</pub-id>
</mixed-citation>
</ref>
<ref id="b43">
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
,
<name>
<surname>Rahman</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Seol</surname>
<given-names>S. Y.</given-names>
</name>
,
<name>
<surname>Yoon</surname>
<given-names>S. S.</given-names>
</name>
&amp;
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<article-title>
<italic>Pseudomonas aeruginosa</italic>
bacteriophage PA1Ø requires type IV pili for infection and shows broad bactericidal and biofilm removal activities</article-title>
.
<source>Appl. Environ. Microbiol.</source>
<volume>78</volume>
,
<fpage>6380</fpage>
<lpage>6385</lpage>
(
<year>2012</year>
).
<pub-id pub-id-type="pmid">22752161</pub-id>
</mixed-citation>
</ref>
<ref id="b44">
<mixed-citation publication-type="journal">
<name>
<surname>Ebdon</surname>
<given-names>J.</given-names>
</name>
,
<name>
<surname>Muniesa</surname>
<given-names>M.</given-names>
</name>
&amp;
<name>
<surname>Taylor</surname>
<given-names>H.</given-names>
</name>
<article-title>The application of a recently isolated strain of Bacteroides (GB-124) to identify human sources of faecal pollution in a temperate river catchment</article-title>
.
<source>Water Res.</source>
<volume>41</volume>
,
<fpage>3683</fpage>
<lpage>3690</lpage>
(
<year>2007</year>
).
<pub-id pub-id-type="pmid">17275065</pub-id>
</mixed-citation>
</ref>
<ref id="b45">
<mixed-citation publication-type="journal">
<name>
<surname>Gill</surname>
<given-names>S. R.</given-names>
</name>
<italic>et al.</italic>
<article-title>Metagenomic analysis of the human distal gut microbiome</article-title>
.
<source>Science</source>
<volume>312</volume>
,
<fpage>1355</fpage>
<lpage>1359</lpage>
(
<year>2006</year>
).
<pub-id pub-id-type="pmid">16741115</pub-id>
</mixed-citation>
</ref>
<ref id="b46">
<mixed-citation publication-type="journal">
<name>
<surname>Teeling</surname>
<given-names>H.</given-names>
</name>
,
<name>
<surname>Waldmann</surname>
<given-names>J.</given-names>
</name>
,
<name>
<surname>Lombardot</surname>
<given-names>T.</given-names>
</name>
,
<name>
<surname>Bauer</surname>
<given-names>M.</given-names>
</name>
&amp;
<name>
<surname>Glöckner</surname>
<given-names>F. O.</given-names>
</name>
<article-title>TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences</article-title>
.
<source>BMC Bioinformatics</source>
<volume>5</volume>
,
<fpage>163</fpage>
(
<year>2004</year>
).
<pub-id pub-id-type="pmid">15507136</pub-id>
</mixed-citation>
</ref>
<ref id="b47">
<mixed-citation publication-type="journal">
<name>
<surname>Aziz</surname>
<given-names>R. K.</given-names>
</name>
<italic>et al.</italic>
<article-title>The RAST Server: rapid annotations using subsystems technology</article-title>
.
<source>BMC Genomics</source>
<volume>9</volume>
,
<fpage>75</fpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18261238</pub-id>
</mixed-citation>
</ref>
<ref id="b48">
<mixed-citation publication-type="journal">
<name>
<surname>Jones</surname>
<given-names>B. V.</given-names>
</name>
,
<name>
<surname>Begley</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Hill</surname>
<given-names>C.</given-names>
</name>
,
<name>
<surname>Gahan</surname>
<given-names>C. G. M.</given-names>
</name>
&
<name>
<surname>Marchesi</surname>
<given-names>J. R.</given-names>
</name>
<article-title>Functional and comparative metagenomic analysis of bile salt hydrolase activity in the human gut microbiome</article-title>
.
<source>Proc. Natl Acad. Sci. USA</source>
<volume>105</volume>
,
<fpage>13580</fpage>
<lpage>13585</lpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18757757</pub-id>
</mixed-citation>
</ref>
<ref id="b49">
<mixed-citation publication-type="journal">
<name>
<surname>Jones</surname>
<given-names>B. V.</given-names>
</name>
,
<name>
<surname>Sun</surname>
<given-names>F.</given-names>
</name>
&
<name>
<surname>Marchesi</surname>
<given-names>J. R.</given-names>
</name>
<article-title>Comparative metagenomic analysis of plasmid encoded functions in the human gut microbiome</article-title>
.
<source>BMC Genomics</source>
<volume>11</volume>
,
<fpage>46</fpage>
(
<year>2010</year>
).
<pub-id pub-id-type="pmid">20085629</pub-id>
</mixed-citation>
</ref>
<ref id="b50">
<mixed-citation publication-type="journal">
<name>
<surname>Felsenstein</surname>
<given-names>J.</given-names>
</name>
<source>PHYLIP (Phylogeny Inference Package) version 3.6</source>
Distributed by the author (Department of Genome Sciences, University of Washington: Seattle, USA, (
<year>2005</year>
).</mixed-citation>
</ref>
<ref id="b51">
<mixed-citation publication-type="journal">
<name>
<surname>Huson</surname>
<given-names>D. H.</given-names>
</name>
&
<name>
<surname>Scornavacca</surname>
<given-names>C.</given-names>
</name>
<article-title>Dendroscope 3: An interactive tool for rooted phylogenetic trees and networks</article-title>
.
<source>Syst. Biol.</source>
<volume>61</volume>
,
<fpage>1061</fpage>
<lpage>1067</lpage>
(
<year>2012</year>
).
<pub-id pub-id-type="pmid">22780991</pub-id>
</mixed-citation>
</ref>
<ref id="b52">
<mixed-citation publication-type="journal">
<name>
<surname>Sun</surname>
<given-names>S.</given-names>
</name>
<italic>et al.</italic>
<article-title>Community cyberinfrastructure for advanced microbial ecology research and analysis: the CAMERA resource</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>39</volume>
,
<fpage>D546</fpage>
<lpage>D551</lpage>
(
<year>2011</year>
).
<pub-id pub-id-type="pmid">21045053</pub-id>
</mixed-citation>
</ref>
<ref id="b53">
<mixed-citation publication-type="journal">
<name>
<surname>Schirle</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Heurtier</surname>
<given-names>M.</given-names>
</name>
&
<name>
<surname>Kuster</surname>
<given-names>B.</given-names>
</name>
<article-title>Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry</article-title>
.
<source>Mol. Cell. Proteom.</source>
<volume>2</volume>
,
<fpage>1297</fpage>
<lpage>1305</lpage>
(
<year>2003</year>
).</mixed-citation>
</ref>
<ref id="b54">
<mixed-citation publication-type="journal">
<name>
<surname>Shevchenko</surname>
<given-names>A.</given-names>
</name>
,
<name>
<surname>Tomas</surname>
<given-names>H.</given-names>
</name>
,
<name>
<surname>Havli</surname>
<given-names>J.</given-names>
</name>
,
<name>
<surname>Olsen</surname>
<given-names>J. V.</given-names>
</name>
&
<name>
<surname>Mann</surname>
<given-names>M.</given-names>
</name>
<article-title>In-gel digestion for mass spectrometric characterization of proteins and proteomes</article-title>
.
<source>Nat. Protoc.</source>
<volume>1</volume>
,
<fpage>2856</fpage>
<lpage>2860</lpage>
(
<year>2007</year>
).</mixed-citation>
</ref>
<ref id="b55">
<mixed-citation publication-type="journal">
<name>
<surname>Thompson</surname>
<given-names>J. D.</given-names>
</name>
,
<name>
<surname>Higgins</surname>
<given-names>D. G.</given-names>
</name>
&
<name>
<surname>Gibson</surname>
<given-names>T. J.</given-names>
</name>
<article-title>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>22</volume>
,
<fpage>4673</fpage>
<lpage>4680</lpage>
(
<year>1994</year>
).
<pub-id pub-id-type="pmid">7984417</pub-id>
</mixed-citation>
</ref>
<ref id="b56">
<mixed-citation publication-type="journal">
<name>
<surname>Clarke</surname>
<given-names>K. R.</given-names>
</name>
&
<name>
<surname>Gorley</surname>
<given-names>R. N.</given-names>
</name>
<source>PRIMER v6: User Manual/Tutorial</source>
PRIMER-E: Plymouth, (
<year>2006</year>
).</mixed-citation>
</ref>
<ref id="b57">
<mixed-citation publication-type="journal">
<name>
<surname>Ultsch</surname>
<given-names>A.</given-names>
</name>
&
<name>
<surname>Moerchen</surname>
<given-names>F.</given-names>
</name>
<source>ESOM-Maps: tools for clustering, visualisation, and classification with Emergent ESOM, Technical Report Dept. of Mathematics and Computer Science</source>
University of Marburg: Germany,
<volume>No. 46</volume>
, (
<year>2005</year>
).</mixed-citation>
</ref>
<ref id="b58">
<mixed-citation publication-type="journal">
<name>
<surname>Jones</surname>
<given-names>B. V.</given-names>
</name>
&
<name>
<surname>Marchesi</surname>
<given-names>J. R.</given-names>
</name>
<article-title>Transposon-aided capture (TRACA) of plasmids resident in the human gut mobile metagenome</article-title>
.
<source>Nat. Methods</source>
<volume>4</volume>
,
<fpage>55</fpage>
<lpage>61</lpage>
(
<year>2007</year>
).
<pub-id pub-id-type="pmid">17128268</pub-id>
</mixed-citation>
</ref>
<ref id="b59">
<mixed-citation publication-type="journal">
<name>
<surname>Hawkins</surname>
<given-names>S. A.</given-names>
</name>
,
<name>
<surname>Layton</surname>
<given-names>A. C.</given-names>
</name>
,
<name>
<surname>Ripp</surname>
<given-names>S.</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>D.</given-names>
</name>
&
<name>
<surname>Sayler</surname>
<given-names>G. S.</given-names>
</name>
<article-title>Genome sequence of the
<italic>Bacteroides fragilis</italic>
phage ATCC 51477-B1</article-title>
.
<source>Virol. J.</source>
<volume>5</volume>
,
<fpage>97</fpage>
(
<year>2008</year>
).
<pub-id pub-id-type="pmid">18710568</pub-id>
</mixed-citation>
</ref>
<ref id="b60">
<mixed-citation publication-type="journal">
<name>
<surname>Puig</surname>
<given-names>M.</given-names>
</name>
,
<name>
<surname>Jofre</surname>
<given-names>J.</given-names>
</name>
&
<name>
<surname>Girones</surname>
<given-names>R.</given-names>
</name>
<article-title>Detection of phages infecting
<italic>Bacteroides fragilis</italic>
HSP40 using a specific DNA probe</article-title>
.
<source>J. Virol. Methods</source>
<volume>88</volume>
,
<fpage>163</fpage>
<lpage>173</lpage>
(
<year>2000</year>
).
<pub-id pub-id-type="pmid">10960704</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="f1">
<label>Figure 1</label>
<caption>
<title>Overview of the PGSR approach.</title>
<p>TUPs of all large fragments (10 kb or over) from 139 human gut metagenomes were calculated, and compared with those of phage genome sequences used as drivers. All metagenomic fragments producing tetranucleotide correlation values of 0.6 or over to any driver sequence were retained, and subjected to functional profiling to resolve phage and non-phage sequences captured. See
<xref ref-type="table" rid="t1">Table 1</xref>
and
<xref ref-type="supplementary-material" rid="S1">Supplementary Figs S1–S3</xref>
for details of driver sequences. See
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
for details of human gut metagenomes utilized. *Tetranucleotide usage patterns and correlations were calculated using TETRA 1.0 (ref.
<xref ref-type="bibr" rid="b46">46</xref>
).</p>
</caption>
<graphic xlink:href="ncomms3420-f1"></graphic>
</fig>
<fig id="f2">
<label>Figure 2</label>
<caption>
<title>Analysis of chromosomal contamination in PGSR phage sequences.</title>
<p>Owing to the dominance of chromosomal sequences in the metagenomic data sets analysed and the likelihood that many PGSR phage represent integrated prophage, PGSR phage were examined for the presence of terminal chromosomal regions. (
<bold>a</bold>
) Physical maps of 20 randomly selected PGSR phage sequences indicating ORFs with homologues in other phage sequences. Graphs associated with each phage sequence show % G+C across the sequence. ORF homologues in phage data sets were identified based on tBlastn searches (1e
<sup>−3</sup>
or lower) of 711 complete or partial phage genomes, and all contigs assembled from human gut viral metagenomes
<xref ref-type="bibr" rid="b11">11</xref>
. ORFs highlighted in cyan have homologues in phage genomes. ORFs highlighted in red generated no valid hits to phage sequences but encode conserved domains with phage-related functions (for example, capsid, integrase and recombination/replication). (
<bold>b</bold>
) Relative abundance of ORFs homologous to those encoded by PGSR phage and PGSR non-phage contigs, in phage sequences (711 phage genomes, PGSR phage sequences and assemblies of human gut viromes) and chromosomes (1,821 chromosomes and all PGSR non-phage) expressed as hits per Mb DNA (valid hits=minimum 35% identity over 30 aa or more, 1e
<sup>−5</sup>
or lower). ***
<italic>P</italic>
≤0.001 (
<italic>χ</italic>
<sup>2</sup>
-test). Data sets and sequences utilized are described in
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
, Supplementary Data 3–6.</p>
</caption>
<graphic xlink:href="ncomms3420-f2"></graphic>
</fig>
<fig id="f3">
<label>Figure 3</label>
<caption>
<title>Recovery of PGSR phage sequences from metagenomic data sets.</title>
<p>Commonly used alignment-driven approaches to analyse metagenomes were evaluated for their ability to identify PGSR phage sequences. The same metagenomic data sets surveyed using the PGSR approach were also subjected to a range of alignment-based searches, including gene-centric searches with unambiguous phage-encoded ORFs (capsid and terminase genes). In addition, 991 non-redundant phage contigs also identified in searches of these data sets by Stern
<italic>et al.,</italic>
using the recently developed CRISPR strategy, were compared
<xref ref-type="bibr" rid="b8">8</xref>
. Pie charts depicted show the proportion of PGSR phage sequences captured by each strategy, as well as the total proportion of PGSR phage identified by all strategies in combination (percentages shown). Blastn, Megablast, Discontiguous Megablast: show the proportions of PGSR phage captured in alignments with different blast algorithms when metagenomes were queried at the nucleotide level using whole-PGSR phage driver sequences (1e
<sup>−3</sup>
or lower considered significant and retained). tBlastn: shows proportion of PGSR phage sequences identified using gene-centric surveys of metagenomes with all capsid and terminase genes encoded by driver sequences (1e
<sup>−3</sup>
or lower considered significant). CRISPR: proportion of PGSR phage sequences identified in the 991 phage-like contigs identified by Stern
<italic>et al.</italic>
<xref ref-type="bibr" rid="b8">8</xref>
, in recent surveys of the same metagenomes using CRISPR spacer regions. All searches: shows the total proportion of PGSR phage identified in the combined output of all searches conducted above.</p>
</caption>
<graphic xlink:href="ncomms3420-f3"></graphic>
</fig>
<fig id="f4">
<label>Figure 4</label>
<caption>
<title>Inference of PGSR phage host-range.</title>
<p>PGSR sequences were compared with a wide range of bacterial chromosomes and phage genomes, using both tetranucleotide profiles and alignment-based methods (Blast). (
<bold>a</bold>
) Phylogram showing relationships between PGSR sequences, human gut-associated chromosomes (
<italic>n</italic>
=324) and all large contigs from assembled gut viral metagenomes (
<italic>n</italic>
=188, 10 kb or over), based on tetranucleotide profiles. Clusters I–IV indicate regions populated by PGSR phage and driver sequences, and associated pie charts provide the proportion of total PGSR phage sequences in each cluster, designated by black segments. NT (nucleotide): shows genus-level taxonomic assignments for PGSR phage in each cluster based on Blastn searches, and figures in parentheses show total number of PGSR phage affiliated with each genus (≥75% identity, 1e
<sup>−5</sup>
or lower, alignment length of 1 kb or more). ORF: shows genus-level taxonomic assignments for PGSR phage in each cluster based on tBlastn alignments of individual PGSR phage ORFs with 1,700 complete bacterial chromosomes (≥75% identity, 1e
<sup>−5</sup>
or lower). Figures in parentheses show total number of PGSR phage ORFs affiliated with each genus listed. (
<bold>b</bold>
) Phylogram showing relationships between PGSR phage sequences, large fragments from gut viral metagenomes, and complete phage genomes (
<italic>n</italic>
=647 genomes, 10 kb or over), based on tetranucleotide profiles. For phage genome sequences, the assigned phylogeny reflects that of the host species where known. Scale bars for parts
<bold>a</bold>
and
<bold>b</bold>
show distance in arbitrary units, and all phylograms represent the most probable topologies based on 200 bootstrap replicates. (
<bold>c</bold>
) Total proportion of PGSR sequences and viral metagenome contigs represented in part
<bold>a</bold>
affiliated to phylum-level taxonomic groups based on alignments against 1,821 bacterial and archaeal chromosomes. Nucleotide: shows the proportion of sequences affiliated to each phylum based on valid Blastn hits (minimum 75% identity over 1 kb or more, 1e
<sup>−5</sup>
or lower). Amino acid: shows affiliation of all putative protein-encoding genes from each data set based on tBlastn searches (minimum 75% identity, 1e
<sup>−5</sup>
or lower). See also
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
. The source and further details of sequences used in the analyses presented in
<bold>a</bold>
–
<bold>c</bold>
are provided in
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3–6</xref>
.</p>
</caption>
<graphic xlink:href="ncomms3420-f4"></graphic>
</fig>
<fig id="f5">
<label>Figure 5</label>
<caption>
<title>PGSR phage representation in human gut viral metagenomes.</title>
<p>The representation of PGSR phage sequences in existing gut viral metagenomes, as well as viral and chromosomal metagenomes from other habitats, was assessed and compared with other phage sequence sets. (
<bold>a</bold>
) Representation of phage sequence sets in human gut viral metagenomes
<xref ref-type="bibr" rid="b11">11</xref>
. Individual pyrosequencing reads were mapped to respective phage sequence sets with high stringency (a minimum of 90% identity over 90% of the read). The number of reads mapped was normalized for size of reference data sets (expressed as reads mapped/Mb reference sequence). (
<bold>b</bold>
) Heat map showing relative representation of PGSR phage and other phage sequence sets in viromes from gut and non-gut habitats. Reads from each virome were mapped to reference phage sequence sets as for part
<bold>a</bold>
, but using low stringency criteria (minimum 70% identity over 25% of the read). The percentage of reads mapped was normalized for size of reference data sets (expressed as % reads mapped/Mb reference sequence). (
<bold>c</bold>
) Proportion of phage with homology to sequences in standard metagenomes and virome assemblies, derived from gut and non-gut habitats. Phage sequences from each collection were used to search metagenomic data sets with Blastn, and valid hits (minimum 75% identity over 100 nt or more, 1e
<sup>−5</sup>
or lower) were used to assign each sequence to one of five categories. GT (gut): phage sequences producing valid hits only in gut data sets; NG (non-gut): phage sequences producing valid hits only in non-gut data sets; GAH (gut-associated high): phage sequences producing valid hits in both gut and non-gut data sets, but with the majority derived from gut metagenomes. GAL (gut-associated low): phage sequences generating valid hits in both gut and non-gut data sets, but with the majority originating from non-gut metagenomes; UNCLASS: sequences producing no valid hits in any metagenome examined. Gut vir >500 bp—all contigs from human gut virome assemblies over 500 bp in length; Gut vir bact assoc.—all contigs from human gut virome assemblies affiliated with
<italic>Bacteroidales</italic>
driver sequences based on PGSR search criteria (as used to identify PGSR phage sequences in gut metagenomes); PGSR phage—all 85
<italic>Bacteroidales</italic>
-like PGSR sequences classified as phage; marine phage—99 phage genome sequences from marine phage; NCBI phage—612 complete phage genomes available from the NCBI phage refseq collection. **
<italic>P</italic>
≤0.01 (
<italic>χ</italic>
<sup>2</sup>
-test). Details of viromes, metagenomes and phage genomes utilized are provided in
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3–6</xref>
.</p>
</caption>
<graphic xlink:href="ncomms3420-f5"></graphic>
</fig>
<fig id="f6">
<label>Figure 6</label>
<caption>
<title>Functional profiles of PGSR sequences.</title>
<p>The functional profiles of PGSR phage and non-phage sequences were compared with those found in phage genomes (
<italic>n</italic>
=711), gut virome fragments (all contigs assembled from 12 individual gut viromes
<xref ref-type="bibr" rid="b11">11</xref>
), and 70 chromosomes from gut-associated
<italic>Bacteroidales</italic>
species (see
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3–6</xref>
for source and details of sequence data). Amino-acid sequences from all predicted ORFs in each data set were used to search the COG
<xref ref-type="bibr" rid="b26">26</xref>
database, the CDD
<xref ref-type="bibr" rid="b25">25</xref>
, and the ACLAME database
<xref ref-type="bibr" rid="b27">27</xref>
. The proportion of assignable ORFs affiliated to distinct categories in each database is displayed in horizontal bars, and associated pie charts show the total proportion of ORFs in each sequence set generating valid hits in database searches (black segments). (
<bold>a</bold>
) Results from searches of the COG database, showing proportions of ORFs assignable to COG classes. (
<bold>b</bold>
) Results for searches of the CDD, showing proportions of ORFs encoding conserved domain architectures related to phage and non-phage associated functions. (
<bold>c</bold>
) Results from searches of the ACLAME database, showing proportions of ORFs generating valid hits to genes encoded by distinct types of mobile genetic element represented in the database (plasmid, virus and prophage). All phage shows combined results from PGSR-phage, NCBI phage, Marine phage and Gut virome fragments. All non-phage shows combined results from PGSR non-phage and
<italic>Bacteroidales</italic>
chromosomes. Stars highlight the position of PGSR phage and non-phage sequences in charts.</p>
</caption>
<graphic xlink:href="ncomms3420-f6"></graphic>
</fig>
<fig id="f7">
<label>Figure 7</label>
<caption>
<title>Representation of PGSR phage sequences in the human gut metaproteome.</title>
<p>To further explore the functional profile of PGSR
<italic>Bacteroidales</italic>
-like phage, and their contribution to the human gut metaproteome, a shotgun metaproteome was generated from a human faecal microbiome and the resulting 177,729 mass spectra used to search custom databases of all putative proteins encoded by PGSR phage, PGSR non-phage and VLP-derived contigs from human gut viral metagenomes
<xref ref-type="bibr" rid="b11">11</xref>
. (
<bold>a</bold>
) Shows relative hit rates in the gut metaproteome, for amino-acid sequences originating in each data set used to query mass spectra (PGSR phage, PGSR non-phage, VLP-derived gut virome). Relative hit rates were calculated by normalizing the number of proteins from each data set detected in the gut metaproteome by the total number of ORFs in parental data sets (expressed as hits per total number of predicted proteins in each data set). Symbols above bars indicate statistically significant differences in relative hit rate with the data set of corresponding symbol colour (**
<italic>P</italic>
=0.01 or lower; ***
<italic>P</italic>
=0.001 or lower;
<italic>χ</italic>
<sup>2</sup>
-test). Putative functions of identified proteins were based on COG searches (1e
<sup>−2</sup>
or lower;
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S3</xref>
). (
<bold>b</bold>
) Heat map shows relative abundance of sequences homologous to those detected in the gut metaproteome, within a broad cross section of bacterial and archaeal chromosomal sequences (
<italic>n</italic>
=1,821, PGSR non-phage), and phage sequences (711 phage genomes, PGSR phage sequences and assemblies of human gut viromes), expressed as hits per Mb DNA
<xref ref-type="bibr" rid="b48">48</xref>
<xref ref-type="bibr" rid="b49">49</xref>
(valid hits=minimum 35% identity over 30 aa or more, 1e
<sup>−5</sup>
or lower). See
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
,
<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3–6</xref>
for sources and details of sequences used.</p>
</caption>
<graphic xlink:href="ncomms3420-f7"></graphic>
</fig>
<fig id="f8">
<label>Figure 8</label>
<caption>
<title>Inter-individual variation of
<italic>Bacteroidales</italic>
-like viral-enterotypes.</title>
<p>Inter-individual variation in carriage of PGSR phage and related sequences was assessed by calculating relative abundance of sequences with homology to PGSR phage in individual gut metagenomes (minimum 80% identity over 50% of subject sequence, 1e
<sup>−5</sup>
or lower). (
<bold>a</bold>
,
<bold>b</bold>
) Heat maps illustrating relative abundance of PGSR phage sequences in human gut metagenomes. Columns represent individual metagenomes and rows represent PGSR phage sequences. Intensity of shading in each cell indicates relative abundance of sequences homologous to each PGSR phage sequence, in each individual metagenome (hits per Mb). Associated histograms show average relative abundance of homologues to each PGSR phage sequence across all individuals (left histogram), average relative abundance of all PGSR phage homologues per individual (top histogram), and incidence of sequences homologous to each PGSR phage sequence as a % of positive metagenomes (right histogram). Map
<bold>a</bold>
shows results ranked by average relative abundance across all PGSR phage and individuals. Map
<bold>b</bold>
shows results of heuristic hierarchical grouping of individuals based on phage relative abundance profiles into ‘viral-enterotypes’ A, B, C, D or unclassified (UC). The most broadly distributed PGSR phage (with an incidence of 40% or over), shown in the lower segment of this heat map, were not utilized for heuristic ranking. (
<bold>c</bold>
) The validity of putative viral-enterotypes was tested by ordination of individual relative abundance profiles using unsupervised non-metric MDS. Points represent individual gut metagenomes, and colours correspond to viral-enterotypes assigned in heat map
<bold>b</bold>
. (
<bold>d</bold>
) Shows values for the ANOSIM
<italic>R</italic>
statistic obtained from comparisons of groupings obtained in MDS plots (part
<bold>c</bold>
), which indicates increasing separation of groups as values approach 1. *** Denotes significant separation between groups (
<italic>P</italic>
=0.002). The sources of human gut metagenomes used in these analyses are provided in
<xref ref-type="supplementary-material" rid="S1">Supplementary Table S1</xref>
.</p>
</caption>
<graphic xlink:href="ncomms3420-f8"></graphic>
</fig>
<table-wrap position="float" id="t1">
<label>Table 1</label>
<caption>
<title>Origin and phylogeny of driver sequences used in PGSR-based analysis of human gut metagenomes.</title>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="center"></col>
</colgroup>
<thead valign="bottom">
<tr>
<th align="left" valign="top" charoff="50">
<bold>Driver sequence name</bold>
<xref ref-type="fn" rid="t1-fn2">*</xref>
</th>
<th align="left" valign="top" charoff="50">
<bold>Host</bold>
</th>
<th align="left" valign="top" charoff="50">
<bold>Comments/source</bold>
</th>
<th align="center" valign="top" charoff="50">
<bold>Citations</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="top" charoff="50">Phage B124–14 (accession no: HE608841)</td>
<td align="left" valign="top" charoff="50">
<italic>Bacteroides fragilis</italic>
GB-124 and closely related strains</td>
<td align="left" valign="top" charoff="50">Indicated as human gut specific</td>
<td align="center" valign="top" charoff="50">
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b44">44</xref>
</td>
</tr>
<tr>
<td align="left" valign="top" charoff="50">Phage B40–8 (accession no: FJ008913.1)</td>
<td align="left" valign="top" charoff="50">
<italic>Bacteroides fragilis</italic>
HSP40</td>
<td align="left" valign="top" charoff="50">Indicated as human gut specific</td>
<td align="center" valign="top" charoff="50">
<xref ref-type="bibr" rid="b59">59</xref>
<xref ref-type="bibr" rid="b60">60</xref>
</td>
</tr>
<tr>
<td align="left" valign="top" charoff="50">F2-X000044</td>
<td align="left" valign="top" charoff="50">Unconfirmed—predicted
<italic>Bacteroides</italic>
. Closely related to B124–14 and B40–8 by: large subunit terminase gene phylogeny (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S1</xref>
); tetranucleotide profile (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S2</xref>
); gene architecture (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S3</xref>
)</td>
<td align="left" valign="top" charoff="50">Recovered from Japanese human gut metagenomes by terminase gene homology</td>
<td align="center" valign="top" charoff="50">
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b28">28</xref>
</td>
</tr>
<tr>
<td align="left" valign="top" charoff="50">Scaffold19676_1_MH0058; Scaffold70287_3_V1.UC-8; Scaffold89938_1_MH0059</td>
<td align="left" valign="top" charoff="50">Unconfirmed—predicted
<italic>Bacteroides</italic>
. Closely related to B124–14 and B40–8 by: large subunit terminase gene phylogeny (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S1</xref>
); tetranucleotide profile (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S2</xref>
); gene architecture (
<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. S3</xref>
)</td>
<td align="left" valign="top" charoff="50">Recovered from MetaHIT human gut metagenomes by terminase gene homology</td>
<td align="center" valign="top" charoff="50">
<xref ref-type="bibr" rid="b13">13</xref>
<xref ref-type="bibr" rid="b21">21</xref>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t1-fn1">
<p>PGSR, phage genome signature-based recovery.</p>
</fn>
<fn id="t1-fn2">
<p>
<sup>*</sup>
For driver sequences recovered from human gut metagenomes in previous analyses
<xref ref-type="bibr" rid="b13">13</xref>
, nomenclature relates directly to sequence/contig designation within metagenomes of origin. See
<xref ref-type="supplementary-material" rid="S1">Supplementary Figs S1–S3</xref>
for further information on driver sequences.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</floats-group>
</pmc>
</record>
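
The Figure 1 caption in the record above describes the PGSR selection step: tetranucleotide usage patterns (TUPs) are computed for metagenomic fragments of 10 kb or over, and a fragment is retained when its correlation to any driver sequence is 0.6 or over. The following is a minimal sketch of that filter, not the authors' pipeline. It assumes a Pearson correlation over raw forward-strand tetranucleotide frequencies, whereas TETRA 1.0 is reported to correlate z-scores of tetranucleotide over- and under-representation, so the values it produces will differ. The names `tetra_profile`, `pearson` and `retain_fragments` are hypothetical and introduced only for illustration.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so profiles are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Return the 256 tetranucleotide frequencies of seq (forward strand only)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def retain_fragments(fragments, drivers, min_len=10_000, min_corr=0.6):
    """Names of fragments of at least min_len bases whose profile correlates
    at min_corr or above with at least one driver profile.

    fragments and drivers are dicts mapping a name to a DNA string.
    The defaults mirror the thresholds quoted in the Figure 1 caption.
    """
    driver_profiles = [tetra_profile(d) for d in drivers.values()]
    kept = []
    for name, seq in fragments.items():
        if len(seq) < min_len:
            continue
        profile = tetra_profile(seq)
        if any(pearson(profile, dp) >= min_corr for dp in driver_profiles):
            kept.append(name)
    return kept
```

A call such as `retain_fragments(fragments, drivers)` returns the fragment names in input order. The retained fragments would then still need the functional profiling step described in the caption to separate phage from non-phage sequences.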

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000030  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000030  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024