Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

VirSorter: mining viral signal from microbial genomic data

Internal identifier: 000089 (Pmc/Corpus)


Authors: Simon Roux; Francois Enault; Bonnie L. Hurwitz; Matthew B. Sullivan

Source:

RBID : PMC:4451026

Abstract

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.
Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
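The performance claims in the abstract (recall and precision on contigs above a length cutoff such as 3 kb or 10 kb) use the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below shows how these two metrics can be computed for contigs stratified by a minimum length. It assumes a hypothetical benchmark in which each contig's true origin (viral or cellular) is known. The `Contig` class, the `precision_recall` function and the toy data are illustrative assumptions. They are not part of VirSorter and not the authors' evaluation code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Contig:
    """Hypothetical benchmark record: contig length, true origin, predicted origin."""
    name: str
    length_bp: int
    is_viral: bool         # ground truth, e.g. known from a simulated dataset
    predicted_viral: bool  # call made by some viral-signal detector


def precision_recall(contigs: Iterable[Contig], min_length_bp: int = 0) -> tuple[float, float]:
    """Return (precision, recall) over contigs at least min_length_bp long.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A ratio with a zero denominator is reported as 0.0.
    """
    tp = fp = fn = 0
    for c in contigs:
        if c.length_bp < min_length_bp:
            continue  # apply the length cutoff (e.g. 3 kb or 10 kb)
        if c.predicted_viral and c.is_viral:
            tp += 1  # viral contig correctly detected
        elif c.predicted_viral:
            fp += 1  # cellular contig wrongly called viral
        elif c.is_viral:
            fn += 1  # viral contig missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented for illustration only, not results from the paper).
contigs = [
    Contig("c1", 12_000, is_viral=True, predicted_viral=True),
    Contig("c2", 15_000, is_viral=True, predicted_viral=True),
    Contig("c3", 11_000, is_viral=False, predicted_viral=False),
    Contig("c4", 4_000, is_viral=True, predicted_viral=False),
    Contig("c5", 3_500, is_viral=False, predicted_viral=True),
]

print(precision_recall(contigs, min_length_bp=10_000))  # (1.0, 1.0)
print(precision_recall(contigs, min_length_bp=3_000))
```

Stratifying by length in this way makes the trade-off visible. Shorter contigs carry fewer genes and therefore less signal, so the metrics computed with a lower cutoff can only include more of these harder cases.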


Url:
DOI: 10.7717/peerj.985
PubMed: 26038737
PubMed Central: 4451026


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">VirSorter: mining viral signal from microbial genomic data</title>
<author>
<name sortKey="Roux, Simon" sort="Roux, Simon" uniqKey="Roux S" first="Simon" last="Roux">Simon Roux</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Ecology and Evolutionary Biology, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Enault, Francois" sort="Enault, Francois" uniqKey="Enault F" first="Francois" last="Enault">Francois Enault</name>
<affiliation>
<nlm:aff id="aff-2">
<institution>Clermont Université, Université Blaise Pascal, Laboratoire “Microorganismes: Génome et Environnement,”</institution>
<addr-line>Clermont-Ferrand</addr-line>
,
<country>France</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff-3">
<institution>CNRS UMR 6023, LMGE</institution>
,
<addr-line>Aubière</addr-line>
,
<country>France</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hurwitz, Bonnie L" sort="Hurwitz, Bonnie L" uniqKey="Hurwitz B" first="Bonnie L." last="Hurwitz">Bonnie L. Hurwitz</name>
<affiliation>
<nlm:aff id="aff-4">
<institution>Department of Agricultural and Biosystems Engineering, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Matthew B" sort="Sullivan, Matthew B" uniqKey="Sullivan M" first="Matthew B." last="Sullivan">Matthew B. Sullivan</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Ecology and Evolutionary Biology, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26038737</idno>
<idno type="pmc">4451026</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451026</idno>
<idno type="RBID">PMC:4451026</idno>
<idno type="doi">10.7717/peerj.985</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000089</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">VirSorter: mining viral signal from microbial genomic data</title>
<author>
<name sortKey="Roux, Simon" sort="Roux, Simon" uniqKey="Roux S" first="Simon" last="Roux">Simon Roux</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Ecology and Evolutionary Biology, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Enault, Francois" sort="Enault, Francois" uniqKey="Enault F" first="Francois" last="Enault">Francois Enault</name>
<affiliation>
<nlm:aff id="aff-2">
<institution>Clermont Université, Université Blaise Pascal, Laboratoire “Microorganismes: Génome et Environnement,”</institution>
<addr-line>Clermont-Ferrand</addr-line>
,
<country>France</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff-3">
<institution>CNRS UMR 6023, LMGE</institution>
,
<addr-line>Aubière</addr-line>
,
<country>France</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hurwitz, Bonnie L" sort="Hurwitz, Bonnie L" uniqKey="Hurwitz B" first="Bonnie L." last="Hurwitz">Bonnie L. Hurwitz</name>
<affiliation>
<nlm:aff id="aff-4">
<institution>Department of Agricultural and Biosystems Engineering, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Matthew B" sort="Sullivan, Matthew B" uniqKey="Sullivan M" first="Matthew B." last="Sullivan">Matthew B. Sullivan</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Ecology and Evolutionary Biology, University of Arizona</institution>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PeerJ</title>
<idno type="eISSN">2167-8359</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.
Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Akhter, S" uniqKey="Akhter S">S Akhter</name>
</author>
<author>
<name sortKey="Aziz, Rk" uniqKey="Aziz R">RK Aziz</name>
</author>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Albertsen, M" uniqKey="Albertsen M">M Albertsen</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Skarshewski, A" uniqKey="Skarshewski A">A Skarshewski</name>
</author>
<author>
<name sortKey="Nielsen, Kl" uniqKey="Nielsen K">KL Nielsen</name>
</author>
<author>
<name sortKey="Tyson, Gw" uniqKey="Tyson G">GW Tyson</name>
</author>
<author>
<name sortKey="Nielsen, Ph" uniqKey="Nielsen P">PH Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author>
<name sortKey="Sch Ffer, Aa" uniqKey="Sch Ffer A">AA Schäffer</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anantharaman, K" uniqKey="Anantharaman K">K Anantharaman</name>
</author>
<author>
<name sortKey="Duhaime, Mb" uniqKey="Duhaime M">MB Duhaime</name>
</author>
<author>
<name sortKey="Breier, Ja" uniqKey="Breier J">JA Breier</name>
</author>
<author>
<name sortKey="Wendt, K" uniqKey="Wendt K">K Wendt</name>
</author>
<author>
<name sortKey="Toner, Bm" uniqKey="Toner B">BM Toner</name>
</author>
<author>
<name sortKey="Dick, Gj" uniqKey="Dick G">GJ Dick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boyd, Ef" uniqKey="Boyd E">EF Boyd</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M Breitbart</name>
</author>
<author>
<name sortKey="Thompson, Lr" uniqKey="Thompson L">LR Thompson</name>
</author>
<author>
<name sortKey="Suttle, Ca" uniqKey="Suttle C">CA Suttle</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M Breitbart</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brum, Jr" uniqKey="Brum J">JR Brum</name>
</author>
<author>
<name sortKey="Ignacio Espinoza, Jc" uniqKey="Ignacio Espinoza J">JC Ignacio-Espinoza</name>
</author>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Doulcier, G" uniqKey="Doulcier G">G Doulcier</name>
</author>
<author>
<name sortKey="Acinas, Sg" uniqKey="Acinas S">SG Acinas</name>
</author>
<author>
<name sortKey="Alberti, A" uniqKey="Alberti A">A Alberti</name>
</author>
<author>
<name sortKey="Chaffron, S" uniqKey="Chaffron S">S Chaffron</name>
</author>
<author>
<name sortKey="Coppola, L" uniqKey="Coppola L">L Coppola</name>
</author>
<author>
<name sortKey="Cruaud, C" uniqKey="Cruaud C">C Cruaud</name>
</author>
<author>
<name sortKey="De Vargas, C" uniqKey="De Vargas C">C de Vargas</name>
</author>
<author>
<name sortKey="Gasol, Jm" uniqKey="Gasol J">JM Gasol</name>
</author>
<author>
<name sortKey="Gorsky, G" uniqKey="Gorsky G">G Gorsky</name>
</author>
<author>
<name sortKey="Gregory, Ac" uniqKey="Gregory A">AC Gregory</name>
</author>
<author>
<name sortKey="Guidi, L" uniqKey="Guidi L">L Guidi</name>
</author>
<author>
<name sortKey="Hingamp, P" uniqKey="Hingamp P">P Hingamp</name>
</author>
<author>
<name sortKey="Iudicone, D" uniqKey="Iudicone D">D Iudicone</name>
</author>
<author>
<name sortKey="Not, F" uniqKey="Not F">F Not</name>
</author>
<author>
<name sortKey="Ogata, H" uniqKey="Ogata H">H Ogata</name>
</author>
<author>
<name sortKey="Pesant, S" uniqKey="Pesant S">S Pesant</name>
</author>
<author>
<name sortKey="Poulos, Bt" uniqKey="Poulos B">BT Poulos</name>
</author>
<author>
<name sortKey="Schwenck, Sm" uniqKey="Schwenck S">SM Schwenck</name>
</author>
<author>
<name sortKey="Speich, S" uniqKey="Speich S">S Speich</name>
</author>
<author>
<name sortKey="Dimier, C" uniqKey="Dimier C">C Dimier</name>
</author>
<author>
<name sortKey="Picheral, M" uniqKey="Picheral M">M Picheral</name>
</author>
<author>
<name sortKey="Searson, S" uniqKey="Searson S">S Searson</name>
</author>
<author>
<name sortKey="Kandels Lewis, S" uniqKey="Kandels Lewis S">S Kandels-Lewis</name>
</author>
<author>
<name sortKey="Coordinators, To" uniqKey="Coordinators T">TO Coordinators</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author>
<name sortKey="Bowler, C" uniqKey="Bowler C">C Bowler</name>
</author>
<author>
<name sortKey="Karsenti, E" uniqKey="Karsenti E">E Karsenti</name>
</author>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S Sunagawa</name>
</author>
<author>
<name sortKey="Wincker, P" uniqKey="Wincker P">P Wincker</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brum, Jr" uniqKey="Brum J">JR Brum</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Busby, B" uniqKey="Busby B">B Busby</name>
</author>
<author>
<name sortKey="Kristensen, Dm" uniqKey="Kristensen D">DM Kristensen</name>
</author>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bush, K" uniqKey="Bush K">K Bush</name>
</author>
<author>
<name sortKey="Courvalin, P" uniqKey="Courvalin P">P Courvalin</name>
</author>
<author>
<name sortKey="Dantas, G" uniqKey="Dantas G">G Dantas</name>
</author>
<author>
<name sortKey="Davies, J" uniqKey="Davies J">J Davies</name>
</author>
<author>
<name sortKey="Eisenstein, B" uniqKey="Eisenstein B">B Eisenstein</name>
</author>
<author>
<name sortKey="Huovinen, P" uniqKey="Huovinen P">P Huovinen</name>
</author>
<author>
<name sortKey="Jacoby, Ga" uniqKey="Jacoby G">GA Jacoby</name>
</author>
<author>
<name sortKey="Kishony, R" uniqKey="Kishony R">R Kishony</name>
</author>
<author>
<name sortKey="Kreiswirth, Bn" uniqKey="Kreiswirth B">BN Kreiswirth</name>
</author>
<author>
<name sortKey="Kutter, E" uniqKey="Kutter E">E Kutter</name>
</author>
<author>
<name sortKey="Lerner, Sa" uniqKey="Lerner S">SA Lerner</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Lewis, K" uniqKey="Lewis K">K Lewis</name>
</author>
<author>
<name sortKey="Lomovskaya, O" uniqKey="Lomovskaya O">O Lomovskaya</name>
</author>
<author>
<name sortKey="Miller, Jh" uniqKey="Miller J">JH Miller</name>
</author>
<author>
<name sortKey="Mobashery, S" uniqKey="Mobashery S">S Mobashery</name>
</author>
<author>
<name sortKey="Piddock, Ljv" uniqKey="Piddock L">LJV Piddock</name>
</author>
<author>
<name sortKey="Projan, S" uniqKey="Projan S">S Projan</name>
</author>
<author>
<name sortKey="Thomas, Cm" uniqKey="Thomas C">CM Thomas</name>
</author>
<author>
<name sortKey="Tomasz, A" uniqKey="Tomasz A">A Tomasz</name>
</author>
<author>
<name sortKey="Tulkens, Pm" uniqKey="Tulkens P">PM Tulkens</name>
</author>
<author>
<name sortKey="Walsh, Tr" uniqKey="Walsh T">TR Walsh</name>
</author>
<author>
<name sortKey="Watson, Jd" uniqKey="Watson J">JD Watson</name>
</author>
<author>
<name sortKey="Witkowski, J" uniqKey="Witkowski J">J Witkowski</name>
</author>
<author>
<name sortKey="Witte, W" uniqKey="Witte W">W Witte</name>
</author>
<author>
<name sortKey="Wright, G" uniqKey="Wright G">G Wright</name>
</author>
<author>
<name sortKey="Yeh, P" uniqKey="Yeh P">P Yeh</name>
</author>
<author>
<name sortKey="Zgurskaya, Hi" uniqKey="Zgurskaya H">HI Zgurskaya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Canchaya, C" uniqKey="Canchaya C">C Canchaya</name>
</author>
<author>
<name sortKey="Fournous, G" uniqKey="Fournous G">G Fournous</name>
</author>
<author>
<name sortKey="Brussow, H" uniqKey="Brussow H">H Brüssow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Casjens, S" uniqKey="Casjens S">S Casjens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Phillippy, Am" uniqKey="Phillippy A">AM Phillippy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Emerson, Jb" uniqKey="Emerson J">JB Emerson</name>
</author>
<author>
<name sortKey="Thomas, Bc" uniqKey="Thomas B">BC Thomas</name>
</author>
<author>
<name sortKey="Andrade, K" uniqKey="Andrade K">K Andrade</name>
</author>
<author>
<name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
<author>
<name sortKey="Heidelberg, Kb" uniqKey="Heidelberg K">KB Heidelberg</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Enright, Aj" uniqKey="Enright A">AJ Enright</name>
</author>
<author>
<name sortKey="Van Dongen, S" uniqKey="Van Dongen S">S Van Dongen</name>
</author>
<author>
<name sortKey="Ouzounis, Ca" uniqKey="Ouzounis C">CA Ouzounis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fouts, De" uniqKey="Fouts D">DE Fouts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fuhrman, Ja" uniqKey="Fuhrman J">JA Fuhrman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
<author>
<name sortKey="Vaughn, M" uniqKey="Vaughn M">M Vaughn</name>
</author>
<author>
<name sortKey="Mckay, S" uniqKey="Mckay S">S McKay</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Stapleton, Ae" uniqKey="Stapleton A">AE Stapleton</name>
</author>
<author>
<name sortKey="Gessler, D" uniqKey="Gessler D">D Gessler</name>
</author>
<author>
<name sortKey="Matasci, N" uniqKey="Matasci N">N Matasci</name>
</author>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L Wang</name>
</author>
<author>
<name sortKey="Hanlon, M" uniqKey="Hanlon M">M Hanlon</name>
</author>
<author>
<name sortKey="Lenards, A" uniqKey="Lenards A">A Lenards</name>
</author>
<author>
<name sortKey="Muir, A" uniqKey="Muir A">A Muir</name>
</author>
<author>
<name sortKey="Merchant, N" uniqKey="Merchant N">N Merchant</name>
</author>
<author>
<name sortKey="Lowry, S" uniqKey="Lowry S">S Lowry</name>
</author>
<author>
<name sortKey="Mock, S" uniqKey="Mock S">S Mock</name>
</author>
<author>
<name sortKey="Helmke, M" uniqKey="Helmke M">M Helmke</name>
</author>
<author>
<name sortKey="Kubach, A" uniqKey="Kubach A">A Kubach</name>
</author>
<author>
<name sortKey="Narro, M" uniqKey="Narro M">M Narro</name>
</author>
<author>
<name sortKey="Hopkins, N" uniqKey="Hopkins N">N Hopkins</name>
</author>
<author>
<name sortKey="Micklos, D" uniqKey="Micklos D">D Micklos</name>
</author>
<author>
<name sortKey="Hilgert, U" uniqKey="Hilgert U">U Hilgert</name>
</author>
<author>
<name sortKey="Gonzales, M" uniqKey="Gonzales M">M Gonzales</name>
</author>
<author>
<name sortKey="Jordan, C" uniqKey="Jordan C">C Jordan</name>
</author>
<author>
<name sortKey="Skidmore, E" uniqKey="Skidmore E">E Skidmore</name>
</author>
<author>
<name sortKey="Dooley, R" uniqKey="Dooley R">R Dooley</name>
</author>
<author>
<name sortKey="Cazes, J" uniqKey="Cazes J">J Cazes</name>
</author>
<author>
<name sortKey="Mclay, R" uniqKey="Mclay R">R McLay</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Pasternak, S" uniqKey="Pasternak S">S Pasternak</name>
</author>
<author>
<name sortKey="Koesterke, L" uniqKey="Koesterke L">L Koesterke</name>
</author>
<author>
<name sortKey="Piel, Wh" uniqKey="Piel W">WH Piel</name>
</author>
<author>
<name sortKey="Grene, R" uniqKey="Grene R">R Grene</name>
</author>
<author>
<name sortKey="Noutsos, C" uniqKey="Noutsos C">C Noutsos</name>
</author>
<author>
<name sortKey="Gendler, K" uniqKey="Gendler K">K Gendler</name>
</author>
<author>
<name sortKey="Feng, X" uniqKey="Feng X">X Feng</name>
</author>
<author>
<name sortKey="Tang, C" uniqKey="Tang C">C Tang</name>
</author>
<author>
<name sortKey="Lent, M" uniqKey="Lent M">M Lent</name>
</author>
<author>
<name sortKey="Kim, S J" uniqKey="Kim S">S-J Kim</name>
</author>
<author>
<name sortKey="Kvilekval, K" uniqKey="Kvilekval K">K Kvilekval</name>
</author>
<author>
<name sortKey="Manjunath, Bs" uniqKey="Manjunath B">BS Manjunath</name>
</author>
<author>
<name sortKey="Tannen, V" uniqKey="Tannen V">V Tannen</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Sanderson, M" uniqKey="Sanderson M">M Sanderson</name>
</author>
<author>
<name sortKey="Welch, Sm" uniqKey="Welch S">SM Welch</name>
</author>
<author>
<name sortKey="Cranston, Ka" uniqKey="Cranston K">KA Cranston</name>
</author>
<author>
<name sortKey="Soltis, P" uniqKey="Soltis P">P Soltis</name>
</author>
<author>
<name sortKey="Soltis, D" uniqKey="Soltis D">D Soltis</name>
</author>
<author>
<name sortKey="O Eara, B" uniqKey="O Eara B">B O’Meara</name>
</author>
<author>
<name sortKey="Ane, C" uniqKey="Ane C">C Ane</name>
</author>
<author>
<name sortKey="Brutnell, T" uniqKey="Brutnell T">T Brutnell</name>
</author>
<author>
<name sortKey="Kleibenstein, Dj" uniqKey="Kleibenstein D">DJ Kleibenstein</name>
</author>
<author>
<name sortKey="White, Jw" uniqKey="White J">JW White</name>
</author>
<author>
<name sortKey="Leebens Mack, J" uniqKey="Leebens Mack J">J Leebens-Mack</name>
</author>
<author>
<name sortKey="Donoghue, Mj" uniqKey="Donoghue M">MJ Donoghue</name>
</author>
<author>
<name sortKey="Spalding, Ep" uniqKey="Spalding E">EP Spalding</name>
</author>
<author>
<name sortKey="Vision, Tj" uniqKey="Vision T">TJ Vision</name>
</author>
<author>
<name sortKey="Myers, Cr" uniqKey="Myers C">CR Myers</name>
</author>
<author>
<name sortKey="Lowenthal, D" uniqKey="Lowenthal D">D Lowenthal</name>
</author>
<author>
<name sortKey="Enquist, Bj" uniqKey="Enquist B">BJ Enquist</name>
</author>
<author>
<name sortKey="Boyle, B" uniqKey="Boyle B">B Boyle</name>
</author>
<author>
<name sortKey="Akoglu, A" uniqKey="Akoglu A">A Akoglu</name>
</author>
<author>
<name sortKey="Andrews, G" uniqKey="Andrews G">G Andrews</name>
</author>
<author>
<name sortKey="Ram, S" uniqKey="Ram S">S Ram</name>
</author>
<author>
<name sortKey="Ware, D" uniqKey="Ware D">D Ware</name>
</author>
<author>
<name sortKey="Stein, L" uniqKey="Stein L">L Stein</name>
</author>
<author>
<name sortKey="Stanzione, D" uniqKey="Stanzione D">D Stanzione</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hurwitz, Bl" uniqKey="Hurwitz B">BL Hurwitz</name>
</author>
<author>
<name sortKey="Brum, Jr" uniqKey="Brum J">JR Brum</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hurwitz, Bl" uniqKey="Hurwitz B">BL Hurwitz</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jia, B" uniqKey="Jia B">B Jia</name>
</author>
<author>
<name sortKey="Xuan, L" uniqKey="Xuan L">L Xuan</name>
</author>
<author>
<name sortKey="Cai, K" uniqKey="Cai K">K Cai</name>
</author>
<author>
<name sortKey="Hu, Z" uniqKey="Hu Z">Z Hu</name>
</author>
<author>
<name sortKey="Ma, L" uniqKey="Ma L">L Ma</name>
</author>
<author>
<name sortKey="Wei, C" uniqKey="Wei C">C Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamke, J" uniqKey="Kamke J">J Kamke</name>
</author>
<author>
<name sortKey="Sczyrba, A" uniqKey="Sczyrba A">A Sczyrba</name>
</author>
<author>
<name sortKey="Ivanova, N" uniqKey="Ivanova N">N Ivanova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kashtan, N" uniqKey="Kashtan N">N Kashtan</name>
</author>
<author>
<name sortKey="Roggensack, Se" uniqKey="Roggensack S">SE Roggensack</name>
</author>
<author>
<name sortKey="Rodrigue, S" uniqKey="Rodrigue S">S Rodrigue</name>
</author>
<author>
<name sortKey="Thompson, Jw" uniqKey="Thompson J">JW Thompson</name>
</author>
<author>
<name sortKey="Biller, Sj" uniqKey="Biller S">SJ Biller</name>
</author>
<author>
<name sortKey="Coe, A" uniqKey="Coe A">A Coe</name>
</author>
<author>
<name sortKey="Ding, H" uniqKey="Ding H">H Ding</name>
</author>
<author>
<name sortKey="Marttinen, P" uniqKey="Marttinen P">P Marttinen</name>
</author>
<author>
<name sortKey="Malmstrom, Rr" uniqKey="Malmstrom R">RR Malmstrom</name>
</author>
<author>
<name sortKey="Stocker, R" uniqKey="Stocker R">R Stocker</name>
</author>
<author>
<name sortKey="Follows, Mj" uniqKey="Follows M">MJ Follows</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
<author>
<name sortKey="Biller, J" uniqKey="Biller J">J Biller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
<author>
<name sortKey="Senkevich, Tg" uniqKey="Senkevich T">TG Senkevich</name>
</author>
<author>
<name sortKey="Dolja, Vv" uniqKey="Dolja V">VV Dolja</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Labonte, Jm" uniqKey="Labonte J">JM Labonté</name>
</author>
<author>
<name sortKey="Swan, Bk" uniqKey="Swan B">BK Swan</name>
</author>
<author>
<name sortKey="Poulos, Bt" uniqKey="Poulos B">BT Poulos</name>
</author>
<author>
<name sortKey="Luo, H" uniqKey="Luo H">H Luo</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Wommack, Ek" uniqKey="Wommack E">EK Wommack</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Letarov, A" uniqKey="Letarov A">A Letarov</name>
</author>
<author>
<name sortKey="Kulikov, E" uniqKey="Kulikov E">E Kulikov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lima Mendez, G" uniqKey="Lima Mendez G">G Lima-Mendez</name>
</author>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J Van Helden</name>
</author>
<author>
<name sortKey="Toussaint, A" uniqKey="Toussaint A">A Toussaint</name>
</author>
<author>
<name sortKey="Leplae, R" uniqKey="Leplae R">R Leplae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lindell, D" uniqKey="Lindell D">D Lindell</name>
</author>
<author>
<name sortKey="Jaffe, Jd" uniqKey="Jaffe J">JD Jaffe</name>
</author>
<author>
<name sortKey="Johnson, Zi" uniqKey="Johnson Z">ZI Johnson</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Minot, S" uniqKey="Minot S">S Minot</name>
</author>
<author>
<name sortKey="Bryson, A" uniqKey="Bryson A">A Bryson</name>
</author>
<author>
<name sortKey="Chehoud, C" uniqKey="Chehoud C">C Chehoud</name>
</author>
<author>
<name sortKey="Wu, Gd" uniqKey="Wu G">GD Wu</name>
</author>
<author>
<name sortKey="Lewis, Jd" uniqKey="Lewis J">JD Lewis</name>
</author>
<author>
<name sortKey="Bushman, Fd" uniqKey="Bushman F">FD Bushman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narasingarao, P" uniqKey="Narasingarao P">P Narasingarao</name>
</author>
<author>
<name sortKey="Podell, S" uniqKey="Podell S">S Podell</name>
</author>
<author>
<name sortKey="Ugalde, Ja" uniqKey="Ugalde J">JA Ugalde</name>
</author>
<author>
<name sortKey="Brochier Armanet, C" uniqKey="Brochier Armanet C">C Brochier-Armanet</name>
</author>
<author>
<name sortKey="Emerson, Jb" uniqKey="Emerson J">JB Emerson</name>
</author>
<author>
<name sortKey="Brocks, Jj" uniqKey="Brocks J">JJ Brocks</name>
</author>
<author>
<name sortKey="Heidelberg, Kb" uniqKey="Heidelberg K">KB Heidelberg</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
<author>
<name sortKey="Allen, Ee" uniqKey="Allen E">EE Allen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nobrega, Fl" uniqKey="Nobrega F">FL Nobrega</name>
</author>
<author>
<name sortKey="Costa, Ar" uniqKey="Costa A">AR Costa</name>
</author>
<author>
<name sortKey="Kluskens, Ld" uniqKey="Kluskens L">LD Kluskens</name>
</author>
<author>
<name sortKey="Azeredo, J" uniqKey="Azeredo J">J Azeredo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Noguchi, H" uniqKey="Noguchi H">H Noguchi</name>
</author>
<author>
<name sortKey="Taniguchi, T" uniqKey="Taniguchi T">T Taniguchi</name>
</author>
<author>
<name sortKey="Itoh, T" uniqKey="Itoh T">T Itoh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hcm" uniqKey="Leung H">HCM Leung</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Chin, Fyl" uniqKey="Chin F">FYL Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pride, Dt" uniqKey="Pride D">DT Pride</name>
</author>
<author>
<name sortKey="Salzman, J" uniqKey="Salzman J">J Salzman</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Davis Long, C" uniqKey="Davis Long C">C Davis-Long</name>
</author>
<author>
<name sortKey="White, Ra" uniqKey="White R">RA White</name>
</author>
<author>
<name sortKey="Loomer, P" uniqKey="Loomer P">P Loomer</name>
</author>
<author>
<name sortKey="Armitage, Gc" uniqKey="Armitage G">GC Armitage</name>
</author>
<author>
<name sortKey="Relman, Da" uniqKey="Relman D">DA Relman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rappe, Ms" uniqKey="Rappe M">MS Rappé</name>
</author>
<author>
<name sortKey="Giovannoni, Sj" uniqKey="Giovannoni S">SJ Giovannoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reyes, A" uniqKey="Reyes A">A Reyes</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Hanson, N" uniqKey="Hanson N">N Hanson</name>
</author>
<author>
<name sortKey="Angly, Fe" uniqKey="Angly F">FE Angly</name>
</author>
<author>
<name sortKey="Heath, Ac" uniqKey="Heath A">AC Heath</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reyes, A" uniqKey="Reyes A">A Reyes</name>
</author>
<author>
<name sortKey="Semenkovich, Np" uniqKey="Semenkovich N">NP Semenkovich</name>
</author>
<author>
<name sortKey="Whiteson, K" uniqKey="Whiteson K">K Whiteson</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rinke, C" uniqKey="Rinke C">C Rinke</name>
</author>
<author>
<name sortKey="Schwientek, P" uniqKey="Schwientek P">P Schwientek</name>
</author>
<author>
<name sortKey="Sczyrba, A" uniqKey="Sczyrba A">A Sczyrba</name>
</author>
<author>
<name sortKey="Ivanova, Nn" uniqKey="Ivanova N">NN Ivanova</name>
</author>
<author>
<name sortKey="Anderson, Ij" uniqKey="Anderson I">IJ Anderson</name>
</author>
<author>
<name sortKey="Cheng, J F" uniqKey="Cheng J">J-F Cheng</name>
</author>
<author>
<name sortKey="Darling, A" uniqKey="Darling A">A Darling</name>
</author>
<author>
<name sortKey="Malfatti, S" uniqKey="Malfatti S">S Malfatti</name>
</author>
<author>
<name sortKey="Swan, Bk" uniqKey="Swan B">BK Swan</name>
</author>
<author>
<name sortKey="Gies, Ea" uniqKey="Gies E">EA Gies</name>
</author>
<author>
<name sortKey="Dodsworth, Ja" uniqKey="Dodsworth J">JA Dodsworth</name>
</author>
<author>
<name sortKey="Hedlund, Bp" uniqKey="Hedlund B">BP Hedlund</name>
</author>
<author>
<name sortKey="Tsiamis, G" uniqKey="Tsiamis G">G Tsiamis</name>
</author>
<author>
<name sortKey="Sievert, Sm" uniqKey="Sievert S">SM Sievert</name>
</author>
<author>
<name sortKey="Liu, W T" uniqKey="Liu W">W-T Liu</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodriguez Valera, F" uniqKey="Rodriguez Valera F">F Rodriguez-Valera</name>
</author>
<author>
<name sortKey="Martin Cuadrado, A B" uniqKey="Martin Cuadrado A">A-B Martin-Cuadrado</name>
</author>
<author>
<name sortKey="Rodriguez Brito, B" uniqKey="Rodriguez Brito B">B Rodriguez-Brito</name>
</author>
<author>
<name sortKey="Pasi, L" uniqKey="Pasi L">L Pasić</name>
</author>
<author>
<name sortKey="Thingstad, Tf" uniqKey="Thingstad T">TF Thingstad</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Mira, A" uniqKey="Mira A">A Mira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Thurber, Rv" uniqKey="Thurber R">RV Thurber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Krupovic, M" uniqKey="Krupovic M">M Krupovic</name>
</author>
<author>
<name sortKey="Debroas, D" uniqKey="Debroas D">D Debroas</name>
</author>
<author>
<name sortKey="Forterre, P" uniqKey="Forterre P">P Forterre</name>
</author>
<author>
<name sortKey="Enault, F" uniqKey="Enault F">F Enault</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Tournayre, J" uniqKey="Tournayre J">J Tournayre</name>
</author>
<author>
<name sortKey="Mahul, A" uniqKey="Mahul A">A Mahul</name>
</author>
<author>
<name sortKey="Debroas, D" uniqKey="Debroas D">D Debroas</name>
</author>
<author>
<name sortKey="Enault, F" uniqKey="Enault F">F Enault</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Hawley, Ak" uniqKey="Hawley A">AK Hawley</name>
</author>
<author>
<name sortKey="Torres Beltran, M" uniqKey="Torres Beltran M">M Torres Beltran</name>
</author>
<author>
<name sortKey="Scofield, M" uniqKey="Scofield M">M Scofield</name>
</author>
<author>
<name sortKey="Schwientek, P" uniqKey="Schwientek P">P Schwientek</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Alperovitch, A" uniqKey="Alperovitch A">A Alperovitch</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Glaser, F" uniqKey="Glaser F">F Glaser</name>
</author>
<author>
<name sortKey="Atamna Ismaeel, N" uniqKey="Atamna Ismaeel N">N Atamna-Ismaeel</name>
</author>
<author>
<name sortKey="Pinter, Ry" uniqKey="Pinter R">RY Pinter</name>
</author>
<author>
<name sortKey="Partensky, F" uniqKey="Partensky F">F Partensky</name>
</author>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
<author>
<name sortKey="Wolf, Yi" uniqKey="Wolf Y">YI Wolf</name>
</author>
<author>
<name sortKey="Nelson, N" uniqKey="Nelson N">N Nelson</name>
</author>
<author>
<name sortKey="Beja, O" uniqKey="Beja O">O Béjà</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Battchikova, N" uniqKey="Battchikova N">N Battchikova</name>
</author>
<author>
<name sortKey="Aro, E M" uniqKey="Aro E">E-M Aro</name>
</author>
<author>
<name sortKey="Giglione, C" uniqKey="Giglione C">C Giglione</name>
</author>
<author>
<name sortKey="Meinnel, T" uniqKey="Meinnel T">T Meinnel</name>
</author>
<author>
<name sortKey="Glaser, F" uniqKey="Glaser F">F Glaser</name>
</author>
<author>
<name sortKey="Pinter, Ry" uniqKey="Pinter R">RY Pinter</name>
</author>
<author>
<name sortKey="Breitbart, M" uniqKey="Breitbart M">M Breitbart</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Beja, O" uniqKey="Beja O">O Béjà</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
<author>
<name sortKey="Lindell, D" uniqKey="Lindell D">D Lindell</name>
</author>
<author>
<name sortKey="Lee, Ja" uniqKey="Lee J">JA Lee</name>
</author>
<author>
<name sortKey="Thompson, Lr" uniqKey="Thompson L">LR Thompson</name>
</author>
<author>
<name sortKey="Bielawski, Jp" uniqKey="Bielawski J">JP Bielawski</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suttle, C" uniqKey="Suttle C">C Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suttle, Ca" uniqKey="Suttle C">CA Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Swan, Bk" uniqKey="Swan B">BK Swan</name>
</author>
<author>
<name sortKey="Martinez Garcia, M" uniqKey="Martinez Garcia M">M Martinez-Garcia</name>
</author>
<author>
<name sortKey="Preston, Cm" uniqKey="Preston C">CM Preston</name>
</author>
<author>
<name sortKey="Sczyrba, A" uniqKey="Sczyrba A">A Sczyrba</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Lamy, D" uniqKey="Lamy D">D Lamy</name>
</author>
<author>
<name sortKey="Reinthaler, T" uniqKey="Reinthaler T">T Reinthaler</name>
</author>
<author>
<name sortKey="Poulton, Nj" uniqKey="Poulton N">NJ Poulton</name>
</author>
<author>
<name sortKey="Masland, Edp" uniqKey="Masland E">EDP Masland</name>
</author>
<author>
<name sortKey="Gomez, Ml" uniqKey="Gomez M">ML Gomez</name>
</author>
<author>
<name sortKey="Sieracki, Me" uniqKey="Sieracki M">ME Sieracki</name>
</author>
<author>
<name sortKey="Delong, Ef" uniqKey="Delong E">EF DeLong</name>
</author>
<author>
<name sortKey="Herndl, Gj" uniqKey="Herndl G">GJ Herndl</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thompson, Lr" uniqKey="Thompson L">LR Thompson</name>
</author>
<author>
<name sortKey="Zeng, Q" uniqKey="Zeng Q">Q Zeng</name>
</author>
<author>
<name sortKey="Kelly, L" uniqKey="Kelly L">L Kelly</name>
</author>
<author>
<name sortKey="Huang, Kh" uniqKey="Huang K">KH Huang</name>
</author>
<author>
<name sortKey="Singer, Au" uniqKey="Singer A">AU Singer</name>
</author>
<author>
<name sortKey="Stubbe, J" uniqKey="Stubbe J">J Stubbe</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waldor, Mk" uniqKey="Waldor M">MK Waldor</name>
</author>
<author>
<name sortKey="Mekalanos, Jj" uniqKey="Mekalanos J">JJ Mekalanos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weinbauer, Mg" uniqKey="Weinbauer M">MG Weinbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Winstanley, C" uniqKey="Winstanley C">C Winstanley</name>
</author>
<author>
<name sortKey="Langille, Mgi" uniqKey="Langille M">MGI Langille</name>
</author>
<author>
<name sortKey="Fothergill, Jl" uniqKey="Fothergill J">JL Fothergill</name>
</author>
<author>
<name sortKey="Kukavica Ibrulj, I" uniqKey="Kukavica Ibrulj I">I Kukavica-Ibrulj</name>
</author>
<author>
<name sortKey="Paradis Bleau, C" uniqKey="Paradis Bleau C">C Paradis-Bleau</name>
</author>
<author>
<name sortKey="Sanschagrin, F" uniqKey="Sanschagrin F">F Sanschagrin</name>
</author>
<author>
<name sortKey="Thomson, Nr" uniqKey="Thomson N">NR Thomson</name>
</author>
<author>
<name sortKey="Winsor, Gl" uniqKey="Winsor G">GL Winsor</name>
</author>
<author>
<name sortKey="Quail, Ma" uniqKey="Quail M">MA Quail</name>
</author>
<author>
<name sortKey="Lennard, N" uniqKey="Lennard N">N Lennard</name>
</author>
<author>
<name sortKey="Bignell, A" uniqKey="Bignell A">A Bignell</name>
</author>
<author>
<name sortKey="Clarke, L" uniqKey="Clarke L">L Clarke</name>
</author>
<author>
<name sortKey="Seeger, K" uniqKey="Seeger K">K Seeger</name>
</author>
<author>
<name sortKey="Saunders, D" uniqKey="Saunders D">D Saunders</name>
</author>
<author>
<name sortKey="Harris, D" uniqKey="Harris D">D Harris</name>
</author>
<author>
<name sortKey="Parkhill, J" uniqKey="Parkhill J">J Parkhill</name>
</author>
<author>
<name sortKey="Hancock, Rew" uniqKey="Hancock R">REW Hancock</name>
</author>
<author>
<name sortKey="Brinkman, Fsl" uniqKey="Brinkman F">FSL Brinkman</name>
</author>
<author>
<name sortKey="Levesque, Rc" uniqKey="Levesque R">RC Levesque</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wommack, Ke" uniqKey="Wommack K">KE Wommack</name>
</author>
<author>
<name sortKey="Colwell, Rr" uniqKey="Colwell R">RR Colwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yoon, Hs" uniqKey="Yoon H">HS Yoon</name>
</author>
<author>
<name sortKey="Price, Dc" uniqKey="Price D">DC Price</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Rajah, Vd" uniqKey="Rajah V">VD Rajah</name>
</author>
<author>
<name sortKey="Sieracki, Me" uniqKey="Sieracki M">ME Sieracki</name>
</author>
<author>
<name sortKey="Wilson, Wh" uniqKey="Wilson W">WH Wilson</name>
</author>
<author>
<name sortKey="Yang, Ec" uniqKey="Yang E">EC Yang</name>
</author>
<author>
<name sortKey="Duffy, S" uniqKey="Duffy S">S Duffy</name>
</author>
<author>
<name sortKey="Bhattacharya, D" uniqKey="Bhattacharya D">D Bhattacharya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Y" uniqKey="Zhou Y">Y Zhou</name>
</author>
<author>
<name sortKey="Liang, Y" uniqKey="Liang Y">Y Liang</name>
</author>
<author>
<name sortKey="Lynch, Kh" uniqKey="Lynch K">KH Lynch</name>
</author>
<author>
<name sortKey="Dennis, Jj" uniqKey="Dennis J">JJ Dennis</name>
</author>
<author>
<name sortKey="Wishart, Ds" uniqKey="Wishart D">DS Wishart</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PeerJ</journal-id>
<journal-id journal-id-type="iso-abbrev">PeerJ</journal-id>
<journal-id journal-id-type="pmc">PeerJ</journal-id>
<journal-id journal-id-type="publisher-id">PeerJ</journal-id>
<journal-title-group>
<journal-title>PeerJ</journal-title>
</journal-title-group>
<issn pub-type="epub">2167-8359</issn>
<publisher>
<publisher-name>PeerJ Inc.</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26038737</article-id>
<article-id pub-id-type="pmc">4451026</article-id>
<article-id pub-id-type="publisher-id">985</article-id>
<article-id pub-id-type="doi">10.7717/peerj.985</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioinformatics</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Genomics</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Virology</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>VirSorter: mining viral signal from microbial genomic data</article-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name>
<surname>Roux</surname>
<given-names>Simon</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="author-notes" rid="aufn-1">
<sup>*</sup>
</xref>
</contrib>
<contrib id="author-2" contrib-type="author">
<name>
<surname>Enault</surname>
<given-names>Francois</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref>
</contrib>
<contrib id="author-3" contrib-type="author">
<name>
<surname>Hurwitz</surname>
<given-names>Bonnie L.</given-names>
</name>
<xref ref-type="aff" rid="aff-4">4</xref>
</contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name>
<surname>Sullivan</surname>
<given-names>Matthew B.</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="author-notes" rid="aufn-1">
<sup>*</sup>
</xref>
<email>mbsulli@email.arizona.edu</email>
</contrib>
<aff id="aff-1">
<label>1</label>
<institution>Ecology and Evolutionary Biology, University of Arizona</institution>
,
<country>USA</country>
</aff>
<aff id="aff-2">
<label>2</label>
<institution>Clermont Université, Université Blaise Pascal, Laboratoire “Microorganismes: Génome et Environnement,”</institution>
<addr-line>Clermont-Ferrand</addr-line>
,
<country>France</country>
</aff>
<aff id="aff-3">
<label>3</label>
<institution>CNRS UMR 6023, LMGE</institution>
,
<addr-line>Aubière</addr-line>
,
<country>France</country>
</aff>
<aff id="aff-4">
<label>4</label>
<institution>Department of Agricultural and Biosystems Engineering, University of Arizona</institution>
,
<country>USA</country>
</aff>
</contrib-group>
<contrib-group>
<contrib id="editor-1" contrib-type="editor">
<name>
<surname>Bishop-Lilly</surname>
<given-names>Kimberly</given-names>
</name>
</contrib>
</contrib-group>
<author-notes>
<fn id="aufn-1" fn-type="current-aff">
<label>*</label>
<p>Current affiliation: Department of Microbiology, The Ohio State University, USA</p>
</fn>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2015-05-28">
<day>28</day>
<month>5</month>
<year iso-8601-date="2015">2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>3</volume>
<elocation-id>e985</elocation-id>
<history>
<date date-type="received" iso-8601-date="2015-04-09">
<day>9</day>
<month>4</month>
<year iso-8601-date="2015">2015</year>
</date>
<date date-type="accepted" iso-8601-date="2015-05-08">
<day>8</day>
<month>5</month>
<year iso-8601-date="2015">2015</year>
</date>
</history>
<permissions>
<copyright-statement>© 2015 Roux et al.</copyright-statement>
<copyright-year>2015</copyright-year>
<copyright-holder>Roux et al.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://peerj.com/articles/985"></self-uri>
<abstract>
<p>Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequences (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequences in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.
Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that scales with modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Virus</kwd>
<kwd>Bacteriophage</kwd>
<kwd>Prophage</kwd>
<kwd>Single-cell amplified genome</kwd>
<kwd>Metagenomics</kwd>
<kwd>Viral metagenomics</kwd>
</kwd-group>
<funding-group>
<award-group id="fund-1">
<funding-source>Gordon and Betty Moore Foundation</funding-source>
<award-id>#3790</award-id>
</award-group>
<award-group id="fund-2">
<funding-source>University of Arizona Ecosystem Genomics Institute</funding-source>
</award-group>
<funding-statement>This work was performed under the auspices of the Gordon and Betty Moore Foundation (#3790) through grants awarded to Matthew B. Sullivan. Simon Roux was partially supported by the University of Arizona Ecosystem Genomics Institute through a grant from the Technology and Research Initiative Fund through the Water, Environmental and Energy Solutions Initiative. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Viruses of microbes, mainly infecting bacteria and archaea, are ubiquitous and abundant in every type of biome sampled thus far, where virus-host interactions alter ecosystem function ranging from geochemical cycling to human health (
<xref rid="ref-20" ref-type="bibr">Fuhrman, 1999</xref>
;
<xref rid="ref-57" ref-type="bibr">Wommack & Colwell, 2000</xref>
;
<xref rid="ref-55" ref-type="bibr">Weinbauer, 2004</xref>
;
<xref rid="ref-7" ref-type="bibr">Breitbart & Rohwer, 2005</xref>
;
<xref rid="ref-16" ref-type="bibr">Edwards & Rohwer, 2005</xref>
;
<xref rid="ref-51" ref-type="bibr">Suttle, 2007</xref>
;
<xref rid="ref-43" ref-type="bibr">Rohwer & Thurber, 2009</xref>
;
<xref rid="ref-29" ref-type="bibr">Letarov & Kulikov, 2009</xref>
;
<xref rid="ref-42" ref-type="bibr">Rodriguez-Valera et al., 2009</xref>
;
<xref rid="ref-40" ref-type="bibr">Reyes et al., 2012</xref>
;
<xref rid="ref-9" ref-type="bibr">Brum & Sullivan, 2015</xref>
). In the oceans, for example, viruses infecting cyanobacteria kill approximately 3% of their hosts per day (
<xref rid="ref-50" ref-type="bibr">Suttle, 2002</xref>
), while also impacting cyanobacterial photosynthesis locally and globally through the expression and transfer of virus-encoded photosystem core genes (
<xref rid="ref-31" ref-type="bibr">Lindell et al., 2005</xref>
;
<xref rid="ref-49" ref-type="bibr">Sullivan et al., 2006</xref>
). Such modulation of host microbial metabolisms during infection appears to be a generalized strategy wherein oceanic viral communities encode genes with the potential to modulate key microbial carbon, nitrogen, phosphate and sulfur metabolisms (
<xref rid="ref-6" ref-type="bibr">Breitbart et al., 2007</xref>
;
<xref rid="ref-47" ref-type="bibr">Sharon et al., 2009</xref>
;
<xref rid="ref-48" ref-type="bibr">Sharon et al., 2011</xref>
;
<xref rid="ref-53" ref-type="bibr">Thompson et al., 2011</xref>
;
<xref rid="ref-23" ref-type="bibr">Hurwitz, Hallam & Sullivan, 2013</xref>
;
<xref rid="ref-4" ref-type="bibr">Anantharaman et al., 2014</xref>
;
<xref rid="ref-46" ref-type="bibr">Roux et al., 2014b</xref>
;
<xref rid="ref-22" ref-type="bibr">Hurwitz, Brum & Sullivan, 2015</xref>
). In humans, viruses of microbes appear dynamic (
<xref rid="ref-39" ref-type="bibr">Reyes et al., 2010</xref>
;
<xref rid="ref-37" ref-type="bibr">Pride et al., 2011</xref>
;
<xref rid="ref-32" ref-type="bibr">Minot et al., 2013</xref>
), and again likely play key ecosystem roles, particularly by affecting the virulence of facultative pathogens (
<xref rid="ref-5" ref-type="bibr">Boyd, 2012</xref>
;
<xref rid="ref-10" ref-type="bibr">Busby, Kristensen & Koonin, 2013</xref>
). A striking example is the requirement of phage infection for the full virulence of
<italic>Vibrio cholerae</italic>
(
<xref rid="ref-54" ref-type="bibr">Waldor & Mekalanos, 1996</xref>
). Microbial viruses may also help fight antibiotic-resistant pathogens, leading to a recent resurgence in research exploring the use of viruses for “phage therapy” in humans (
<xref rid="ref-11" ref-type="bibr">Bush et al., 2011</xref>
;
<xref rid="ref-34" ref-type="bibr">Nobrega et al., 2015</xref>
).</p>
<p>In spite of this importance, our understanding of viral diversity remains limited to a tiny fraction of that occurring in nature. This is because most microbes detected in barcode surveys have not yet been cultured (
<xref rid="ref-38" ref-type="bibr">Rappé & Giovannoni, 2003</xref>
), and even when their hosts are cultivated, not all viruses are amenable to cultivation (
<xref rid="ref-16" ref-type="bibr">Edwards & Rohwer, 2005</xref>
). In the oceans alone, the lack of reference genomes means that most (63–93%) sequences in viral community surveys are unknown (
<xref rid="ref-9" ref-type="bibr">Brum & Sullivan, 2015</xref>
), and that most (99%) of 5,476 surface ocean viral populations cannot be taxonomically identified beyond the “order” level (
<xref rid="ref-8" ref-type="bibr">Brum et al., 2015</xref>
). This is not surprising, given that 86% of the 1,531 genomes of bacterial and archaeal viruses available in RefSeq are associated with only 3 of 61 known host phyla (based on the viral genomes available in NCBI RefSeqVirus v69, January 2015).</p>
<p>One way forward is to better detect and catalog viral sequence data from rapidly expanding microbial genomic datasets. First, prophages, which result from the integration of a temperate virus genome into a microbial host genome, are present in ∼60% of sequenced bacteria (
<xref rid="ref-13" ref-type="bibr">Casjens, 2003</xref>
;
<xref rid="ref-12" ref-type="bibr">Canchaya, Fournous & Brüssow, 2004</xref>
). Second, Single-cell Amplified Genome (SAG) datasets are now routinely generated to provide genome sequence data and inferences about metabolic capacity for novel microbes (
<xref rid="ref-52" ref-type="bibr">Swan et al., 2011</xref>
;
<xref rid="ref-25" ref-type="bibr">Kamke, Sczyrba & Ivanova, 2013</xref>
;
<xref rid="ref-41" ref-type="bibr">Rinke et al., 2013</xref>
;
<xref rid="ref-26" ref-type="bibr">Kashtan et al., 2014</xref>
), and offer a rich source of novel viral sequences. These data can include prophage sequences as well as viruses from active lytic infections. Such SAG-based viral signal has already provided insights into marine viral diversity and virus-host interactions in uncultivated protists, bacteria and archaea (
<xref rid="ref-58" ref-type="bibr">Yoon et al., 2011</xref>
;
<xref rid="ref-46" ref-type="bibr">Roux et al., 2014b</xref>
;
<xref rid="ref-28" ref-type="bibr">Labonté et al., 2015</xref>
). Third, large genome fragments of uncultivated microbes and associated viruses can now be assembled from microbial metagenomes (
<xref rid="ref-47" ref-type="bibr">Sharon et al., 2009</xref>
;
<xref rid="ref-48" ref-type="bibr">Sharon et al., 2011</xref>
;
<xref rid="ref-33" ref-type="bibr">Narasingarao et al., 2012</xref>
;
<xref rid="ref-2" ref-type="bibr">Albertsen et al., 2013</xref>
;
<xref rid="ref-4" ref-type="bibr">Anantharaman et al., 2014</xref>
). Finally, viral metagenomics (viromics) can be used to survey the sequence data associated with purified viral particles and can also result in assembly of large viral genome fragments (
<xref rid="ref-17" ref-type="bibr">Emerson et al., 2012</xref>
;
<xref rid="ref-32" ref-type="bibr">Minot et al., 2013</xref>
;
<xref rid="ref-44" ref-type="bibr">Roux et al., 2013</xref>
;
<xref rid="ref-8" ref-type="bibr">Brum et al., 2015</xref>
).</p>
<p>Numerous approaches are available to identify prophages in complete microbial genomes including Phage_Finder (
<xref rid="ref-19" ref-type="bibr">Fouts, 2006</xref>
), Prophinder (
<xref rid="ref-30" ref-type="bibr">Lima-Mendez et al., 2008</xref>
), PHAST (
<xref rid="ref-59" ref-type="bibr">Zhou et al., 2011</xref>
), and PhiSpy (
<xref rid="ref-1" ref-type="bibr">Akhter, Aziz & Edwards, 2012</xref>
). Overall, prophage predictors rely on the detection of sequence similarities between regions of the microbial genome and known viral genes. PhiSpy additionally uses “viral-like” genomic features (AT and GC skew, protein length and transcription strand directionality) to enable the detection of viruses absent from databases (
<xref rid="ref-1" ref-type="bibr">Akhter, Aziz & Edwards, 2012</xref>
). Prophage predictors also look for prophage “ends” by identifying the attachment sites in the microbial genome for each predicted prophage. These tools are designed either to be downloaded and run locally (PhiSpy, Phage_Finder) or to be accessed through a web server (PHAST).</p>
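<p>As an illustration of the compositional features mentioned above, the following hypothetical Python sketch computes GC skew, (G − C)/(G + C), and AT skew, (A − T)/(A + T), in sliding windows along a nucleotide sequence. The window and step sizes are arbitrary assumptions, and the sketch does not reproduce how PhiSpy actually combines these features with protein length and strand directionality.</p>
<preformat preformat-type="code">
```python
def skew(seq, a, b):
    """Return (count(a) - count(b)) / (count(a) + count(b)), or 0.0 if neither occurs."""
    na, nb = seq.count(a), seq.count(b)
    total = na + nb
    return (na - nb) / total if total else 0.0


def windowed_skews(genome, window=1000, step=500):
    """Return (start, GC skew, AT skew) for each window along the genome.

    Window and step sizes are illustrative assumptions, not PhiSpy parameters.
    A genome shorter than one window is scored as a single window.
    """
    genome = genome.upper()
    results = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        chunk = genome[start:start + window]
        results.append((start, skew(chunk, "G", "C"), skew(chunk, "A", "T")))
    return results


print(windowed_skews("aaatcccc", window=4, step=4))
```
</preformat>
<p>A region whose skew profile departs from that of its flanking host sequence is one hint of horizontally acquired DNA such as a prophage, although skew alone is not diagnostic.</p>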
<p>However, new tools are needed that (i) extend viral detection beyond prophages and beyond cases in which new viruses closely match those available in databases, and (ii) can handle fragmented and larger-scale microbial genomic datasets. Here, we present VirSorter, an automated tool designed to detect viral signal in genomic datasets, and make this new tool and the associated databases freely available in the Discovery Environment of the iPlant Cyberinfrastructure (
<xref rid="ref-21" ref-type="bibr">Goff et al., 2011</xref>
). Overall, we demonstrate that VirSorter detects prophages in complete microbial genomes as well as current prophage tools, but also offers capabilities to detect viral sequences in fragmented genomic datasets including incomplete genomes, SAGs or metagenomic assemblies, and can be used to flag potential cellular contamination in viromes for removal.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials & Methods</title>
<sec>
<title>Building reference databases for bacterial and archaeal viruses</title>
<p>Two reference databases of viral protein sequences were built for VirSorter and are available in the iPlant Discovery Environment (Data/Community_Data/iVirus/VirSorter/Database). The first includes 114,297 proteins from viruses infecting bacteria or archaea in RefSeqVirus genomes (as of January 2014), hereafter named “RefSeqABVir.” Protein clusters (PCs) were defined using MCL clustering (
<xref rid="ref-18" ref-type="bibr">Enright, Van Dongen & Ouzounis, 2002</xref>
) of these proteins (inflation 2.0) based on their reciprocal blastp comparisons (threshold of 50 on bit score and 10
<sup>−03</sup>
on E-value). The 9,735 PCs with at least 3 sequences were used to define a profile database searchable with HMMER3 tools (
<xref rid="ref-15" ref-type="bibr">Eddy, 2011</xref>
). The remaining 34,668 unclustered sequences were formatted for blastp searches. All PCs containing no sequence from <italic>Caudovirales</italic>, as well as unclustered sequences from viruses other than <italic>Caudovirales</italic>, were marked as “Non-<italic>Caudovirales</italic>.”</p>
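<p>The database-building rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the VirSorter code: the reciprocal blastp comparisons and the MCL clustering are assumed to have been run already with the external tools, members of clusters with fewer than 3 sequences are treated as “unclustered,” and the names BlastHit, keep_edge, split_clusters and taxon_of are hypothetical.</p>

```python
from dataclasses import dataclass

# Thresholds quoted in the text for the blastp edges fed to MCL.
MIN_BITSCORE = 50.0
MAX_EVALUE = 1e-3
# Clusters with at least this many members become HMM profile PCs.
MIN_PC_SIZE = 3


@dataclass
class BlastHit:
    query: str
    subject: str
    bitscore: float
    evalue: float


def keep_edge(hit: BlastHit) -> bool:
    """Return True if a blastp hit passes both the bit-score and E-value thresholds."""
    return hit.bitscore >= MIN_BITSCORE and MAX_EVALUE >= hit.evalue


def split_clusters(clusters, taxon_of):
    """Split MCL clusters into profile PCs and unclustered sequences.

    clusters: list of lists of protein IDs (e.g. parsed from MCL output).
    taxon_of: dict mapping protein ID to a (hypothetical) taxon label.
    Each PC or unclustered sequence is paired with a Non-Caudovirales flag.
    """
    profile_pcs, unclustered = [], []
    for members in clusters:
        if len(members) >= MIN_PC_SIZE:
            # A PC is Non-Caudovirales only if none of its members is Caudovirales.
            non_caudo = all(taxon_of[m] != "Caudovirales" for m in members)
            profile_pcs.append((members, non_caudo))
        else:
            # Smaller clusters are treated here as unclustered sequences.
            for m in members:
                unclustered.append((m, taxon_of[m] != "Caudovirales"))
    return profile_pcs, unclustered
```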
<p>The RefSeqABVir database was then augmented with virome sequences sampled from freshwater, seawater, and human gut, lung and saliva, resulting in an extended version of the reference database (hereafter named “Viromes”) that includes both virome and RefSeqABVir sequences. This combined reference dataset should help to detect new viruses for which no cultivated reference sequence is available. When only raw reads were available, viromes were assembled using Newbler (threshold of 98% identity over 35 bp). The resulting contigs were then checked for the presence of cellular genome sequences, and only the 68 viromes in which no 16S rRNA gene was detected were retained (see
<xref ref-type="supplementary-material" rid="supp-1">Table S1</xref>
for a complete list of these viromes). Contigs assembled from these 68 viromes were then manually inspected (through annotations generated by Metavir;
<xref rid="ref-45" ref-type="bibr">Roux et al., 2014a</xref>
), which revealed no identifiable cellular genome sequences (i.e., no sequence contained more than 2 genes that matched a cellular genome and were not found in any known virus). A total of 146,521 complete predicted proteins from this quality-controlled dataset were then clustered with the 114,297 proteins from RefSeqABVir, yielding 15,673 clusters with 3 or more sequences and 88,052 unclustered sequences. PCs from the combined Viromes database were used to create a profile database searchable with HMMER3, and the 34,338 unclustered sequences from RefSeqABVir were formatted for BLAST searches (unclustered sequences from viromes were not added to the database to prevent the inclusion of contaminating sequences).</p>
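<p>The two quality-control criteria applied to the viromes can be written as simple predicates. The sketch below is hypothetical: it assumes each gene has already been annotated with whether it matches a cellular genome and whether it matches a known virus (information that, in the text, came from Metavir annotations), and it automates a check that was performed manually in the original work. The names Gene, Virome, contig_passes and select_viromes are illustrative.</p>

```python
from dataclasses import dataclass, field

# "no sequence contained more than 2 genes that matched a cellular genome
# and were not found in any known virus"
MAX_CELLULAR_ONLY_GENES = 2


@dataclass
class Gene:
    hits_cellular: bool  # has a significant match in a cellular genome
    hits_virus: bool     # has a significant match in any known virus


@dataclass
class Virome:
    name: str
    n_16s_genes: int  # 16S rRNA genes detected in the assembled contigs
    contigs: list = field(default_factory=list)  # each contig is a list of Gene


def cellular_only_genes(contig) -> int:
    """Count genes that match a cellular genome but no known virus."""
    return sum(1 for g in contig if g.hits_cellular and not g.hits_virus)


def contig_passes(contig) -> bool:
    return MAX_CELLULAR_ONLY_GENES >= cellular_only_genes(contig)


def select_viromes(viromes):
    """Keep viromes with no 16S rRNA gene whose contigs all pass the gene check."""
    return [
        v for v in viromes
        if v.n_16s_genes == 0 and all(contig_passes(c) for c in v.contigs)
    ]
```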
<table-wrap id="table-1" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.985/table-1</object-id>
<label>Table 1</label>
<caption>
<title>Comparison of VirSorter predictions with prophage predictors on
<italic>Pseudomonas aeruginosa</italic>
LES B58 genome (
<ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NC_011770">NC_011770</ext-link>
).</title>
<p>The coordinates of each known prophage in the
<italic>Pseudomonas aeruginosa</italic>
LES B58 genome and the detections by each tool are indicated, with absence of detection highlighted in red. For VirSorter and PHAST, the detection category (1, 2 or 3 for VirSorter; intact, incomplete or questionable for PHAST) is also indicated. False-positive detections of genomic islands as putative prophages are highlighted in orange.</p>
</caption>
<alternatives>
<graphic xlink:href="peerj-03-985-g004"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1">Feature</th>
<th rowspan="1" colspan="1">Coordinates</th>
<th rowspan="1" colspan="1">VirSorter</th>
<th rowspan="1" colspan="1">PHAST</th>
<th rowspan="1" colspan="1">PhiSpy</th>
<th rowspan="1" colspan="1">Phage_Finder</th>
</tr>
</thead>
<tbody>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 1</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">665,272–680,608</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 2</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – questionable</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 2</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">863,875–906,018</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 2</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – questionable</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
</tr>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 3</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">1,433,756–1,476,547</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 2</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – questionable</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
</tr>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 4</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">1,684,045–1,720,850</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 2</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – questionable</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
</tr>
<tr>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Genomic Island 1</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">2,504,700–2,551,100</td>
<td style="background-color:#EB613D;" rowspan="1" colspan="1">Prophage – 3</td>
<td style="background-color:#EB613D;" rowspan="1" colspan="1">Prophage – questionable</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 5</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">2,690,450–2,740,350</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 1</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – intact</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage</td>
</tr>
<tr>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Genomic Island 2</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">2,751,800–2,783,500</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Genomic Island 3</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">2,796,836–2,907,406</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Prophage</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Genomic Island 4</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">3,392,800–3,432,228</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage 6</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">4,545,190–4,552,788</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – 2</td>
<td style="background-color:#94BD5E;" rowspan="1" colspan="1">Prophage – intact</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">Genomic Island 5</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1">4,931,528–4,960,941</td>
<td style="background-color:#EB613D;" rowspan="1" colspan="1">Prophage – 3</td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
<td style="background-color:#FFFFFF;" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>Within these databases, viral “hallmark” genes were defined through a text-searching script looking for “major capsid protein,” “portal,” “terminase large subunit,” “spike,” “tail,” “virion formation” or “coat” annotations. After a manual curation step removing genes with more general annotations such as “protease” or “chaperone,” 826 PCs or single genes were identified as “viral hallmark genes.” This curation removed PCs with domains also matching “protease” or “chaperone” domains, and was conducted to minimize false positives in the viral hallmark gene category by conservatively excluding PCs that might include domains derived from either viruses or microbes.</p>
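<p>The keyword-based first pass used to find candidate hallmark genes can be illustrated as below. This is a rough sketch assuming plain-text annotation strings; the exclusion list only approximates the manual curation step described above, and the function name is_candidate_hallmark is hypothetical.</p>

```python
# Annotation keywords quoted in the text for viral hallmark genes.
HALLMARK_TERMS = (
    "major capsid protein", "portal", "terminase large subunit",
    "spike", "tail", "virion formation", "coat",
)
# Terms whose presence triggers exclusion (automated stand-in for manual curation).
EXCLUDE_TERMS = ("protease", "chaperone")


def is_candidate_hallmark(annotation: str) -> bool:
    """Case-insensitive substring match on hallmark terms, minus excluded terms."""
    text = annotation.lower()
    if any(term in text for term in EXCLUDE_TERMS):
        return False
    return any(term in text for term in HALLMARK_TERMS)
```

<p>Substring matching is deliberately crude (for example, “tail” would also match an unrelated word containing it), which is one reason keyword hits still need curation.</p>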
</sec>
<sec>
<title>VirSorter sequence pre-processing</title>
<p>VirSorter was inspired by previous algorithms and tools developed to detect prophages (viral sequences integrated in cellular genomes), especially Prophinder (
<xref rid="ref-30" ref-type="bibr">Lima-Mendez et al., 2008</xref>
). For each input genome or set of contigs (for draft genomes) provided as raw nucleotide sequences, the initial stages of VirSorter include the detection of circular sequences (i.e., sequences with matching ends, likely representing circular templates;
<xref rid="ref-45" ref-type="bibr">Roux et al., 2014a</xref>
), gene prediction on each sequence with MetaGeneAnnotator (
<xref rid="ref-35" ref-type="bibr">Noguchi, Taniguchi & Itoh, 2008</xref>
), and the selection of all sequences with more than 2 predicted genes. VirSorter also removes poor-quality predicted protein sequences (those with more than 50 consecutive X, F, A, K or P residues), which likely originate from gene prediction across low-complexity or poorly defined genome regions (e.g., “bridges” between contigs generated during scaffolding) and yield false-positive matches against protein domain databases.</p>
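<p>The pre-processing filters can be sketched as three small checks. The sketch assumes that “matching ends” means an exact repeat of the first bases at the 3′ end (the minimum overlap of 20 nt is a hypothetical default, not a documented VirSorter parameter), and it reads the low-complexity rule as a run of more than 50 copies of the same residue among X, F, A, K and P. The function names are illustrative.</p>

```python
import re

# More than 50 consecutive copies of the same X, F, A, K or P residue
# (one reading of the rule; a mixed run of these residues is not caught).
LOW_COMPLEXITY_RUN = re.compile(r"([XFAKP])\1{50,}")


def has_matching_ends(seq: str, min_overlap: int = 20) -> bool:
    """True if the sequence ends with an exact copy of its first min_overlap bases.

    A simplified proxy for a circular template; min_overlap is a hypothetical default.
    """
    seq = seq.upper()
    if 2 * min_overlap > len(seq):
        return False
    return seq.endswith(seq[:min_overlap])


def keep_protein(protein: str) -> bool:
    """Drop predicted proteins containing a long single-residue run."""
    return LOW_COMPLEXITY_RUN.search(protein.upper()) is None


def keep_sequence(n_predicted_genes: int) -> bool:
    """Only sequences with more than 2 predicted genes are analysed."""
    return n_predicted_genes > 2
```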
<p>Predicted protein sequences are then compared to PFAM (v27) and to the viral database selected by the user (either RefSeqABVir or Viromes) with hmmsearch (
<xref rid="ref-15" ref-type="bibr">Eddy, 2011</xref>
) and blastp (
<xref rid="ref-3" ref-type="bibr">Altschul et al., 1997</xref>
), and each gene is assigned to its most significant hit based on alignment score. Significant hits must have a minimum score of 40 and a maximum E-value of 10
<sup>−05</sup>
for hmmsearch, and a minimum score of 50 and a maximum E-value of 10
<sup>−03</sup>
for blastp.</p>
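<p>Assigning each gene to its most significant hit under the two sets of thresholds could look like the following sketch. The Hit record and its field names are hypothetical, and the sketch compares hmmsearch and blastp scores directly because the text states that affiliation is based on alignment score; genes with no significant hit are left out and would count as uncharacterized genes.</p>

```python
from dataclasses import dataclass

# (minimum score, maximum E-value) per search tool, as stated in the text.
THRESHOLDS = {"hmmsearch": (40.0, 1e-5), "blastp": (50.0, 1e-3)}


@dataclass
class Hit:
    gene: str
    target: str  # e.g. a PFAM domain or a viral PC / protein
    tool: str    # "hmmsearch" or "blastp"
    score: float
    evalue: float


def is_significant(hit: Hit) -> bool:
    min_score, max_evalue = THRESHOLDS[hit.tool]
    return hit.score >= min_score and max_evalue >= hit.evalue


def best_hits(hits):
    """Map each gene to its highest-scoring significant hit (genes without one are absent)."""
    best = {}
    for hit in hits:
        if not is_significant(hit):
            continue
        current = best.get(hit.gene)
        if current is None or hit.score > current.score:
            best[hit.gene] = hit
    return best
```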
</sec>
<sec>
<title>VirSorter metrics computation</title>
<p>Following the sequence pre-processing, viral regions are detected through computation of multiple metrics using sliding windows. The metrics used are (i) presence of viral hallmark genes (
<xref rid="ref-27" ref-type="bibr">Koonin, Senkevich & Dolja, 2006</xref>
;
<xref rid="ref-46" ref-type="bibr">Roux et al., 2014b</xref>
), (ii) enrichment in viral-like genes (i.e., genes whose best hit is in the viral reference database, either RefSeqABVir or Viromes), (iii) depletion in PFAM-affiliated genes, (iv) enrichment in uncharacterized genes (i.e., predicted genes with no hit in either PFAM or the viral reference database), (v) enrichment in short genes (i.e., genes among the 10% shortest genes of the genome), and (vi) depletion in strand switching (i.e., changes of coding strand between two consecutive genes).</p>
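<p>Two of the per-gene features behind these metrics, short genes and strand switches, can be derived directly from gene coordinates, and the gene classes for metrics (ii) to (iv) follow from each gene's best hit. In this sketch the input shapes (a list of gene lengths, a list of “+”/“-” strands, and a best-hit database label) are assumptions, and ties at the short-gene cutoff are all flagged.</p>

```python
def short_gene_flags(lengths, fraction=0.10):
    """Flag genes whose length is within the shortest `fraction` of all genes.

    Ties at the cutoff length are all flagged, so slightly more than
    `fraction` of genes can be flagged.
    """
    if not lengths:
        return []
    n_short = max(1, int(len(lengths) * fraction))
    cutoff = sorted(lengths)[n_short - 1]
    return [cutoff >= length for length in lengths]


def strand_switch_flags(strands):
    """For each pair of consecutive genes, True if the coding strand changes."""
    return [a != b for a, b in zip(strands, strands[1:])]


def gene_class(best_hit_db):
    """Classify a gene from the database of its best hit ("viral", "pfam" or None)."""
    if best_hit_db is None:
        return "uncharacterized"
    return "viral_like" if best_hit_db == "viral" else "pfam"
```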
<p>For all enrichment and depletion metrics, a score similar to that of Prophinder was used (
<xref rid="ref-30" ref-type="bibr">Lima-Mendez et al., 2008</xref>
). First, a global value for each metric is estimated for the whole genome set (global rate of viral-like genes, global rate of PFAM-affiliated genes,
<italic>etc</italic>
). Then, for each window, the number of observed events (e.g., the number of viral-like genes) is compared to an expected number derived from the global value of the metric (modeled with a binomial distribution). A
<italic>p</italic>
-value is computed, reflecting the probability of observing
<italic>n</italic>
events or more (for enrichment) or
<italic>n</italic>
events or fewer (for depletion) at random, thus corresponding to a risk of generating false positives. These
<italic>p-</italic>
values are multiplied by the total number of comparisons (here the total number of sliding windows observed on a sequence), and a negative logarithmic transformation (−log
<sub>10</sub>
) defines the associated significance score, again as in the Prophinder algorithm (
<xref rid="ref-30" ref-type="bibr">Lima-Mendez et al., 2008</xref>
).</p>
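<p>The significance score described above can be sketched with the standard library. For a window of N genes containing n events, with global rate p estimated on the whole dataset and W windows tested on the sequence, the score is −log10 of W times the binomial tail probability P(X ≥ n) (enrichment) or P(X ≤ n) (depletion). Capping the corrected value at 1 and guarding against floating-point underflow are choices of this sketch, not details given in the text.</p>

```python
import math
import sys


def binomial_tail(n_obs: int, n_trials: int, p: float, upper: bool = True) -> float:
    """P(X >= n_obs) if upper else P(X <= n_obs), for X ~ Binomial(n_trials, p)."""
    ks = range(n_obs, n_trials + 1) if upper else range(0, n_obs + 1)
    return sum(math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in ks)


def significance_score(n_obs, window_size, global_rate, n_windows, enrichment=True):
    """-log10 of the binomial p-value multiplied by the number of windows tested."""
    p_value = binomial_tail(n_obs, window_size, global_rate, upper=enrichment)
    corrected = min(1.0, p_value * n_windows)       # cap at 1 (sketch choice)
    corrected = max(corrected, sys.float_info.min)  # avoid log10(0) on underflow
    return -math.log10(corrected)
```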
<p>For the detection of viral-like genes enrichment, two different values are computed for each dataset: one based on genes in the entire database (RefSeqABVir or Viromes), and another based on non-
<italic>Caudovirales</italic>
genes only. Indeed,
<italic>Caudovirales</italic>
genomes represent 81% of RefSeqABVir, and the remaining viral families usually have only a handful of reference genomes. The global rate of viral-like genes in cellular genomes is thus usually one order of magnitude lower when considering only non-
<italic>Caudovirales</italic>
genes (the ratio of viral-like genes across the bacterial and archaeal classes for which complete genomes are available at NCBI RefSeq and WGS ranges from 4.8 to 16%, with an average of 10.6%, whereas the ratio of non-
<italic>Caudovirales</italic>
genes in these same genomes ranges from 0.01 to 1.4%, with an average of 0.16%). Hence, the same number of genes in a region can be non-significant when these genes match
<italic>Caudovirales</italic>
(compared to the global rate of
<italic>Caudovirales</italic>
-like genes in the whole genome), yet significant when the region is composed only of non-
<italic>Caudovirales</italic>
genes.</p>
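<p>A short worked example shows why separate non-<italic>Caudovirales</italic> rates matter. It uses the average rates quoted above (10.6% and 0.16%) with a hypothetical window of 10 genes containing 4 matching genes, on a sequence scanned with 50 windows; the window and window count are illustrative assumptions, and the scoring mirrors the binomial significance score described in the previous paragraph.</p>

```python
import math


def enrichment_score(n_obs, window_size, rate, n_windows):
    """Binomial upper-tail p-value, multiplied by n_windows, as -log10 (capped at 1)."""
    tail = sum(
        math.comb(window_size, k) * rate**k * (1 - rate) ** (window_size - k)
        for k in range(n_obs, window_size + 1)
    )
    return -math.log10(max(min(1.0, tail * n_windows), 1e-300))


# Average genome-wide rates quoted in the text.
ALL_VIRAL_RATE = 0.106   # any viral-like gene (mostly Caudovirales-like)
NON_CAUDO_RATE = 0.0016  # non-Caudovirales-like genes only

# Hypothetical window: 4 matching genes out of 10, on a sequence scanned with 50 windows.
n_obs, window_size, n_windows = 4, 10, 50
score_all = enrichment_score(n_obs, window_size, ALL_VIRAL_RATE, n_windows)
score_non_caudo = enrichment_score(n_obs, window_size, NON_CAUDO_RATE, n_windows)
print(f"all viral-like: {score_all:.2f}, non-Caudovirales: {score_non_caudo:.2f}")
```

<p>With these numbers, the score against all viral-like genes stays well below the storage threshold of 2, whereas the same four genes scored against the non-<italic>Caudovirales</italic> rate give a score above 7.</p>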
</sec>
<sec>
<title>Sequence metrics summary</title>
<p>Each metric is computed using sliding windows from 10 to 100 genes wide, starting at every gene along the sequence, and all scores greater than 2 are stored. Local maxima of the significance score are then searched, and the associated set of genes is defined as a putative viral region. These different predictions (based on the metrics above) are then merged when overlapping (extending the regions to include all predicted windows), leading to a list of putative viral regions, each associated with one or more metrics. These regions are classified into three categories: (i)
<italic>category 1</italic>
(“most confident” predictions) regions have significant enrichment in viral-like genes or non-
<italic>Caudovirales</italic>
genes on the whole region and at least one hallmark viral gene detected; (ii)
<italic>category 2</italic>
(“likely” predictions) regions have either enrichment in viral-like or non-
<italic>Caudovirales</italic>
genes, or a viral hallmark gene detected, associated with at least one other metric (depletion in PFAM affiliation, enrichment in uncharacterized genes, enrichment in short genes, depletion in strand switching); and (iii)
<italic>category 3</italic>
(“possible” predictions) regions have neither a viral hallmark gene nor enrichment in viral-like or non-
<italic>Caudovirales</italic>
genes, but display at least two of the other metrics, at least one of which has a significance score greater than 4. Finally, if a predicted region spans more than 80% of the predicted genes on a contig, the entire contig is considered viral. A summary of VirSorter detection types is displayed in
<xref ref-type="fig" rid="fig-1">Fig. 1B</xref>
.</p>
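<p>The merging and classification rules can be summarized in a compact sketch. It assumes each window-level prediction has already been reduced to a gene interval with the best score of each significant metric (metric names are hypothetical labels), and it simplifies the category 1 requirement of enrichment over the whole region to the best stored enrichment score; it illustrates the rules rather than reproducing the VirSorter implementation.</p>

```python
from dataclasses import dataclass, field

PRIMARY_ENRICHMENT = ("viral_like", "non_caudovirales")
SECONDARY = ("pfam_depletion", "uncharacterized_enrichment",
             "short_gene_enrichment", "strand_switch_depletion")
STORE_THRESHOLD = 2.0       # scores above 2 are kept
CATEGORY3_MIN_BEST = 4.0    # category 3 needs one secondary score above 4


@dataclass
class Region:
    start: int  # index of first gene in the region
    end: int    # index of last gene (inclusive)
    metrics: dict = field(default_factory=dict)  # metric name -> best score


def merge_overlapping(regions):
    """Merge overlapping regions, extending coordinates and keeping best scores."""
    merged = []
    for r in sorted(regions, key=lambda reg: reg.start):
        if merged and merged[-1].end >= r.start:
            last = merged[-1]
            last.end = max(last.end, r.end)
            for name, score in r.metrics.items():
                last.metrics[name] = max(score, last.metrics.get(name, 0.0))
        else:
            merged.append(Region(r.start, r.end, dict(r.metrics)))
    return merged


def categorize(region, n_hallmark_genes):
    """Return 1, 2, 3 or None following the category rules in the text."""
    sig = {k: v for k, v in region.metrics.items() if v > STORE_THRESHOLD}
    enriched = any(k in sig for k in PRIMARY_ENRICHMENT)
    hallmark = n_hallmark_genes > 0
    secondary = [sig[k] for k in SECONDARY if k in sig]
    if enriched and hallmark:
        return 1
    if (enriched or hallmark) and secondary:
        return 2
    if not (enriched or hallmark) and len(secondary) >= 2 and max(secondary) > CATEGORY3_MIN_BEST:
        return 3
    return None


def whole_contig_viral(region, n_genes_on_contig):
    """A region spanning more than 80% of the contig's genes makes the contig viral."""
    return (region.end - region.start + 1) > 0.8 * n_genes_on_contig
```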
<fig id="fig-1" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.985/fig-1</object-id>
<label>Figure 1</label>
<caption>
<title>VirSorter process: overview (A) and examples of viral sequence detection (B).</title>
<p>(A) Overview of the VirSorter process. The top part describes the different steps of the sequence analysis pipeline, and the bottom frame summarizes the classification into three categories of decreasing confidence based on the different metrics being significant (green dot) or not (black cross). Viral “hallmark” genes or protein clusters (PCs) were identified by looking for genes typically of viral origin annotated as “major capsid protein,” “portal,” “terminase large subunit,” “spike,” “tail,” “virion formation” or “coat,” and manually removing all protein domains with a potential overlap with microbial functions. (B) Examples of viral sequence detection by VirSorter. On top is the clearest case, in which a sequence harbors several viral hallmark genes as well as an enrichment in viral-like genes (or virome-like genes, when the genes are most similar to a viral metagenome sequence in the Viromes database). This type of detection is considered the most confident. The three examples below are different cases in which only one of the primary metrics is significant. Notably, these examples show how VirSorter can detect new viruses based on a significant depletion in characterized genes associated with a viral hallmark gene (case 3), and how the same number of genes can be a non-significant enrichment when considering all viruses, yet significant when considering only the non-
<italic>Caudovirales</italic>
(case 4). These detections are still considered confident, although less so than case 1. Finally, the last example (case 5) shows a more ambiguous situation, in which a sequence displays only secondary viral metrics, with neither viral gene enrichment nor a viral hallmark gene. For these detections, at least one of the metrics must have an E-value lower than 10
<sup>−04</sup>
(note that significance scores used in VirSorter output files are computed as negative log
<sub>10</sub>
transformations of E-values, and would here correspond to a score of 4 or more).</p>
</caption>
<graphic xlink:href="peerj-03-985-g001"></graphic>
</fig>
<p>Next, higher-confidence predictions are used to refine the sequence space search. Specifically, the sequences of all open reading frames from
<italic>category 1</italic>
predictions that do not match a viral protein cluster are clustered and added to the reference database (RefSeqABVir or Viromes, depending on the initial user choice). This updated database is then used in another round of VirSorter searches. This iteration, in which
<italic>category 1</italic>
sequences are used to refine the searches, continues until no new genes are added to the database. The final VirSorter output is then provided to the user and includes the nucleotide sequences of all predicted viral sequences in FASTA files, an automatic annotation of each prediction in GenBank format, and a summary table displaying, for each prediction, the associated category and the significance scores of all metrics. Because both the predictions and the underlying significance scores are provided, users can evaluate each prediction and apply custom thresholds on significance scores through a simple text-parsing script, even for large-scale datasets.</p>
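<p>The iterative refinement amounts to a fixed-point loop. In the sketch below, run_search and cluster_new are hypothetical callables standing in for a full VirSorter pass and for the protein clustering step, the prediction dictionaries have an assumed shape, and the max_rounds safety cap is an addition not mentioned in the text.</p>

```python
def refine_until_stable(database, run_search, cluster_new, max_rounds=20):
    """Iteratively add unmatched category-1 genes to the reference set.

    database:    frozenset of reference entries (PC or protein IDs).
    run_search:  callable(database) -> list of prediction dicts with keys
                 "category" and "genes_without_viral_pc" (hypothetical shape).
    cluster_new: callable(set of gene IDs) -> set of new reference entries.
    max_rounds:  safety cap, an assumption not stated in the text.
    """
    predictions = run_search(database)
    for _ in range(max_rounds):
        unmatched = {
            gene
            for pred in predictions if pred["category"] == 1
            for gene in pred["genes_without_viral_pc"]
        }
        new_entries = set(cluster_new(unmatched)) - database
        if not new_entries:
            break  # converged: no new genes were added
        database = database | new_entries
        predictions = run_search(database)
    return database, predictions
```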
<p>VirSorter is available as an application (App) in the iPlant discovery environment (
<uri xlink:href="https://de.iplantcollaborative.org/de/">https://de.iplantcollaborative.org/de/</uri>
) under Apps/Experimental/iVirus (see
<xref ref-type="supplementary-material" rid="supp-1">Fig. S1</xref>
for a step-by-step guide to the VirSorter App on iPlant). This application allows users to search any set of contigs for viral sequences using either the RefSeqABVir or the Viromes database. The reference values of VirSorter metrics are computed on the complete set of input sequences; hence, mixed datasets should be sorted (when possible) by type of bacteria or archaea to obtain the most accurate results. In addition to these reference databases, the VirSorter App on iPlant allows users to provide their own reference viral genome sequences, either already assembled or to be assembled with iPlant Apps prior to analysis with VirSorter. Assembled sequences are processed as follows: (i) genes are predicted with MetaGeneAnnotator (
<xref rid="ref-35" ref-type="bibr">Noguchi, Taniguchi & Itoh, 2008</xref>
), (ii) predicted proteins are clustered with sequences from the user-selected database (either RefSeqABVir or Viromes), and (iii) unclustered proteins are added to the “unclustered” pool. VirSorter scripts are also available through the github repository
<uri xlink:href="https://github.com/simroux/VirSorter.git">https://github.com/simroux/VirSorter.git</uri>
.</p>
</sec>
<sec>
<title>Comparison of VirSorter with other prophage predictors</title>
<p>We first evaluated VirSorter results against a set of manually curated prophages (
<xref rid="ref-13" ref-type="bibr">Casjens, 2003</xref>
). Each genome was processed with VirSorter, PhiSpy (
<xref rid="ref-1" ref-type="bibr">Akhter, Aziz & Edwards, 2012</xref>
), Phage_Finder (
<xref rid="ref-19" ref-type="bibr">Fouts, 2006</xref>
) and PHAST (
<xref rid="ref-59" ref-type="bibr">Zhou et al., 2011</xref>
). For each tool, a prophage was considered as “detected” when a prediction covered more than 75% of the known prophage. For a more detailed example case of prophage detection in a complete bacterial genome including both prophages and genomic islands, the same tools were applied to the manually annotated
<italic>Pseudomonas aeruginosa</italic>
LES B58 genome (
<xref rid="ref-56" ref-type="bibr">Winstanley et al., 2009</xref>
).</p>
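<p>The 75% coverage criterion used to count a known prophage as detected can be expressed on inclusive genome coordinates. This sketch assumes that a single prediction must provide the coverage (rather than the union of several overlapping predictions), which the text does not specify; the function names are illustrative.</p>

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Length of the overlap between two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def is_detected(known, predictions, min_coverage=0.75):
    """True if a single prediction covers more than min_coverage of the known prophage."""
    k_start, k_end = known
    k_len = k_end - k_start + 1
    return any(
        overlap_length(k_start, k_end, p_start, p_end) / k_len > min_coverage
        for p_start, p_end in predictions
    )
```

<p>For example, with the coordinates of Prophage 6 in Table 1 (4,545,190–4,552,788), a prediction spanning 4,545,190–4,549,000 covers only about half of the prophage and would not count as a detection.</p>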
<p>VirSorter was then compared with the same prophage detection tools on the set of simulated SAGs. In that case, a viral sequence was considered detected if it was predicted either as entirely viral or as a prophage. All additional detections were manually checked to verify whether the region was indeed viral (i.e., originating from a prophage in one of the microbial genomes rather than from a viral genome) or a false positive. The same approach was used for the simulated microbial and viral metagenomes.</p>
<p>For each set of predictions, two metrics are computed. First, the Recall value corresponds to the number of viral sequences correctly predicted divided by the total number of known viral sequences in the dataset, and reflects the ability of the tool to find every known viral sequence in the dataset. Second, the Precision value is computed as the total number of viral sequences correctly predicted divided by the total number of viral sequences predicted, and indicates how accurate the tool is in its identification of viral signal.</p>
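<p>Both evaluation metrics reduce to simple ratios. The worked example below counts only the VirSorter calls listed in Table 1 (all six prophages found, plus two genomic islands called as category 3 prophages); it illustrates the formulas and is not a reported benchmark result.</p>

```python
def recall_precision(n_correct: int, n_known: int, n_predicted: int):
    """Return (Recall, Precision) as defined in the text.

    Recall    = correctly predicted viral sequences / known viral sequences
    Precision = correctly predicted viral sequences / all predicted viral sequences
    """
    recall = n_correct / n_known if n_known else 0.0
    precision = n_correct / n_predicted if n_predicted else 0.0
    return recall, precision


# Worked example from the VirSorter rows of Table 1 (categories 1 to 3):
# 6 of 6 known prophages detected, plus 2 genomic islands called as prophages.
print(recall_precision(n_correct=6, n_known=6, n_predicted=8))  # (1.0, 0.75)
```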
</sec>
<sec>
<title>Simulation of draft genomes and metagenomes</title>
<p>A total of 10 single-cell amplified genomes, 10 microbial metagenomes and 10 viral metagenomes were simulated with NeSSM (
<xref rid="ref-24" ref-type="bibr">Jia et al., 2013</xref>
). Microbial genomes were randomly picked from the bacterial and archaeal genomes available in RefSeq and WGS (as of January 2014). Viral genomes were picked from the most recently submitted genomes (since June 2014), and are thus absent from the VirSorter reference databases. Simulated inputs for each genome group (viral and microbial) followed a power-law distribution of abundances within the microbial and viral communities. The proportion of viral reads varied from 5 to 20% for microbial metagenomes, and from 75 to 99% for viral metagenomes (
<xref ref-type="supplementary-material" rid="supp-1">Tables S4</xref>
and
<xref ref-type="supplementary-material" rid="supp-1">S5</xref>
). For each simulated dataset, 100 bp paired-end reads similar to those obtained with Illumina HiSeq were generated (100,000 for SAGs, 1,000,000 for metagenomes), quality-trimmed with fastq_quality_trimmer using a quality threshold of 30 (part of the fastx_toolkit,
<uri xlink:href="http://hannonlab.cshl.edu/fastx_toolkit/">http://hannonlab.cshl.edu/fastx_toolkit/</uri>
), and assembled with Idba_ud (
<xref rid="ref-36" ref-type="bibr">Peng et al., 2012</xref>
).</p>
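<p>The community composition used for the simulations can be sketched as follows. The power-law form (abundance proportional to rank raised to −alpha), the exponent of 1.0, and the random assignment of ranks to genomes are assumptions for illustration, since the text does not give the exact parameters; the resulting shares would then be passed to a read simulator such as NeSSM in whatever input format it expects.</p>

```python
import random


def power_law_abundances(n_genomes, alpha=1.0):
    """Relative abundances proportional to rank ** -alpha, normalised to sum to 1."""
    weights = [(rank + 1) ** -alpha for rank in range(n_genomes)]
    total = sum(weights)
    return [w / total for w in weights]


def community_shares(microbes, viruses, viral_fraction, alpha=1.0, seed=0):
    """Assign each genome a share of reads; viruses receive viral_fraction in total."""
    rng = random.Random(seed)
    microbial = power_law_abundances(len(microbes), alpha)
    viral = power_law_abundances(len(viruses), alpha)
    rng.shuffle(microbial)  # randomise which genome receives which rank
    rng.shuffle(viral)
    shares = {g: (1 - viral_fraction) * a for g, a in zip(microbes, microbial)}
    shares.update({g: viral_fraction * a for g, a in zip(viruses, viral)})
    return shares
```

<p>For instance, setting viral_fraction to 0.2 mimics the upper end of the 5 to 20% viral read range used for the simulated microbial metagenomes.</p>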
<p>To identify viral sequences in the assemblies, the resulting contigs were compared to the viral genomes with nucmer (
<xref rid="ref-14" ref-type="bibr">Delcher, Salzberg & Phillippy, 2003</xref>
), and all sequences matching one of the viral genomes at 97% nucleotide identity or more were considered viral. All simulated contigs and composition tables (i.e., the relative abundance of each genome in the simulated dataset) are available in the iPlant Discovery Environment alongside the VirSorter results for each of these simulated datasets (/iplant/home/shared/imicrobe/VirSorter/Benchmark_datasets and Benchmark_results, respectively).</p>
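<p>Labeling assembled contigs as viral from their alignments to the known viral genomes can be sketched as below. nucmer itself is not reimplemented here: the sketch assumes alignments have already been parsed (for example from nucmer coordinate output) into (contig, viral genome, percent identity, aligned length) tuples, and the length-weighted identity per contig and genome is an assumption, since the text states only the 97% threshold.</p>

```python
from collections import defaultdict


def viral_contigs(alignments, min_identity=97.0):
    """Return contigs whose length-weighted identity to some viral genome is high enough.

    alignments: iterable of (contig, viral_genome, percent_identity, aligned_length).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (contig, genome) -> [identity * length, length]
    for contig, genome, identity, length in alignments:
        acc = totals[(contig, genome)]
        acc[0] += identity * length
        acc[1] += length
    return {
        contig
        for (contig, _genome), (weighted, length) in totals.items()
        if length and weighted / length >= min_identity
    }
```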
</sec>
</sec>
<sec>
<title>Results & Discussion</title>
<sec>
<title>Reference-dependent and general genome features used to detect viruses</title>
<p>VirSorter is designed to predict viral sequences in complete or fragmented genome sequence data from bacteria and archaea. Viral sequences are identified through a combination of “primary metrics” linked to the detection of significant similarities with known viral sequences and “secondary metrics” associated with viral-like genome structure (
<xref ref-type="fig" rid="fig-1">Fig. 1A</xref>
). VirSorter first builds a probabilistic model for each metric using the microbial genomic data provided by the user (i.e., the complete genome or the entire contig dataset for draft genomes or metagenomes) that is then used as reference to calculate enrichment/depletion statistics. A “statistical enrichment in viral gene content” for a set of genes thus indicates that the region evaluated displays more viral-like genes than would be expected by chance alone based on the overall frequency of viral-like genes in the whole dataset. Viral-like genes are identified through comparison to RefSeq viral genomes (“RefSeqABVir” database hereafter) or to a custom database built from RefSeqABVir to which curated virome datasets were added to improve novel virus detection capabilities (hereafter “Viromes” database,
<xref ref-type="fig" rid="fig-1">Fig. 1A</xref>
and
<xref ref-type="supplementary-material" rid="supp-1">Table S1</xref>
). Through the VirSorter application (App) on iPlant, users can also add their own viral genome sequence(s) (in FASTA format), whose predicted proteins will be added to the user-selected database (either RefSeqABVir or Viromes).</p>
</sec>
<sec>
<title>Viral signal mining process</title>
<p>Viral regions are predicted based on a summary of primary and secondary metrics evaluated on each genomic sequence. Each prediction is categorized from 1 to 3 in order of decreasing confidence (
<xref ref-type="fig" rid="fig-1">Figs. 1A</xref>
and
<xref ref-type="fig" rid="fig-1">1B</xref>
). Sequences for which the predicted viral region spans more than 80% of the contig length are considered as entirely viral. Biologically, we interpret these different categories as sequences similar to known viral references (
<italic>category 1</italic>
), sequences divergent from references, whose genes have mostly not yet been detected in viral genomes, or partial sequences lacking viral hallmark genes, which may include defective prophages (
<italic>category 2</italic>
), and sequences or regions with a genome structure similar to viral genomes, but lacking any similarity to known viruses or viromes (
<italic>category 3</italic>
). These
<italic>category 3</italic>
predictions are thus essentially “aberrant” cellular genomic regions and should be carefully examined, as this category also routinely includes hypervariable microbial genomic islands and other mobile genetic elements in addition to novel viral sequences. However, we include
<italic>category 3</italic>
predictions because, when coupled with manual inspection, they can help researchers uncover novel biology, particularly when analyzing the small contigs and highly novel viruses likely to derive from fragmented draft genomes or SAGs.</p>
</sec>
<sec>
<title>VirSorter prophage prediction is comparable to existing tools</title>
<p>To evaluate VirSorter performance, we first examined its prophage prediction capability relative to existing tools. Specifically, we used a set of 267 manually annotated prophages from 54 bacterial genomes (
<xref rid="ref-13" ref-type="bibr">Casjens, 2003</xref>
) to compare the prophage prediction performances of VirSorter, PhiSpy, Phage_Finder, and PHAST. We evaluate performance using two metrics: (i) “Recall,” the number of viral regions detected divided by the total number of viral regions (also known as “Sensitivity”) and (ii) “Precision,” the number of correct predictions divided by the total number of predictions (also known as “Positive Predictive Value”).</p>
<p>All of the tested prophage prediction tools perform well on these complete genome datasets as Recall values range from 64 to 85%, and Precision values range from 74 to 93% (
<xref ref-type="fig" rid="fig-2">Fig. 2</xref>
and
<xref ref-type="supplementary-material" rid="supp-1">Table S2</xref>
). Two of the tools also associate their predictions with a level of confidence: PHAST predictions are noted as “intact,” “incomplete,” or “questionable” based on the number and type of phage genes detected, and VirSorter categorizes predictions as described above. To see how these confidence categories impacted results, we computed scores with and without the least confident predictions for both of these tools (
<xref ref-type="fig" rid="fig-2">Fig. 2</xref>
). For PHAST, adding the questionable detections increased detection sensitivity (Recall increased from 70 to 84%) without altering the Precision (both sets of predictions display a Precision of 83%). Conversely, including the least confident
<italic>category 3</italic>
predictions for VirSorter only slightly increased Recall (73 to 79%), but did so at the cost of Precision (dropping from 93 to 72%). Hence, for VirSorter, prophages predicted as
<italic>category 3</italic>
from complete microbial genomes are prone to “false-positive” detections, notably because they can also include other genomic regions with unusual sequence composition features such as genomic islands or mobile genetic elements (see below).</p>
<fig id="fig-2" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.985/fig-2</object-id>
<label>Figure 2</label>
<caption>
<title>Accuracy of viral sequence predictions of VirSorter, PHAST, Phage_Finder and PhiSpy on (A) complete microbial genomes, and (B) draft genomes from simulated SAGs, each including one microbial and one viral genome.</title>
<p>For each set of predictions (i.e., each tool and set of options when applicable), the two metrics used to evaluate tool performance are Recall (
<italic>x</italic>
-axis, proportion of known viral sequences or regions detected) and Precision (
<italic>y</italic>
-axis, proportion of predictions that corresponded to known viral sequences or regions). Prophages identified in the complete microbial genomes are compared to the list of manually curated prophages from
<xref rid="ref-13" ref-type="bibr">Casjens (2003)</xref>
.</p>
</caption>
<graphic xlink:href="peerj-03-985-g002"></graphic>
</fig>
<p>Next, we focused on prophage prediction in the manually annotated
<italic>Pseudomonas aeruginosa</italic>
LES B58 genome, which includes both prophages and genomic islands (
<xref rid="ref-56" ref-type="bibr">Winstanley et al., 2009</xref>
), to better explore how these tools deal with divergent prophages and unusual genomic regions (
<xref ref-type="table" rid="table-1">Table 1</xref>
). All 6 known prophages in this genome were detected by both VirSorter (as categories 1 or 2) and PHAST (though 4 were considered “questionable”), whereas PhiSpy and Phage_Finder detected only 5 and 4, respectively. The missed prophages were the shortest ones (12 and 19 genes, compared to >40 genes for all the other prophages), and one of them (Prophage_6) also corresponded to an unusual phage from the
<italic>Inoviridae</italic>
family, which is under-represented in viral genome databases. Beyond prophages, this microbial genome also contains 5 manually curated genomic islands. None of these genomic islands were detected as a prophage by Phage_Finder, whereas PhiSpy and PHAST each wrongly identified one of them as a prophage, and VirSorter identified two of them as
<italic>category 3</italic>
predictions (i.e., putative prophage or other unusual genomic feature,
<xref ref-type="table" rid="table-1">Table 1</xref>
). This example illustrates that
<italic>category 3</italic>
predictions from VirSorter help capture even divergent prophages, but also detect hypervariable regions in microbial genomes, such as genomic islands or plasmids.</p>
</sec>
<sec>
<title>VirSorter is more efficient at mining viral signal from single-cell amplified genomes (SAGs)</title>
<p>To evaluate the capacity of prophage predictors and VirSorter to detect viral sequences in SAG datasets, we generated 10 simulated datasets of 100,000 reads (100bp), each from one microbial and one viral genome, with 5 to 10% of the reads originating from the viral genome (
<xref ref-type="supplementary-material" rid="supp-1">Table S3</xref>
). For each simulated dataset, reads were assembled into contigs (on average 556 contigs per SAG, with an average contig length of ∼3.3 kb), from which viral sequences or prophages were then predicted. The viral genomes used in the simulated datasets were absent from the VirSorter reference database.</p>
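<p>The SAG simulation setup can be sketched as follows: reads are drawn from one microbial and one viral genome, with a fixed fraction of viral reads. This is an illustration under stated assumptions, not the simulator used in this study. Here reads are error-free, single-end and sampled uniformly, whereas the study simulated HiSeq Illumina paired-end reads (Table S3), and random sequences stand in for real genomes. The functions <monospace>random_genome</monospace> and <monospace>simulate_reads</monospace> are hypothetical names.</p>

```python
import random
from typing import List, Tuple


def random_genome(length: int, rng: random.Random) -> str:
    """Uniform random DNA; stands in for a real microbial or viral genome."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_reads(microbial: str, viral: str, n_reads: int, viral_fraction: float,
                   read_len: int, rng: random.Random) -> List[Tuple[str, str]]:
    """Error-free single-end reads with uniformly sampled start positions.
    Returns shuffled (origin, sequence) pairs; origin is "viral" or "microbial"."""
    n_viral = round(n_reads * viral_fraction)
    reads = []
    for i in range(n_reads):
        # The first n_viral reads come from the viral genome, the rest from the microbial one
        origin, genome = ("viral", viral) if n_viral > i else ("microbial", microbial)
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((origin, genome[start:start + read_len]))
    rng.shuffle(reads)
    return reads


rng = random.Random(42)
microbial = random_genome(100_000, rng)
viral = random_genome(10_000, rng)
# 10,000 reads of 100 bp, 7% viral (the study used 100,000 reads with 5 to 10% viral reads)
reads = simulate_reads(microbial, viral, n_reads=10_000, viral_fraction=0.07,
                       read_len=100, rng=rng)
print(sum(1 for origin, _ in reads if origin == "viral"))  # 700
```

<p>A scaled-down dataset is used here to keep the example fast. The assembly and prediction steps that follow in the actual benchmark are not shown.</p>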
<p>On these SAGs, VirSorter outperformed all other tools, being the only one to maintain Recall and Precision values comparable to those obtained on complete microbial genomes (
<xref ref-type="fig" rid="fig-2">Fig. 2B</xref>
). VirSorter
<italic>categories 1</italic>
&
<italic>2</italic>
(higher-confidence predictions) displayed a Recall of 65% and a Precision of 100%, while adding category 3 predictions increased Recall (to 88%) but reduced Precision (to 81%). Thus, for fragmented genomes,
<italic>category 3</italic>
predictions help recover more viral sequences, but at the cost of more false positives. In comparison, PHAST (with or without the “questionable” predictions) reached 40–50% Recall and 38–41% Precision, whereas PhiSpy and Phage_Finder had lower Recall (36 and 20%, respectively) but high Precision (90 and 83%, respectively). Because the prophage detection tools were optimized for viral sequence detection in complete microbial genomes, it is not surprising that VirSorter performs better on fragmented genomes.</p>
<p>We also applied VirSorter and the prophage predictors to a set of 127 SAGs of uncultivated SUP05 bacteria, in which viral sequences had previously been manually identified and curated (
<xref rid="ref-46" ref-type="bibr">Roux et al., 2014b</xref>
). Of the 69 known viral contigs in this dataset, 62 were detected by VirSorter (including 29 as
<italic>category 3</italic>
), the remaining 7 being too short (5.1kb on average) to yield significant enrichment scores. In contrast, PHAST, PhiSpy, and Phage_Finder detected only 15, 1, and none of these sequences, respectively. Beyond the fragmented nature of these SUP05 SAGs, these data likely represent a worst-case scenario for the prophage prediction tools, as the 69 SUP05 viruses represent new viral genera for which no closely related reference sequences were available in databases (
<xref rid="ref-46" ref-type="bibr">Roux et al., 2014b</xref>
).</p>
</sec>
<sec>
<title>VirSorter alone is able to mine viral signal from bacterial and viral metagenomes</title>
<p>Next, we evaluated VirSorter’s capability to recover viral sequences in fragmented genomes assembled from metagenomic datasets. To this end, we created 10 ‘metagenomes’ from 15 microbial and 15 viral genomes at varying relative abundances (
<xref ref-type="supplementary-material" rid="supp-1">Table S4</xref>
and see ‘Methods’). These simulated datasets total 192,941 contigs, a scale at which none of the prophage predictors could even process the data in a reasonable time (i.e., less than several days). Because metagenome-derived contigs also represent fragmented genomes, we expect that the prophage prediction tools would have performed poorly on these datasets, likely even worse than in the SAG tests above.</p>
<p>VirSorter, however, was designed to handle such datasets. For all contigs longer than 500bp, VirSorter predictions displayed high Precision (93–100%) but low Recall (33%,
<xref ref-type="fig" rid="fig-3">Fig. 3A</xref>
and
<xref ref-type="supplementary-material" rid="supp-1">Table S4</xref>
). As contig size increased, however, Recall rose to 79–84% for contigs >3kb and 95–97% for contigs >10kb, with no loss of Precision (
<xref ref-type="fig" rid="fig-3">Fig. 3B</xref>
).</p>
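<p>Because Recall depends strongly on contig length, results are reported for several minimum contig sizes. A minimal Python sketch of such a stratified evaluation follows. It assumes that each contig is labeled with its length, its true origin (viral or not) and whether it was predicted as viral. The <monospace>Contig</monospace> type, the <monospace>metrics_by_min_length</monospace> function and the toy contigs are hypothetical and are not the simulated data from this study. As in the text, thresholds are strict (contigs longer than 500bp, 3kb or 10kb).</p>

```python
from typing import Dict, List, NamedTuple


class Contig(NamedTuple):
    length: int       # contig length in bp
    is_viral: bool    # ground truth from the simulation
    predicted: bool   # True if the tool flagged the contig as viral


def metrics_by_min_length(contigs: List[Contig], thresholds: List[int]) -> Dict[int, Dict[str, float]]:
    """Recall and Precision computed on contigs strictly longer than each threshold."""
    results = {}
    for t in thresholds:
        subset = [c for c in contigs if c.length > t]
        tp = sum(1 for c in subset if c.is_viral and c.predicted)
        n_viral = sum(1 for c in subset if c.is_viral)
        n_pred = sum(1 for c in subset if c.predicted)
        results[t] = {
            "recall": tp / n_viral if n_viral else float("nan"),
            "precision": tp / n_pred if n_pred else float("nan"),
        }
    return results


# Hypothetical toy contigs (length, is_viral, predicted)
contigs = [
    Contig(800, True, False), Contig(1200, True, False), Contig(2000, True, True),
    Contig(4000, True, True), Contig(5000, True, False), Contig(12000, True, True),
    Contig(15000, True, True), Contig(900, False, False), Contig(6000, False, False),
]
by_threshold = metrics_by_min_length(contigs, [500, 3000, 10000])
for t, m in by_threshold.items():
    print(t, round(m["recall"], 2), round(m["precision"], 2))
# 500 0.57 1.0
# 3000 0.75 1.0
# 10000 1.0 1.0
```

<p>In this toy set, the missed viral contigs are mostly short, so Recall rises with the threshold while Precision stays at 1.0. This qualitatively mirrors the trend described above.</p>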
<fig id="fig-3" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.985/fig-3</object-id>
<label>Figure 3</label>
<caption>
<title>Detection of viral sequences in microbial metagenomes by VirSorter.</title>
<p>(A) Average Recall (
<italic>x</italic>
-axis) and Precision (
<italic>y</italic>
-axis) of viral sequence detection by VirSorter in 10 simulated microbial metagenomes for different contig size thresholds. (B) Detection of viral sequences by VirSorter in simulated microbial metagenomes by contig size fraction.</p>
</caption>
<graphic xlink:href="peerj-03-985-g003"></graphic>
</fig>
<p>Finally, we evaluated the potential of VirSorter to detect viral sequences in viral metagenomes contaminated with cellular genomes. Such cellular sequences in viral-fraction metagenomes can derive from co-purified encapsidated DNA (in gene transfer agents or generalized transducing phages) or from contamination, and represent a common challenge when making inferences from viromes (
<xref rid="ref-44" ref-type="bibr">Roux et al., 2013</xref>
). We thus simulated 10 viral metagenomes of 1,000,000 reads (100bp) from a mix of 15 microbial and 60 viral genomes. This time, viral reads represented the majority of each dataset, ranging from 75 to 99% of reads (
<xref ref-type="supplementary-material" rid="supp-1">Table S5</xref>
). Because viromes largely lack microbial sequences, VirSorter modeled microbial genomic metrics from all microbial genomes available in RefSeq and WGS (as of January 2014) rather than from the input dataset itself. This use case is implemented in the iPlant application and can be selected by checking the “virome decontamination” box in the submission form.</p>
<p>As with the microbial metagenome simulations, VirSorter performance as a ‘virome decontaminator’ improves with contig size (
<xref ref-type="table" rid="table-2">Table 2</xref>
and
<xref ref-type="supplementary-material" rid="supp-1">Table S5</xref>
). When considering all contigs (>500bp), the Recall of viral sequences is 32% on average, but it increases to 86% for contigs >3kb and 97% for contigs >10kb. When
<italic>category 3</italic>
predictions are included, these Recall values increase slightly, to 33%, 90%, and 99.8% for the three contig size thresholds, respectively. Meanwhile, the Precision of viral sequence detection remains high for all contig sizes, even when including
<italic>category 3</italic>
predictions (approximately 99% or higher,
<xref ref-type="table" rid="table-2">Table 2</xref>
).</p>
<table-wrap id="table-2" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.985/table-2</object-id>
<label>Table 2</label>
<caption>
<title>Results of VirSorter viral sequence detection on simulated viral metagenomes with limited contamination by cellular genomes (1 to 25% of raw reads).</title>
<p>Metrics presented are Recall (proportion of viral sequences detected) and Precision (proportion of predictions corresponding to viral sequences).</p>
</caption>
<alternatives>
<graphic xlink:href="peerj-03-985-g005"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1"></th>
<th align="center" colspan="2" rowspan="1">VirSorter—categories 1 & 2</th>
<th align="center" colspan="2" rowspan="1">VirSorter—all categories</th>
</tr>
<tr>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1">
<italic>Recall</italic>
</th>
<th rowspan="1" colspan="1">
<italic>Precision</italic>
</th>
<th rowspan="1" colspan="1">
<italic>Recall</italic>
</th>
<th rowspan="1" colspan="1">
<italic>Precision</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">
<bold>All contigs (>500bp)</bold>
</td>
<td rowspan="1" colspan="1">31.71%</td>
<td rowspan="1" colspan="1">99.89%</td>
<td rowspan="1" colspan="1">32.96%</td>
<td rowspan="1" colspan="1">99.79%</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>Contigs >3kb</bold>
</td>
<td rowspan="1" colspan="1">85.64%</td>
<td rowspan="1" colspan="1">99.80%</td>
<td rowspan="1" colspan="1">90.29%</td>
<td rowspan="1" colspan="1">99.62%</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>Contigs >10kb</bold>
</td>
<td rowspan="1" colspan="1">97.14%</td>
<td rowspan="1" colspan="1">99.48%</td>
<td rowspan="1" colspan="1">99.82%</td>
<td rowspan="1" colspan="1">98.99%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
<sec>
<title>VirSorter’s strengths and weaknesses</title>
<p>VirSorter represents a novel, scalable, and community-available tool for detecting and identifying viral genome sequences in diverse microbial datasets. Its prophage prediction performance is largely comparable to that of available prophage prediction tools on complete microbial genomes. However, it outperforms these tools on modern microbial datasets, which tend to be fragmented and larger-scale, and when searching for viruses beyond those “known” in current databases. Thus, VirSorter complements existing tools to help elucidate bacterial and archaeal viral sequences among myriad modern microbial genomic data types.</p>
<p>However, VirSorter does have limitations. First, VirSorter was designed and optimized for the detection of bacterial and archaeal viruses, so it detects eukaryotic viruses poorly: its database lacks eukaryotic viruses, and the viral genome features were evaluated only on bacterial and archaeal viruses. VirSorter may still detect some eukaryotic viruses, often as
<italic>category 3</italic>
because of their distinctive genome composition (compared to a typical cellular genome), but this capacity is extremely limited in its current build. Second, short (<3kb) viral contigs tend to be detected by VirSorter only when they contain hallmark genes. In practice, this means that viral signal detection in unassembled reads or in contigs assembled from (meta)transcriptome data will usually be inefficient. Third, prophage prediction tools also look for additional signs of prophages, such as integrase genes,
<italic>att</italic>
sites, or repeats that demarcate the ‘ends’ of a prophage genome; none of these features are examined by VirSorter. Thus, prophage prediction tools likely remain the best means to accurately annotate prophages in a complete microbial genome, whereas VirSorter is best suited for high-throughput analyses and for detecting viral signal in fragmented genomes. Finally,
<italic>category 3</italic>
detections represent sequences and regions that are unique within the genome(s) being compared. While many of them may be viral, these predictions could also represent other mobile genetic elements or hypervariable genomic islands, and they therefore require manual curation. The only case in which
<italic>category 3</italic>
predictions may be used without manual curation is viral metagenome decontamination, where they increase Recall while only marginally lowering Precision.</p>
</sec>
</sec>
<sec sec-type="supplementary-material" id="supplemental-information">
<title>Supplemental Information</title>
<supplementary-material content-type="local-data" id="supp-1">
<object-id pub-id-type="doi">10.7717/peerj.985/supp-1</object-id>
<label>Supplementary Tables</label>
<caption>
<title>Supplementary Tables</title>
<p>
<bold>Table S1: List of viromes whose predicted proteins were added to those of bacterial and archaeal viral genomes from RefSeq in the “Virome” database.</bold>
Only contigs >500bp were considered, and a virome was added to the list only if no 16S rDNA genes were detected in these contigs. Only complete predicted proteins (i.e., with both a start and a stop codon) were added to the database.</p>
<p>
<bold>Table S2: List of prophages identified in
<xref rid="ref-13" ref-type="bibr">Casjens (2003)</xref>
, and results of prophage predictions on the same genomes for the different prophage predictors and VirSorter.</bold>
</p>
<p>
<bold>Table S3: List and composition of simulated Single-Cell Amplified Genome datasets, with results of viral sequence detection from prophage detectors and VirSorter.</bold>
Results of viral sequence detection are given as “number of correct detections / number of false positives”. All simulated reads were HiSeq Illumina paired-end 100bp reads.</p>
<p>
<bold>Table S4: List and composition of microbial metagenome simulated datasets, with results of viral sequence detection from VirSorter.</bold>
Results of viral sequence detection are given as Recall (number of correct predictions divided by the total number of viral sequences in the dataset) and Precision (number of correct predictions divided by the total number of predictions). All simulated reads were HiSeq Illumina paired-end 100bp reads.</p>
<p>
<bold>Table S5: List and composition of viral metagenome simulated datasets, with results of viral sequence detection from VirSorter.</bold>
Results of viral sequence detection are given as Recall (number of correct predictions divided by the total number of viral sequences in the dataset) and Precision (number of correct predictions divided by the total number of predictions). All simulated reads were HiSeq Illumina paired-end 100bp reads.</p>
</caption>
<media xlink:href="peerj-03-985-s001.xls">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-2">
<object-id pub-id-type="doi">10.7717/peerj.985/supp-2</object-id>
<label>Figure S1</label>
<caption>
<title>Step-by-step guide for the use of VirSorter in the Discovery Environment (iPlant)</title>
</caption>
<media xlink:href="peerj-03-985-s002.pdf">
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>We thank Nirav Merchant, Darren Boss, and Ken Youens-Clark for aiding in setting up VirSorter on the iPlant platform, Rachel Whitaker and Whitney England for their assistance with
<italic>Pseudomonas aeruginosa</italic>
LES B58 genome annotation, as well as TMPL members for comments on the manuscript.</p>
</ack>
<sec sec-type="additional-information">
<title>Additional Information and Declarations</title>
<fn-group content-type="competing-interests">
<title>Competing Interests</title>
<fn id="conflict-1" fn-type="conflict">
<p>The authors declare there are no competing interests.</p>
</fn>
</fn-group>
<fn-group content-type="author-contributions">
<title>Author Contributions</title>
<fn id="contribution-1" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-1">Simon Roux</xref>
conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.</p>
</fn>
<fn id="contribution-2" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-2">Francois Enault</xref>
conceived and designed the experiments, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.</p>
</fn>
<fn id="contribution-3" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-3">Bonnie L. Hurwitz</xref>
performed the experiments, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.</p>
</fn>
<fn id="contribution-4" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-4">Matthew B. Sullivan</xref>
conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.</p>
</fn>
</fn-group>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>Akhter, Aziz & Edwards (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akhter</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Aziz</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies</article-title>
<source>Nucleic Acids Research</source>
<year>2012</year>
<volume>40</volume>
<fpage>1</fpage>
<lpage>13</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gks406</pub-id>
<pub-id pub-id-type="pmid">21908400</pub-id>
</element-citation>
</ref>
<ref id="ref-2">
<label>Albertsen et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Albertsen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Skarshewski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>PH</given-names>
</name>
</person-group>
<article-title>Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes</article-title>
<source>Nature Biotechnology</source>
<year>2013</year>
<volume>31</volume>
<fpage>533</fpage>
<lpage>538</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2579</pub-id>
</element-citation>
</ref>
<ref id="ref-3">
<label>Altschul et al. (1997)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Schäffer</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
<source>Nucleic Acids Research</source>
<year>1997</year>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id>
<pub-id pub-id-type="pmid">9254694</pub-id>
</element-citation>
</ref>
<ref id="ref-4">
<label>Anantharaman et al. (2014)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anantharaman</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Duhaime</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Breier</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Wendt</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Toner</surname>
<given-names>BM</given-names>
</name>
<name>
<surname>Dick</surname>
<given-names>GJ</given-names>
</name>
</person-group>
<article-title>Sulfur oxidation genes in diverse deep-sea viruses</article-title>
<source>Science</source>
<year>2014</year>
<volume>344</volume>
<fpage>757</fpage>
<lpage>760</lpage>
<pub-id pub-id-type="doi">10.3354/meps145269</pub-id>
<pub-id pub-id-type="pmid">24789974</pub-id>
</element-citation>
</ref>
<ref id="ref-5">
<label>Boyd (2012)</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Boyd</surname>
<given-names>EF</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Łobocka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Szybalski</surname>
<given-names>WT</given-names>
</name>
</person-group>
<article-title>Bacteriophage-encoded bacterial virulence factors and phage-pathogenicity island interactions</article-title>
<source>Advances in virus research</source>
<volume>82</volume>
<year>2012</year>
<publisher-loc>Amsterdam</publisher-loc>
<publisher-name>Elsevier</publisher-name>
<fpage>91</fpage>
<lpage>118</lpage>
</element-citation>
</ref>
<ref id="ref-6">
<label>Breitbart et al. (2007)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>LR</given-names>
</name>
<name>
<surname>Suttle</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Exploring the vast diversity of marine viruses</article-title>
<source>Oceanography</source>
<year>2007</year>
<volume>20</volume>
<fpage>135</fpage>
<lpage>139</lpage>
<pub-id pub-id-type="doi">10.5670/oceanog.2007.58</pub-id>
</element-citation>
</ref>
<ref id="ref-7">
<label>Breitbart & Rohwer (2005)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Here a virus, there a virus, everywhere the same virus?</article-title>
<source>Trends in Microbiology</source>
<year>2005</year>
<volume>13</volume>
<fpage>278</fpage>
<lpage>284</lpage>
<pub-id pub-id-type="doi">10.1016/j.tim.2005.04.003</pub-id>
<pub-id pub-id-type="pmid">15936660</pub-id>
</element-citation>
</ref>
<ref id="ref-8">
<label>Brum et al. (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brum</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Ignacio-Espinoza</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Roux</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Doulcier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Acinas</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Alberti</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Chaffron</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Coppola</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cruaud</surname>
<given-names>C</given-names>
</name>
<name>
<surname>de Vargas</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gasol</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Gorsky</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gregory</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Guidi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hingamp</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Iudicone</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Not</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Ogata</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Pesant</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Poulos</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Schwenck</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Speich</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dimier</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Picheral</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Searson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kandels-Lewis</surname>
<given-names>S</given-names>
</name>
<collab>Tara Oceans Coordinators</collab>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bowler</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Karsenti</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sunagawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wincker</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Patterns and ecological drivers of ocean viral communities</article-title>
<source>Science</source>
<year>2015</year>
<volume>348</volume>
<issue>6237</issue>
<elocation-id>1261498</elocation-id>
<pub-id pub-id-type="doi">10.1126/science.1261498</pub-id>
</element-citation>
</ref>
<ref id="ref-9">
<label>Brum & Sullivan (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brum</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Rising to the challenge: accelerated pace of discovery transforms marine virology</article-title>
<source>Nature Reviews Microbiology</source>
<year>2015</year>
<volume>13</volume>
<fpage>147</fpage>
<lpage>159</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro3404</pub-id>
</element-citation>
</ref>
<ref id="ref-10">
<label>Busby, Kristensen & Koonin (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Busby</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kristensen</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Contribution of phage-derived genomic islands to the virulence of facultative bacterial pathogens</article-title>
<source>Environmental Microbiology</source>
<year>2013</year>
<volume>15</volume>
<fpage>307</fpage>
<lpage>312</lpage>
<pub-id pub-id-type="doi">10.1111/j.1462-2920.2012.02886.x</pub-id>
<pub-id pub-id-type="pmid">23035931</pub-id>
</element-citation>
</ref>
<ref id="ref-11">
<label>Bush et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bush</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Courvalin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dantas</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Davies</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Eisenstein</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Huovinen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Jacoby</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Kishony</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kreiswirth</surname>
<given-names>BN</given-names>
</name>
<name>
<surname>Kutter</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lerner</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lomovskaya</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Mobashery</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Piddock</surname>
<given-names>LJV</given-names>
</name>
<name>
<surname>Projan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Tomasz</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tulkens</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Watson</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Witkowski</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Witte</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Yeh</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zgurskaya</surname>
<given-names>HI</given-names>
</name>
</person-group>
<article-title>Tackling antibiotic resistance</article-title>
<source>Nature Reviews Microbiology</source>
<year>2011</year>
<volume>9</volume>
<fpage>894</fpage>
<lpage>896</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2693</pub-id>
</element-citation>
</ref>
<ref id="ref-12">
<label>Canchaya, Fournous & Brüssow (2004)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Canchaya</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Fournous</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Brüssow</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>The impact of prophages on bacterial chromosomes</article-title>
<source>Molecular Microbiology</source>
<year>2004</year>
<volume>53</volume>
<fpage>9</fpage>
<lpage>18</lpage>
<pub-id pub-id-type="doi">10.1111/j.1365-2958.2004.04113.x</pub-id>
<pub-id pub-id-type="pmid">15225299</pub-id>
</element-citation>
</ref>
<ref id="ref-13">
<label>Casjens (2003)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Casjens</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Prophages and bacterial genomics: what have we learned so far?</article-title>
<source>Molecular Microbiology</source>
<year>2003</year>
<volume>49</volume>
<fpage>277</fpage>
<lpage>300</lpage>
<pub-id pub-id-type="doi">10.1046/j.1365-2958.2003.03580.x</pub-id>
<pub-id pub-id-type="pmid">12886937</pub-id>
</element-citation>
</ref>
<ref id="ref-14">
<label>Delcher, Salzberg & Phillippy (2003)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Phillippy</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Using MUMmer to identify similar regions in large sequence sets</article-title>
<source>Current Protocols in Bioinformatics</source>
<year>2003</year>
<comment>(online)</comment>
<pub-id pub-id-type="doi">10.1002/0471250953.bi1003s00</pub-id>
</element-citation>
</ref>
<ref id="ref-15">
<label>Eddy (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Accelerated Profile HMM Searches</article-title>
<source>PLoS Computational Biology</source>
<year>2011</year>
<volume>7</volume>
<elocation-id>e1002195</elocation-id>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1002195</pub-id>
</element-citation>
</ref>
<ref id="ref-16">
<label>Edwards & Rohwer (2005)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Viral metagenomics</article-title>
<source>Nature Reviews Microbiology</source>
<year>2005</year>
<volume>3</volume>
<fpage>504</fpage>
<lpage>510</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro1163</pub-id>
</element-citation>
</ref>
<ref id="ref-17">
<label>Emerson et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Emerson</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
</person-group>
<article-title>Metagenomic assembly reveals dynamic viral populations in hypersaline systems</article-title>
<source>Applied and Environmental Microbiology</source>
<year>2012</year>
<volume>78</volume>
<fpage>6309</fpage>
<lpage>6320</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.01212-12</pub-id>
<pub-id pub-id-type="pmid">22773627</pub-id>
</element-citation>
</ref>
<ref id="ref-18">
<label>Enright, Van Dongen & Ouzounis (2002)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Enright</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Van Dongen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ouzounis</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>An efficient algorithm for large-scale detection of protein families</article-title>
<source>Nucleic Acids Research</source>
<year>2002</year>
<volume>30</volume>
<fpage>1575</fpage>
<lpage>1584</lpage>
<pub-id pub-id-type="doi">10.1093/nar/30.7.1575</pub-id>
<pub-id pub-id-type="pmid">11917018</pub-id>
</element-citation>
</ref>
<ref id="ref-19">
<label>Fouts (2006)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fouts</surname>
<given-names>DE</given-names>
</name>
</person-group>
<article-title>Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences</article-title>
<source>Nucleic Acids Research</source>
<year>2006</year>
<volume>34</volume>
<fpage>5839</fpage>
<lpage>5851</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkl732</pub-id>
<pub-id pub-id-type="pmid">17062630</pub-id>
</element-citation>
</ref>
<ref id="ref-20">
<label>Fuhrman (1999)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fuhrman</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>Marine viruses and their biogeochemical and ecological effects</article-title>
<source>Nature</source>
<year>1999</year>
<volume>399</volume>
<fpage>541</fpage>
<lpage>548</lpage>
<pub-id pub-id-type="doi">10.1038/21119</pub-id>
<pub-id pub-id-type="pmid">10376593</pub-id>
</element-citation>
</ref>
<ref id="ref-21">
<label>Goff et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goff</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Vaughn</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stapleton</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Gessler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Matasci</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hanlon</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lenards</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Muir</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Merchant</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lowry</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mock</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Helmke</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kubach</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Narro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hopkins</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Micklos</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hilgert</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Gonzales</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jordan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Skidmore</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Dooley</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cazes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>McLay</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Pasternak</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Koesterke</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Piel</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Grene</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Noutsos</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gendler</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lent</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S-J</given-names>
</name>
<name>
<surname>Kvilekval</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Manjunath</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Tannen</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sanderson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Welch</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Cranston</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>D</given-names>
</name>
<name>
<surname>O’Meara</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ane</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Brutnell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kliebenstein</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>White</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Leebens-Mack</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Spalding</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>Vision</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Lowenthal</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Enquist</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Boyle</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Akoglu</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ram</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ware</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stanzione</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The iPlant Collaborative: cyberinfrastructure for plant biology</article-title>
<source>Frontiers in Plant Science</source>
<year>2011</year>
<volume>2</volume>
<fpage>34</fpage>
<pub-id pub-id-type="doi">10.3389/fpls.2011.00034</pub-id>
<pub-id pub-id-type="pmid">22645531</pub-id>
</element-citation>
</ref>
<ref id="ref-22">
<label>Hurwitz, Brum & Sullivan (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hurwitz</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Brum</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Depth-stratified functional and taxonomic niche specialization in the core and “flexible” Pacific Ocean Virome</article-title>
<source>The ISME Journal</source>
<year>2015</year>
<volume>9</volume>
<fpage>472</fpage>
<lpage>484</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2014.143</pub-id>
<pub-id pub-id-type="pmid">25093636</pub-id>
</element-citation>
</ref>
<ref id="ref-23">
<label>Hurwitz, Hallam & Sullivan (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hurwitz</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Metabolic reprogramming by viruses in the sunlit and dark ocean</article-title>
<source>Genome Biology</source>
<year>2013</year>
<volume>14</volume>
<fpage>R123</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2013-14-11-r123</pub-id>
<pub-id pub-id-type="pmid">24200126</pub-id>
</element-citation>
</ref>
<ref id="ref-24">
<label>Jia et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Xuan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>NeSSM: a next-generation sequencing simulator for metagenomics</article-title>
<source>PLoS ONE</source>
<year>2013</year>
<volume>8</volume>
<elocation-id>e75448</elocation-id>
<pub-id pub-id-type="doi">10.1371/journal.pone.0075448</pub-id>
</element-citation>
</ref>
<ref id="ref-25">
<label>Kamke, Sczyrba & Ivanova (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kamke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sczyrba</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Single-cell genomics reveals complex carbohydrate degradation patterns in poribacterial symbionts of marine sponges</article-title>
<source>The ISME Journal</source>
<year>2013</year>
<volume>7</volume>
<fpage>2287</fpage>
<lpage>2300</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2013.111</pub-id>
<pub-id pub-id-type="pmid">23842652</pub-id>
</element-citation>
</ref>
<ref id="ref-26">
<label>Kashtan et al. (2014)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kashtan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Roggensack</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Rodrigue</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Biller</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Marttinen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Malmstrom</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Stocker</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Follows</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
</person-group>
<article-title>Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus</article-title>
<source>Science</source>
<year>2014</year>
<volume>344</volume>
<fpage>416</fpage>
<lpage>420</lpage>
<pub-id pub-id-type="doi">10.1126/science.1248575</pub-id>
<pub-id pub-id-type="pmid">24763590</pub-id>
</element-citation>
</ref>
<ref id="ref-27">
<label>Koonin, Senkevich & Dolja (2006)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Senkevich</surname>
<given-names>TG</given-names>
</name>
<name>
<surname>Dolja</surname>
<given-names>VV</given-names>
</name>
</person-group>
<article-title>The ancient Virus World and evolution of cells</article-title>
<source>Biology Direct</source>
<year>2006</year>
<volume>1</volume>
<fpage>29</fpage>
<pub-id pub-id-type="doi">10.1186/1745-6150-1-29</pub-id>
<pub-id pub-id-type="pmid">16984643</pub-id>
</element-citation>
</ref>
<ref id="ref-28">
<label>Labonté et al. (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Labonté</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Swan</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Poulos</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Wommack</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Single cell genomics-based analysis of virus-host interactions in marine surface bacterioplankton</article-title>
<source>The ISME Journal</source>
<year>2015</year>
<comment>Epub ahead of print 7 April 2015</comment>
<pub-id pub-id-type="doi">10.1038/ismej.2015.48</pub-id>
</element-citation>
</ref>
<ref id="ref-29">
<label>Letarov & Kulikov (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Letarov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kulikov</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>The bacteriophages in human- and animal body-associated microbial communities</article-title>
<source>Journal of Applied Microbiology</source>
<year>2009</year>
<volume>107</volume>
<fpage>1</fpage>
<lpage>13</lpage>
<pub-id pub-id-type="doi">10.1111/j.1365-2672.2009.04143.x</pub-id>
<pub-id pub-id-type="pmid">19239553</pub-id>
</element-citation>
</ref>
<ref id="ref-30">
<label>Lima-Mendez et al. (2008)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lima-Mendez</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Van Helden</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Toussaint</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Leplae</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Prophinder: a computational tool for prophage prediction in prokaryotic genomes</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>863</fpage>
<lpage>865</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn043</pub-id>
<pub-id pub-id-type="pmid">18238785</pub-id>
</element-citation>
</ref>
<ref id="ref-31">
<label>Lindell et al. (2005)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lindell</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>ZI</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
</person-group>
<article-title>Photosynthesis genes in marine viruses yield proteins during host infection</article-title>
<source>Nature</source>
<year>2005</year>
<volume>438</volume>
<fpage>86</fpage>
<lpage>89</lpage>
<pub-id pub-id-type="doi">10.1038/nature04111</pub-id>
<pub-id pub-id-type="pmid">16222247</pub-id>
</element-citation>
</ref>
<ref id="ref-32">
<label>Minot et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Minot</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bryson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Chehoud</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Bushman</surname>
<given-names>FD</given-names>
</name>
</person-group>
<article-title>Rapid evolution of the human gut virome</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<year>2013</year>
<volume>110</volume>
<fpage>12450</fpage>
<lpage>12455</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1300833110</pub-id>
<pub-id pub-id-type="pmid">23836644</pub-id>
</element-citation>
</ref>
<ref id="ref-33">
<label>Narasingarao et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Narasingarao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Podell</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ugalde</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Brochier-Armanet</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Emerson</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Brocks</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>EE</given-names>
</name>
</person-group>
<article-title>De novo metagenomic assembly reveals abundant novel major lineage of Archaea in hypersaline microbial communities</article-title>
<source>The ISME Journal</source>
<year>2012</year>
<volume>6</volume>
<fpage>81</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2011.78</pub-id>
<pub-id pub-id-type="pmid">21716304</pub-id>
</element-citation>
</ref>
<ref id="ref-34">
<label>Nobrega et al. (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nobrega</surname>
<given-names>FL</given-names>
</name>
<name>
<surname>Costa</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Kluskens</surname>
<given-names>LD</given-names>
</name>
<name>
<surname>Azeredo</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Revisiting phage therapy: new applications for old resources</article-title>
<source>Trends in Microbiology</source>
<year>2015</year>
<volume>23</volume>
<fpage>185</fpage>
<lpage>191</lpage>
<pub-id pub-id-type="doi">10.1016/j.tim.2015.01.006</pub-id>
<pub-id pub-id-type="pmid">25708933</pub-id>
</element-citation>
</ref>
<ref id="ref-35">
<label>Noguchi, Taniguchi & Itoh (2008)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noguchi</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Taniguchi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes</article-title>
<source>DNA Research</source>
<year>2008</year>
<volume>15</volume>
<fpage>387</fpage>
<lpage>396</lpage>
<pub-id pub-id-type="doi">10.1093/dnares/dsn027</pub-id>
<pub-id pub-id-type="pmid">18940874</pub-id>
</element-citation>
</ref>
<ref id="ref-36">
<label>Peng et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peng</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>HCM</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Chin</surname>
<given-names>FYL</given-names>
</name>
</person-group>
<article-title>IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>1420</fpage>
<lpage>1428</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts174</pub-id>
<pub-id pub-id-type="pmid">22495754</pub-id>
</element-citation>
</ref>
<ref id="ref-37">
<label>Pride et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pride</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Salzman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Davis-Long</surname>
<given-names>C</given-names>
</name>
<name>
<surname>White</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Loomer</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Armitage</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Relman</surname>
<given-names>DA</given-names>
</name>
</person-group>
<article-title>Evidence of a robust resident bacteriophage population revealed through analysis of the human salivary virome</article-title>
<source>The ISME Journal</source>
<year>2011</year>
<volume>6</volume>
<fpage>915</fpage>
<lpage>926</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2011.169</pub-id>
<pub-id pub-id-type="pmid">22158393</pub-id>
</element-citation>
</ref>
<ref id="ref-38">
<label>Rappé & Giovannoni (2003)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rappé</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>The uncultured microbial majority</article-title>
<source>Annual Review of Microbiology</source>
<year>2003</year>
<volume>57</volume>
<fpage>369</fpage>
<lpage>394</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.micro.57.030502.090759</pub-id>
</element-citation>
</ref>
<ref id="ref-39">
<label>Reyes et al. (2010)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reyes</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hanson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Angly</surname>
<given-names>FE</given-names>
</name>
<name>
<surname>Heath</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<article-title>Viruses in the faecal microbiota of monozygotic twins and their mothers</article-title>
<source>Nature</source>
<year>2010</year>
<volume>466</volume>
<fpage>334</fpage>
<lpage>338</lpage>
<pub-id pub-id-type="doi">10.1038/nature09199</pub-id>
<pub-id pub-id-type="pmid">20631792</pub-id>
</element-citation>
</ref>
<ref id="ref-40">
<label>Reyes et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reyes</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Semenkovich</surname>
<given-names>NP</given-names>
</name>
<name>
<surname>Whiteson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<article-title>Going viral: next-generation sequencing applied to phage populations in the human gut</article-title>
<source>Nature Reviews Microbiology</source>
<year>2012</year>
<volume>10</volume>
<fpage>607</fpage>
<lpage>617</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2853</pub-id>
</element-citation>
</ref>
<ref id="ref-41">
<label>Rinke et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rinke</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Schwientek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sczyrba</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>IJ</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>J-F</given-names>
</name>
<name>
<surname>Darling</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Malfatti</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Swan</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Gies</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Dodsworth</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Hedlund</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Tsiamis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sievert</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W-T</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Insights into the phylogeny and coding potential of microbial dark matter</article-title>
<source>Nature</source>
<year>2013</year>
<volume>499</volume>
<fpage>431</fpage>
<lpage>437</lpage>
<pub-id pub-id-type="doi">10.1038/nature12352</pub-id>
<pub-id pub-id-type="pmid">23851394</pub-id>
</element-citation>
</ref>
<ref id="ref-42">
<label>Rodriguez-Valera et al. (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodriguez-Valera</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Martin-Cuadrado</surname>
<given-names>A-B</given-names>
</name>
<name>
<surname>Rodriguez-Brito</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pasić</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Thingstad</surname>
<given-names>TF</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Mira</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Explaining microbial population genomics through phage predation</article-title>
<source>Nature Reviews Microbiology</source>
<year>2009</year>
<volume>7</volume>
<fpage>828</fpage>
<lpage>836</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2235</pub-id>
</element-citation>
</ref>
<ref id="ref-43">
<label>Rohwer & Thurber (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Thurber</surname>
<given-names>RV</given-names>
</name>
</person-group>
<article-title>Viruses manipulate the marine environment</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<fpage>207</fpage>
<lpage>212</lpage>
<pub-id pub-id-type="doi">10.1038/nature08060</pub-id>
<pub-id pub-id-type="pmid">19444207</pub-id>
</element-citation>
</ref>
<ref id="ref-44">
<label>Roux et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roux</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Krupovic</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Debroas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Forterre</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Enault</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences</article-title>
<source>Open Biology</source>
<year>2013</year>
<volume>3</volume>
<fpage>130160</fpage>
<pub-id pub-id-type="doi">10.1098/rsob.130160</pub-id>
<pub-id pub-id-type="pmid">24335607</pub-id>
</element-citation>
</ref>
<ref id="ref-45">
<label>Roux et al. (2014a)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roux</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tournayre</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mahul</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Debroas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Enault</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Metavir 2: new tools for viral metagenome comparison and assembled virome analysis</article-title>
<source>BMC Bioinformatics</source>
<year>2014a</year>
<volume>15</volume>
<fpage>1</fpage>
<lpage>12</lpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-15-76</pub-id>
<pub-id pub-id-type="pmid">24383880</pub-id>
</element-citation>
</ref>
<ref id="ref-46">
<label>Roux et al. (2014b)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roux</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hawley</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Torres Beltran</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Scofield</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schwientek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics</article-title>
<source>eLife</source>
<year>2014b</year>
<volume>3</volume>
<fpage>1</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.7554/eLife.03125</pub-id>
</element-citation>
</ref>
<ref id="ref-47">
<label>Sharon et al. (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Alperovitch</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Haynes</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Glaser</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Atamna-Ismaeel</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pinter</surname>
<given-names>RY</given-names>
</name>
<name>
<surname>Partensky</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>YI</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Béjà</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Photosystem I gene cassettes are present in marine virus genomes</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>258</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="doi">10.1038/nature08284</pub-id>
<pub-id pub-id-type="pmid">19710652</pub-id>
</element-citation>
</ref>
<ref id="ref-48">
<label>Sharon et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Battchikova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Aro</surname>
<given-names>E-M</given-names>
</name>
<name>
<surname>Giglione</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Meinnel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Glaser</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pinter</surname>
<given-names>RY</given-names>
</name>
<name>
<surname>Breitbart</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rohwer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Béjà</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Comparative metagenomics of microbial traits within oceanic viral communities</article-title>
<source>The ISME Journal</source>
<year>2011</year>
<volume>5</volume>
<fpage>1178</fpage>
<lpage>1190</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2011.2</pub-id>
<pub-id pub-id-type="pmid">21307954</pub-id>
</element-citation>
</ref>
<ref id="ref-49">
<label>Sullivan et al. (2006)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Lindell</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>LR</given-names>
</name>
<name>
<surname>Bielawski</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
</person-group>
<article-title>Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts</article-title>
<source>PLoS Biology</source>
<year>2006</year>
<volume>4</volume>
<elocation-id>e234</elocation-id>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0040234</pub-id>
</element-citation>
</ref>
<ref id="ref-50">
<label>Suttle (2002)</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Suttle</surname>
<given-names>C</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Whitton</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Potts</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Cyanophages and their role in the ecology of cyanobacteria</article-title>
<source>The ecology of cyanobacteria</source>
<year>2002</year>
<publisher-loc>Dordrecht</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>564</fpage>
<lpage>584</lpage>
</element-citation>
</ref>
<ref id="ref-51">
<label>Suttle (2007)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suttle</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Marine viruses–major players in the global ecosystem</article-title>
<source>Nature Reviews Microbiology</source>
<year>2007</year>
<volume>5</volume>
<fpage>801</fpage>
<lpage>812</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro1750</pub-id>
</element-citation>
</ref>
<ref id="ref-52">
<label>Swan et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Swan</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Martinez-Garcia</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Preston</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Sczyrba</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lamy</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Reinthaler</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Poulton</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Masland</surname>
<given-names>EDP</given-names>
</name>
<name>
<surname>Gomez</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Sieracki</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>DeLong</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Herndl</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Potential for chemolithoautotrophy among ubiquitous bacteria lineages in the dark ocean</article-title>
<source>Science</source>
<year>2011</year>
<volume>333</volume>
<fpage>1296</fpage>
<lpage>1300</lpage>
<pub-id pub-id-type="doi">10.1126/science.1203690</pub-id>
<pub-id pub-id-type="pmid">21885783</pub-id>
</element-citation>
</ref>
<ref id="ref-53">
<label>Thompson et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thompson</surname>
<given-names>LR</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Kelly</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Singer</surname>
<given-names>AU</given-names>
</name>
<name>
<surname>Stubbe</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
</person-group>
<article-title>Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<year>2011</year>
<volume>108</volume>
<fpage>E757</fpage>
<lpage>E764</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1102164108</pub-id>
<pub-id pub-id-type="pmid">21844365</pub-id>
</element-citation>
</ref>
<ref id="ref-54">
<label>Waldor &amp; Mekalanos (1996)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waldor</surname>
<given-names>MK</given-names>
</name>
<name>
<surname>Mekalanos</surname>
<given-names>JJ</given-names>
</name>
</person-group>
<article-title>Lysogenic conversion by a filamentous phage encoding cholera toxin</article-title>
<source>Science</source>
<year>1996</year>
<volume>272</volume>
<fpage>1910</fpage>
<lpage>1914</lpage>
<pub-id pub-id-type="doi">10.1126/science.272.5270.1910</pub-id>
<pub-id pub-id-type="pmid">8658163</pub-id>
</element-citation>
</ref>
<ref id="ref-55">
<label>Weinbauer (2004)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weinbauer</surname>
<given-names>MG</given-names>
</name>
</person-group>
<article-title>Ecology of prokaryotic viruses</article-title>
<source>FEMS Microbiology Reviews</source>
<year>2004</year>
<volume>28</volume>
<fpage>127</fpage>
<lpage>181</lpage>
<pub-id pub-id-type="doi">10.1016/j.femsre.2003.08.001</pub-id>
<pub-id pub-id-type="pmid">15109783</pub-id>
</element-citation>
</ref>
<ref id="ref-56">
<label>Winstanley et al. (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Winstanley</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Langille</surname>
<given-names>MGI</given-names>
</name>
<name>
<surname>Fothergill</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Kukavica-Ibrulj</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Paradis-Bleau</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sanschagrin</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>NR</given-names>
</name>
<name>
<surname>Winsor</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Quail</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Lennard</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Bignell</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Seeger</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Saunders</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Parkhill</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hancock</surname>
<given-names>REW</given-names>
</name>
<name>
<surname>Brinkman</surname>
<given-names>FSL</given-names>
</name>
<name>
<surname>Levesque</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>Newly introduced genomic prophage islands are critical determinants of
<italic>in vivo</italic>
competitiveness in the Liverpool Epidemic Strain of <italic>Pseudomonas aeruginosa</italic></article-title>
<source>Genome Research</source>
<year>2009</year>
<volume>19</volume>
<fpage>12</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="doi">10.1101/gr.086082.108</pub-id>
<pub-id pub-id-type="pmid">19047519</pub-id>
</element-citation>
</ref>
<ref id="ref-57">
<label>Wommack &amp; Colwell (2000)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wommack</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Colwell</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Virioplankton: viruses in aquatic ecosystems</article-title>
<source>Microbiology and Molecular Biology Reviews</source>
<year>2000</year>
<volume>64</volume>
<fpage>69</fpage>
<lpage>114</lpage>
<pub-id pub-id-type="doi">10.1128/MMBR.64.1.69-114.2000</pub-id>
<pub-id pub-id-type="pmid">10704475</pub-id>
</element-citation>
</ref>
<ref id="ref-58">
<label>Yoon et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoon</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rajah</surname>
<given-names>VD</given-names>
</name>
<name>
<surname>Sieracki</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Duffy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bhattacharya</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Single-cell genomics reveals organismal interactions in uncultivated marine protists</article-title>
<source>Science</source>
<year>2011</year>
<volume>332</volume>
<fpage>714</fpage>
<lpage>717</lpage>
<pub-id pub-id-type="doi">10.1126/science.1203163</pub-id>
<pub-id pub-id-type="pmid">21551060</pub-id>
</element-citation>
</ref>
<ref id="ref-59">
<label>Zhou et al. (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lynch</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Wishart</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>PHAST: a fast phage search tool</article-title>
<source>Nucleic Acids Research</source>
<year>2011</year>
<volume>39</volume>
<fpage>W347</fpage>
<lpage>W352</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr485</pub-id>
<pub-id pub-id-type="pmid">21672955</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000089 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000089 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4451026
   |texte=   VirSorter: mining viral signal from microbial genomic data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26038737" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024