Cyberinfrastructure exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Viral dark matter and virus–host interactions resolved from publicly available microbial genomes

Internal identifier: 000092 (Pmc/Curation); previous: 000091; next: 000093

Authors: Simon Roux [United States]; Steven J. Hallam [Canada]; Tanja Woyke [United States]; Matthew B. Sullivan [United States]

RBID : PMC:4533152

Abstract

The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.

DOI: http://dx.doi.org/10.7554/eLife.08490.001


DOI: 10.7554/eLife.08490
PubMed: 26200428
PubMed Central: 4533152

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Viral dark matter and virus–host interactions resolved from publicly available microbial genomes</title>
<author>
<name sortKey="Roux, Simon" sort="Roux, Simon" uniqKey="Roux S" first="Simon" last="Roux">Simon Roux</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution content-type="dept">Department of Ecology and Evolutionary Biology</institution>
,
<institution>University of Arizona</institution>
,
<addr-line>Tucson</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hallam, Steven J" sort="Hallam, Steven J" uniqKey="Hallam S" first="Steven J" last="Hallam">Steven J. Hallam</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution content-type="dept">Department of Microbiology and Immunology</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution content-type="dept">Graduate Program in Bioinformatics</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Woyke, Tanja" sort="Woyke, Tanja" uniqKey="Woyke T" first="Tanja" last="Woyke">Tanja Woyke</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4">
<institution>U.S. Department of Energy Joint Genome Institute</institution>
,
<addr-line>Walnut Creek</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Matthew B" sort="Sullivan, Matthew B" uniqKey="Sullivan M" first="Matthew B" last="Sullivan">Matthew B. Sullivan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution content-type="dept">Department of Ecology and Evolutionary Biology</institution>
,
<institution>University of Arizona</institution>
,
<addr-line>Tucson</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26200428</idno>
<idno type="pmc">4533152</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4533152</idno>
<idno type="RBID">PMC:4533152</idno>
<idno type="doi">10.7554/eLife.08490</idno>
<date when="????">????</date>
<idno type="wicri:Area/Pmc/Corpus">000092</idno>
<idno type="wicri:Area/Pmc/Curation">000092</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Viral dark matter and virus–host interactions resolved from publicly available microbial genomes</title>
<author>
<name sortKey="Roux, Simon" sort="Roux, Simon" uniqKey="Roux S" first="Simon" last="Roux">Simon Roux</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution content-type="dept">Department of Ecology and Evolutionary Biology</institution>
,
<institution>University of Arizona</institution>
,
<addr-line>Tucson</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hallam, Steven J" sort="Hallam, Steven J" uniqKey="Hallam S" first="Steven J" last="Hallam">Steven J. Hallam</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution content-type="dept">Department of Microbiology and Immunology</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff3">
<institution content-type="dept">Graduate Program in Bioinformatics</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Woyke, Tanja" sort="Woyke, Tanja" uniqKey="Woyke T" first="Tanja" last="Woyke">Tanja Woyke</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4">
<institution>U.S. Department of Energy Joint Genome Institute</institution>
,
<addr-line>Walnut Creek</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Matthew B" sort="Sullivan, Matthew B" uniqKey="Sullivan M" first="Matthew B" last="Sullivan">Matthew B. Sullivan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution content-type="dept">Department of Ecology and Evolutionary Biology</institution>
,
<institution>University of Arizona</institution>
,
<addr-line>Tucson</addr-line>
,
<country>United States</country>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">eLife</title>
<idno type="ISSN">2050-084X</idno>
<idno type="eISSN">2050-084X</idno>
<imprint>
<date when="????">????</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.</p>
<p>
<bold>DOI:</bold>
<ext-link ext-link-type="doi" xlink:href="10.7554/eLife.08490.001">http://dx.doi.org/10.7554/eLife.08490.001</ext-link>
</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Abedon, St" uniqKey="Abedon S">ST Abedon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Akhter, S" uniqKey="Akhter S">S Akhter</name>
</author>
<author>
<name sortKey="Aziz, Rk" uniqKey="Aziz R">RK Aziz</name>
</author>
<author>
<name sortKey="Edwards, Ra" uniqKey="Edwards R">RA Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Allers, E" uniqKey="Allers E">E Allers</name>
</author>
<author>
<name sortKey="Moraru, C" uniqKey="Moraru C">C Moraru</name>
</author>
<author>
<name sortKey="Duhaime, Mb" uniqKey="Duhaime M">MB Duhaime</name>
</author>
<author>
<name sortKey="Beneze, E" uniqKey="Beneze E">E Beneze</name>
</author>
<author>
<name sortKey="Solonenko, N" uniqKey="Solonenko N">N Solonenko</name>
</author>
<author>
<name sortKey="Canosa, Jb" uniqKey="Canosa J">JB Canosa</name>
</author>
<author>
<name sortKey="Amann, R" uniqKey="Amann R">R Amann</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Allers, E" uniqKey="Allers E">E Allers</name>
</author>
<author>
<name sortKey="Wright, Jj" uniqKey="Wright J">JJ Wright</name>
</author>
<author>
<name sortKey="Konwar, Km" uniqKey="Konwar K">KM Konwar</name>
</author>
<author>
<name sortKey="Howes, Cg" uniqKey="Howes C">CG Howes</name>
</author>
<author>
<name sortKey="Beneze, E" uniqKey="Beneze E">E Beneze</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andersson, Af" uniqKey="Andersson A">AF Andersson</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
<author>
<name sortKey="Nurk, S" uniqKey="Nurk S">S Nurk</name>
</author>
<author>
<name sortKey="Antipov, D" uniqKey="Antipov D">D Antipov</name>
</author>
<author>
<name sortKey="Gurevich, Aa" uniqKey="Gurevich A">AA Gurevich</name>
</author>
<author>
<name sortKey="Dvorkin, M" uniqKey="Dvorkin M">M Dvorkin</name>
</author>
<author>
<name sortKey="Kulikov, As" uniqKey="Kulikov A">AS Kulikov</name>
</author>
<author>
<name sortKey="Lesin, Vm" uniqKey="Lesin V">VM Lesin</name>
</author>
<author>
<name sortKey="Nikolenko, Si" uniqKey="Nikolenko S">SI Nikolenko</name>
</author>
<author>
<name sortKey="Pham, S" uniqKey="Pham S">S Pham</name>
</author>
<author>
<name sortKey="Prjibelski, Ad" uniqKey="Prjibelski A">AD Prjibelski</name>
</author>
<author>
<name sortKey="Pyshkin, Av" uniqKey="Pyshkin A">AV Pyshkin</name>
</author>
<author>
<name sortKey="Sirotkin, Av" uniqKey="Sirotkin A">AV Sirotkin</name>
</author>
<author>
<name sortKey="Vyahhi, N" uniqKey="Vyahhi N">N Vyahhi</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
<author>
<name sortKey="Alekseyev, Ma" uniqKey="Alekseyev M">MA Alekseyev</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bastias, R" uniqKey="Bastias R">R Bastías</name>
</author>
<author>
<name sortKey="Higuera, G" uniqKey="Higuera G">G Higuera</name>
</author>
<author>
<name sortKey="Sierralta, W" uniqKey="Sierralta W">W Sierralta</name>
</author>
<author>
<name sortKey="Espejo, Rt" uniqKey="Espejo R">RT Espejo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brum, J" uniqKey="Brum J">J Brum</name>
</author>
<author>
<name sortKey="Ignacio Espinoza, J" uniqKey="Ignacio Espinoza J">J Ignacio-Espinoza</name>
</author>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Doulcier, G" uniqKey="Doulcier G">G Doulcier</name>
</author>
<author>
<name sortKey="Acinas, Sg" uniqKey="Acinas S">SG Acinas</name>
</author>
<author>
<name sortKey="Alberti, A" uniqKey="Alberti A">A Alberti</name>
</author>
<author>
<name sortKey="Chaffron, S" uniqKey="Chaffron S">S Chaffron</name>
</author>
<author>
<name sortKey="Cruaud, C" uniqKey="Cruaud C">C Cruaud</name>
</author>
<author>
<name sortKey="De Vargas, C" uniqKey="De Vargas C">C de Vargas</name>
</author>
<author>
<name sortKey="Gasol, Jm" uniqKey="Gasol J">JM Gasol</name>
</author>
<author>
<name sortKey="Gorsky, G" uniqKey="Gorsky G">G Gorsky</name>
</author>
<author>
<name sortKey="Gregory, Ac" uniqKey="Gregory A">AC Gregory</name>
</author>
<author>
<name sortKey="Ogata, H" uniqKey="Ogata H">H Ogata</name>
</author>
<author>
<name sortKey="Pesant, S" uniqKey="Pesant S">S Pesant</name>
</author>
<author>
<name sortKey="Poulos, Bt" uniqKey="Poulos B">BT Poulos</name>
</author>
<author>
<name sortKey="Schwenck, Sm" uniqKey="Schwenck S">SM Schwenck</name>
</author>
<author>
<name sortKey="Speich, S" uniqKey="Speich S">S Speich</name>
</author>
<author>
<name sortKey="Dimier, C" uniqKey="Dimier C">C Dimier</name>
</author>
<author>
<name sortKey="Kandels Lewis, S" uniqKey="Kandels Lewis S">S Kandels-Lewis</name>
</author>
<author>
<name sortKey="Picheral, M" uniqKey="Picheral M">M Picheral</name>
</author>
<author>
<name sortKey="Searson, S" uniqKey="Searson S">S Searson</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author>
<name sortKey="Bowler, C" uniqKey="Bowler C">C Bowler</name>
</author>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S Sunagawa</name>
</author>
<author>
<name sortKey="Wincker, P" uniqKey="Wincker P">P Wincker</name>
</author>
<author>
<name sortKey="Karsenti, E" uniqKey="Karsenti E">E Karsenti</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brum, Jr" uniqKey="Brum J">JR Brum</name>
</author>
<author>
<name sortKey="Jeffrey Morris, J" uniqKey="Jeffrey Morris J">J Jeffrey Morris</name>
</author>
<author>
<name sortKey="Decima, M" uniqKey="Decima M">M Décima</name>
</author>
<author>
<name sortKey="Stukel, Mr" uniqKey="Stukel M">MR Stukel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brum, Jr" uniqKey="Brum J">JR Brum</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Canchaya, C" uniqKey="Canchaya C">C Canchaya</name>
</author>
<author>
<name sortKey="Fournous, G" uniqKey="Fournous G">G Fournous</name>
</author>
<author>
<name sortKey="Brussow, H" uniqKey="Brussow H">H Brüssow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carbone, A" uniqKey="Carbone A">A Carbone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cardinale, Dj" uniqKey="Cardinale D">DJ Cardinale</name>
</author>
<author>
<name sortKey="Duffy, S" uniqKey="Duffy S">S Duffy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carey Smith, Gv" uniqKey="Carey Smith G">GV Carey-Smith</name>
</author>
<author>
<name sortKey="Billington, C" uniqKey="Billington C">C Billington</name>
</author>
<author>
<name sortKey="Cornelius, Aj" uniqKey="Cornelius A">AJ Cornelius</name>
</author>
<author>
<name sortKey="Hudson, Ja" uniqKey="Hudson J">JA Hudson</name>
</author>
<author>
<name sortKey="Heinemann, Ja" uniqKey="Heinemann J">JA Heinemann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Casjens, S" uniqKey="Casjens S">S Casjens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Castelle, Cj" uniqKey="Castelle C">CJ Castelle</name>
</author>
<author>
<name sortKey="Hug, La" uniqKey="Hug L">LA Hug</name>
</author>
<author>
<name sortKey="Wrighton, Kc" uniqKey="Wrighton K">KC Wrighton</name>
</author>
<author>
<name sortKey="Thomas, Bc" uniqKey="Thomas B">BC Thomas</name>
</author>
<author>
<name sortKey="Williams, Kh" uniqKey="Williams K">KH Williams</name>
</author>
<author>
<name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author>
<name sortKey="Tringe, Sg" uniqKey="Tringe S">SG Tringe</name>
</author>
<author>
<name sortKey="Singer, Sw" uniqKey="Singer S">SW Singer</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clemente, Jc" uniqKey="Clemente J">JC Clemente</name>
</author>
<author>
<name sortKey="Ursell, Lk" uniqKey="Ursell L">LK Ursell</name>
</author>
<author>
<name sortKey="Parfrey, Lw" uniqKey="Parfrey L">LW Parfrey</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davis, Ma" uniqKey="Davis M">MA Davis</name>
</author>
<author>
<name sortKey="Martin, Ka" uniqKey="Martin K">KA Martin</name>
</author>
<author>
<name sortKey="Austin, Sj" uniqKey="Austin S">SJ Austin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Delong, Ef" uniqKey="Delong E">EF DeLong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deng, L" uniqKey="Deng L">L Deng</name>
</author>
<author>
<name sortKey="Ignacio Espinoza, Jc" uniqKey="Ignacio Espinoza J">JC Ignacio-Espinoza</name>
</author>
<author>
<name sortKey="Gregory, A" uniqKey="Gregory A">A Gregory</name>
</author>
<author>
<name sortKey="Poulos, Bt" uniqKey="Poulos B">BT Poulos</name>
</author>
<author>
<name sortKey="Weitz, Js" uniqKey="Weitz J">JS Weitz</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diemer, Gs" uniqKey="Diemer G">GS Diemer</name>
</author>
<author>
<name sortKey="Stedman, Km" uniqKey="Stedman K">KM Stedman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Emerson, Jb" uniqKey="Emerson J">JB Emerson</name>
</author>
<author>
<name sortKey="Thomas, Bc" uniqKey="Thomas B">BC Thomas</name>
</author>
<author>
<name sortKey="Alvarez, W" uniqKey="Alvarez W">W Alvarez</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Enav, H" uniqKey="Enav H">H Enav</name>
</author>
<author>
<name sortKey="Beja, O" uniqKey="Beja O">O Béjà</name>
</author>
<author>
<name sortKey="Mandel Gutfreund, Y" uniqKey="Mandel Gutfreund Y">Y Mandel-Gutfreund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Enright, Aj" uniqKey="Enright A">AJ Enright</name>
</author>
<author>
<name sortKey="Van Dongen, S" uniqKey="Van Dongen S">S Van Dongen</name>
</author>
<author>
<name sortKey="Ouzounis, Ca" uniqKey="Ouzounis C">CA Ouzounis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Falkowski, Pg" uniqKey="Falkowski P">PG Falkowski</name>
</author>
<author>
<name sortKey="Fenchel, T" uniqKey="Fenchel T">T Fenchel</name>
</author>
<author>
<name sortKey="Delong, Ef" uniqKey="Delong E">EF Delong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fischer, Cr" uniqKey="Fischer C">CR Fischer</name>
</author>
<author>
<name sortKey="Yoichi, M" uniqKey="Yoichi M">M Yoichi</name>
</author>
<author>
<name sortKey="Unno, H" uniqKey="Unno H">H Unno</name>
</author>
<author>
<name sortKey="Tanji, Y" uniqKey="Tanji Y">Y Tanji</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flores, Co" uniqKey="Flores C">CO Flores</name>
</author>
<author>
<name sortKey="Meyer, Jr" uniqKey="Meyer J">JR Meyer</name>
</author>
<author>
<name sortKey="Valverde, S" uniqKey="Valverde S">S Valverde</name>
</author>
<author>
<name sortKey="Farr, L" uniqKey="Farr L">L Farr</name>
</author>
<author>
<name sortKey="Weitz, Js" uniqKey="Weitz J">JS Weitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flores, Co" uniqKey="Flores C">CO Flores</name>
</author>
<author>
<name sortKey="Valverde, S" uniqKey="Valverde S">S Valverde</name>
</author>
<author>
<name sortKey="Weitz, Js" uniqKey="Weitz J">JS Weitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Forterre, P" uniqKey="Forterre P">P Forterre</name>
</author>
<author>
<name sortKey="Prangishvili, D" uniqKey="Prangishvili D">D Prangishvili</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fouts, De" uniqKey="Fouts D">DE Fouts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garrett, Ra" uniqKey="Garrett R">RA Garrett</name>
</author>
<author>
<name sortKey="Prangishvili, D" uniqKey="Prangishvili D">D Prangishvili</name>
</author>
<author>
<name sortKey="Shah, Sa" uniqKey="Shah S">SA Shah</name>
</author>
<author>
<name sortKey="Reuter, M" uniqKey="Reuter M">M Reuter</name>
</author>
<author>
<name sortKey="Stetter, Ko" uniqKey="Stetter K">KO Stetter</name>
</author>
<author>
<name sortKey="Peng, X" uniqKey="Peng X">X Peng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hanson, Ca" uniqKey="Hanson C">CA Hanson</name>
</author>
<author>
<name sortKey="Fuhrman, Ja" uniqKey="Fuhrman J">JA Fuhrman</name>
</author>
<author>
<name sortKey="Horner Devine, Mc" uniqKey="Horner Devine M">MC Horner-Devine</name>
</author>
<author>
<name sortKey="Martiny, Jb" uniqKey="Martiny J">JB Martiny</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hendrix, Rw" uniqKey="Hendrix R">RW Hendrix</name>
</author>
<author>
<name sortKey="Smith, Mc" uniqKey="Smith M">MC Smith</name>
</author>
<author>
<name sortKey="Burns, Rn" uniqKey="Burns R">RN Burns</name>
</author>
<author>
<name sortKey="Ford, Me" uniqKey="Ford M">ME Ford</name>
</author>
<author>
<name sortKey="Hatfull, Gf" uniqKey="Hatfull G">GF Hatfull</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hurwitz, Bl" uniqKey="Hurwitz B">BL Hurwitz</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hurwitz, Bl" uniqKey="Hurwitz B">BL Hurwitz</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ignacio Espinoza, Jc" uniqKey="Ignacio Espinoza J">JC Ignacio-Espinoza</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jia, B" uniqKey="Jia B">B Jia</name>
</author>
<author>
<name sortKey="Xuan, L" uniqKey="Xuan L">L Xuan</name>
</author>
<author>
<name sortKey="Cai, K" uniqKey="Cai K">K Cai</name>
</author>
<author>
<name sortKey="Hu, Z" uniqKey="Hu Z">Z Hu</name>
</author>
<author>
<name sortKey="Ma, L" uniqKey="Ma L">L Ma</name>
</author>
<author>
<name sortKey="Wei, C" uniqKey="Wei C">C Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamke, J" uniqKey="Kamke J">J Kamke</name>
</author>
<author>
<name sortKey="Sczyrba, A" uniqKey="Sczyrba A">A Sczyrba</name>
</author>
<author>
<name sortKey="Ivanova, N" uniqKey="Ivanova N">N Ivanova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kashtan, N" uniqKey="Kashtan N">N Kashtan</name>
</author>
<author>
<name sortKey="Roggensack, Se" uniqKey="Roggensack S">SE Roggensack</name>
</author>
<author>
<name sortKey="Rodrigue, S" uniqKey="Rodrigue S">S Rodrigue</name>
</author>
<author>
<name sortKey="Thompson, Jw" uniqKey="Thompson J">JW Thompson</name>
</author>
<author>
<name sortKey="Biller, Sj" uniqKey="Biller S">SJ Biller</name>
</author>
<author>
<name sortKey="Coe, A" uniqKey="Coe A">A Coe</name>
</author>
<author>
<name sortKey="Ding, H" uniqKey="Ding H">H Ding</name>
</author>
<author>
<name sortKey="Marttinen, P" uniqKey="Marttinen P">P Marttinen</name>
</author>
<author>
<name sortKey="Malmstrom, Rr" uniqKey="Malmstrom R">RR Malmstrom</name>
</author>
<author>
<name sortKey="Stocker, R" uniqKey="Stocker R">R Stocker</name>
</author>
<author>
<name sortKey="Follows, Mj" uniqKey="Follows M">MJ Follows</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
<author>
<name sortKey="Biller, J" uniqKey="Biller J">J Biller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Ms" uniqKey="Kim M">MS Kim</name>
</author>
<author>
<name sortKey="Park, Ej" uniqKey="Park E">EJ Park</name>
</author>
<author>
<name sortKey="Roh, Sw" uniqKey="Roh S">SW Roh</name>
</author>
<author>
<name sortKey="Bae, Jw" uniqKey="Bae J">JW Bae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
<author>
<name sortKey="Senkevich, Tg" uniqKey="Senkevich T">TG Senkevich</name>
</author>
<author>
<name sortKey="Dolja, Vv" uniqKey="Dolja V">VV Dolja</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krupovic, M" uniqKey="Krupovic M">M Krupovic</name>
</author>
<author>
<name sortKey="Zhi, N" uniqKey="Zhi N">N Zhi</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Hu, G" uniqKey="Hu G">G Hu</name>
</author>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
<author>
<name sortKey="Wong, S" uniqKey="Wong S">S Wong</name>
</author>
<author>
<name sortKey="Shevchenko, S" uniqKey="Shevchenko S">S Shevchenko</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
<author>
<name sortKey="Young, Ns" uniqKey="Young N">NS Young</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Labonte, Jm" uniqKey="Labonte J">JM Labonté</name>
</author>
<author>
<name sortKey="Swan, Bk" uniqKey="Swan B">BK Swan</name>
</author>
<author>
<name sortKey="Poulos, Bt" uniqKey="Poulos B">BT Poulos</name>
</author>
<author>
<name sortKey="Luo, H" uniqKey="Luo H">H Luo</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Wommack, Ek" uniqKey="Wommack E">EK Wommack</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Labonte, Jm" uniqKey="Labonte J">JM Labonté</name>
</author>
<author>
<name sortKey="Suttle, Ca" uniqKey="Suttle C">CA Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leplae, R" uniqKey="Leplae R">R Leplae</name>
</author>
<author>
<name sortKey="Lima Mendez, G" uniqKey="Lima Mendez G">G Lima-Mendez</name>
</author>
<author>
<name sortKey="Toussaint, A" uniqKey="Toussaint A">A Toussaint</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lima Mendez, G" uniqKey="Lima Mendez G">G Lima-Mendez</name>
</author>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J Van Helden</name>
</author>
<author>
<name sortKey="Toussaint, A" uniqKey="Toussaint A">A Toussaint</name>
</author>
<author>
<name sortKey="Leplae, R" uniqKey="Leplae R">R Leplae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lima Mendez, G" uniqKey="Lima Mendez G">G Lima-Mendez</name>
</author>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J Van Helden</name>
</author>
<author>
<name sortKey="Toussaint, A" uniqKey="Toussaint A">A Toussaint</name>
</author>
<author>
<name sortKey="Leplae, R" uniqKey="Leplae R">R Leplae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marston, Mf" uniqKey="Marston M">MF Marston</name>
</author>
<author>
<name sortKey="Pierciey, Fj" uniqKey="Pierciey F">FJ Pierciey</name>
</author>
<author>
<name sortKey="Shepard, A" uniqKey="Shepard A">A Shepard</name>
</author>
<author>
<name sortKey="Gearin, G" uniqKey="Gearin G">G Gearin</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author>
<name sortKey="Yandava, C" uniqKey="Yandava C">C Yandava</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
<author>
<name sortKey="Henn, Mr" uniqKey="Henn M">MR Henn</name>
</author>
<author>
<name sortKey="Martiny, Jb" uniqKey="Martiny J">JB Martiny</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Middelboe, M" uniqKey="Middelboe M">M Middelboe</name>
</author>
<author>
<name sortKey="Holmfeldt, K" uniqKey="Holmfeldt K">K Holmfeldt</name>
</author>
<author>
<name sortKey="Riemann, L" uniqKey="Riemann L">L Riemann</name>
</author>
<author>
<name sortKey="Nybroe, O" uniqKey="Nybroe O">O Nybroe</name>
</author>
<author>
<name sortKey="Haaber, J" uniqKey="Haaber J">J Haaber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Minot, S" uniqKey="Minot S">S Minot</name>
</author>
<author>
<name sortKey="Grunberg, S" uniqKey="Grunberg S">S Grunberg</name>
</author>
<author>
<name sortKey="Wu, Gd" uniqKey="Wu G">GD Wu</name>
</author>
<author>
<name sortKey="Lewis, Jd" uniqKey="Lewis J">JD Lewis</name>
</author>
<author>
<name sortKey="Bushman, Fd" uniqKey="Bushman F">FD Bushman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mizuno, Cm" uniqKey="Mizuno C">CM Mizuno</name>
</author>
<author>
<name sortKey="Rodriguez Valera, F" uniqKey="Rodriguez Valera F">F Rodriguez-Valera</name>
</author>
<author>
<name sortKey="Kimes, Ne" uniqKey="Kimes N">NE Kimes</name>
</author>
<author>
<name sortKey="Ghai, R" uniqKey="Ghai R">R Ghai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mosig, G" uniqKey="Mosig G">G Mosig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pace, Nr" uniqKey="Pace N">NR Pace</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pride, Dt" uniqKey="Pride D">DT Pride</name>
</author>
<author>
<name sortKey="Wassenaar, Tm" uniqKey="Wassenaar T">TM Wassenaar</name>
</author>
<author>
<name sortKey="Ghose, C" uniqKey="Ghose C">C Ghose</name>
</author>
<author>
<name sortKey="Blaser, Mj" uniqKey="Blaser M">MJ Blaser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pruitt, Kd" uniqKey="Pruitt K">KD Pruitt</name>
</author>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Klimke, W" uniqKey="Klimke W">W Klimke</name>
</author>
<author>
<name sortKey="Maglott, Dr" uniqKey="Maglott D">DR Maglott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rakonjac, J" uniqKey="Rakonjac J">J Rakonjac</name>
</author>
<author>
<name sortKey="Bennett, Nj" uniqKey="Bennett N">NJ Bennett</name>
</author>
<author>
<name sortKey="Spagnuolo, J" uniqKey="Spagnuolo J">J Spagnuolo</name>
</author>
<author>
<name sortKey="Gagic, D" uniqKey="Gagic D">D Gagic</name>
</author>
<author>
<name sortKey="Russel, M" uniqKey="Russel M">M Russel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rappe, Ms" uniqKey="Rappe M">MS Rappé</name>
</author>
<author>
<name sortKey="Giovannoni, Sj" uniqKey="Giovannoni S">SJ Giovannoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reyes, A" uniqKey="Reyes A">A Reyes</name>
</author>
<author>
<name sortKey="Semenkovich, Np" uniqKey="Semenkovich N">NP Semenkovich</name>
</author>
<author>
<name sortKey="Whiteson, K" uniqKey="Whiteson K">K Whiteson</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Gordon, Ji" uniqKey="Gordon J">JI Gordon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rice, P" uniqKey="Rice P">P Rice</name>
</author>
<author>
<name sortKey="Longden, I" uniqKey="Longden I">I Longden</name>
</author>
<author>
<name sortKey="Bleasby, A" uniqKey="Bleasby A">A Bleasby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rinke, C" uniqKey="Rinke C">C Rinke</name>
</author>
<author>
<name sortKey="Schwientek, P" uniqKey="Schwientek P">P Schwientek</name>
</author>
<author>
<name sortKey="Sczyrba, A" uniqKey="Sczyrba A">A Sczyrba</name>
</author>
<author>
<name sortKey="Ivanova, Nn" uniqKey="Ivanova N">NN Ivanova</name>
</author>
<author>
<name sortKey="Anderson, Ij" uniqKey="Anderson I">IJ Anderson</name>
</author>
<author>
<name sortKey="Cheng, Jf" uniqKey="Cheng J">JF Cheng</name>
</author>
<author>
<name sortKey="Darling, A" uniqKey="Darling A">A Darling</name>
</author>
<author>
<name sortKey="Malfatti, S" uniqKey="Malfatti S">S Malfatti</name>
</author>
<author>
<name sortKey="Swan, Bk" uniqKey="Swan B">BK Swan</name>
</author>
<author>
<name sortKey="Gies, Ea" uniqKey="Gies E">EA Gies</name>
</author>
<author>
<name sortKey="Dodsworth, Ja" uniqKey="Dodsworth J">JA Dodsworth</name>
</author>
<author>
<name sortKey="Hedlund, Bp" uniqKey="Hedlund B">BP Hedlund</name>
</author>
<author>
<name sortKey="Tsiamis, G" uniqKey="Tsiamis G">G Tsiamis</name>
</author>
<author>
<name sortKey="Sievert, Sm" uniqKey="Sievert S">SM Sievert</name>
</author>
<author>
<name sortKey="Liu, Wt" uniqKey="Liu W">WT Liu</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author>
<name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodriguez Valera, F" uniqKey="Rodriguez Valera F">F Rodriguez-Valera</name>
</author>
<author>
<name sortKey="Martin Cuadrado, Ab" uniqKey="Martin Cuadrado A">AB Martin-Cuadrado</name>
</author>
<author>
<name sortKey="Rodriguez Brito, B" uniqKey="Rodriguez Brito B">B Rodriguez-Brito</name>
</author>
<author>
<name sortKey="Pasi, L" uniqKey="Pasi L">L Pasić</name>
</author>
<author>
<name sortKey="Thingstad, Tf" uniqKey="Thingstad T">TF Thingstad</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Mira, A" uniqKey="Mira A">A Mira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
<author>
<name sortKey="Edwards, R" uniqKey="Edwards R">R Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Enault, F" uniqKey="Enault F">F Enault</name>
</author>
<author>
<name sortKey="Bronner, G" uniqKey="Bronner G">G Bronner</name>
</author>
<author>
<name sortKey="Vaulot, D" uniqKey="Vaulot D">D Vaulot</name>
</author>
<author>
<name sortKey="Forterre, P" uniqKey="Forterre P">P Forterre</name>
</author>
<author>
<name sortKey="Krupovic, M" uniqKey="Krupovic M">M Krupovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Enault, F" uniqKey="Enault F">F Enault</name>
</author>
<author>
<name sortKey="Hurwitz, Bl" uniqKey="Hurwitz B">BL Hurwitz</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S Roux</name>
</author>
<author>
<name sortKey="Hawley, Ak" uniqKey="Hawley A">AK Hawley</name>
</author>
<author>
<name sortKey="Torres Beltran, M" uniqKey="Torres Beltran M">M Torres Beltran</name>
</author>
<author>
<name sortKey="Scofield, M" uniqKey="Scofield M">M Scofield</name>
</author>
<author>
<name sortKey="Schwientek, P" uniqKey="Schwientek P">P Schwientek</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T Woyke</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saint Girons, I" uniqKey="Saint Girons I">I Saint Girons</name>
</author>
<author>
<name sortKey="Bourhy, P" uniqKey="Bourhy P">P Bourhy</name>
</author>
<author>
<name sortKey="Ottone, C" uniqKey="Ottone C">C Ottone</name>
</author>
<author>
<name sortKey="Picardeau, M" uniqKey="Picardeau M">M Picardeau</name>
</author>
<author>
<name sortKey="Yelton, D" uniqKey="Yelton D">D Yelton</name>
</author>
<author>
<name sortKey="Hendrix, Rw" uniqKey="Hendrix R">RW Hendrix</name>
</author>
<author>
<name sortKey="Glaser, P" uniqKey="Glaser P">P Glaser</name>
</author>
<author>
<name sortKey="Charon, N" uniqKey="Charon N">N Charon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salim, O" uniqKey="Salim O">O Salim</name>
</author>
<author>
<name sortKey="Skilton, Rj" uniqKey="Skilton R">RJ Skilton</name>
</author>
<author>
<name sortKey="Lambden, Pr" uniqKey="Lambden P">PR Lambden</name>
</author>
<author>
<name sortKey="Fane, Ba" uniqKey="Fane B">BA Fane</name>
</author>
<author>
<name sortKey="Clarke, In" uniqKey="Clarke I">IN Clarke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sencilo, A" uniqKey="Sencilo A">A Sencilo</name>
</author>
<author>
<name sortKey="Paulin, L" uniqKey="Paulin L">L Paulin</name>
</author>
<author>
<name sortKey="Kellner, S" uniqKey="Kellner S">S Kellner</name>
</author>
<author>
<name sortKey="Helm, M" uniqKey="Helm M">M Helm</name>
</author>
<author>
<name sortKey="Roine, E" uniqKey="Roine E">E Roine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sims, Ge" uniqKey="Sims G">GE Sims</name>
</author>
<author>
<name sortKey="Jun, Sr" uniqKey="Jun S">SR Jun</name>
</author>
<author>
<name sortKey="Wu, Ga" uniqKey="Wu G">GA Wu</name>
</author>
<author>
<name sortKey="Kim, Sh" uniqKey="Kim S">SH Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sternberg, N" uniqKey="Sternberg N">N Sternberg</name>
</author>
<author>
<name sortKey="Austin, S" uniqKey="Austin S">S Austin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sullivan, Mj" uniqKey="Sullivan M">MJ Sullivan</name>
</author>
<author>
<name sortKey="Petty, Nk" uniqKey="Petty N">NK Petty</name>
</author>
<author>
<name sortKey="Beatson, Sa" uniqKey="Beatson S">SA Beatson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suttle, Ca" uniqKey="Suttle C">CA Suttle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tadmor, Ad" uniqKey="Tadmor A">AD Tadmor</name>
</author>
<author>
<name sortKey="Ottesen, Ea" uniqKey="Ottesen E">EA Ottesen</name>
</author>
<author>
<name sortKey="Leadbetter, Jr" uniqKey="Leadbetter J">JR Leadbetter</name>
</author>
<author>
<name sortKey="Phillips, R" uniqKey="Phillips R">R Phillips</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weitz, Js" uniqKey="Weitz J">JS Weitz</name>
</author>
<author>
<name sortKey="Poisot, T" uniqKey="Poisot T">T Poisot</name>
</author>
<author>
<name sortKey="Meyer, Jr" uniqKey="Meyer J">JR Meyer</name>
</author>
<author>
<name sortKey="Flores, Co" uniqKey="Flores C">CO Flores</name>
</author>
<author>
<name sortKey="Valverde, S" uniqKey="Valverde S">S Valverde</name>
</author>
<author>
<name sortKey="Sullivan, Mb" uniqKey="Sullivan M">MB Sullivan</name>
</author>
<author>
<name sortKey="Hochberg, Me" uniqKey="Hochberg M">ME Hochberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitman, Wb" uniqKey="Whitman W">WB Whitman</name>
</author>
<author>
<name sortKey="Coleman, Dc" uniqKey="Coleman D">DC Coleman</name>
</author>
<author>
<name sortKey="Wiebe, Wj" uniqKey="Wiebe W">WJ Wiebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wright, Jj" uniqKey="Wright J">JJ Wright</name>
</author>
<author>
<name sortKey="Konwar, Km" uniqKey="Konwar K">KM Konwar</name>
</author>
<author>
<name sortKey="Hallam, Sj" uniqKey="Hallam S">SJ Hallam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wrighton, K" uniqKey="Wrighton K">K Wrighton</name>
</author>
<author>
<name sortKey="Thomas, B" uniqKey="Thomas B">B Thomas</name>
</author>
<author>
<name sortKey="Sharon, I" uniqKey="Sharon I">I Sharon</name>
</author>
<author>
<name sortKey="Miller, Cs" uniqKey="Miller C">CS Miller</name>
</author>
<author>
<name sortKey="Castelle, Cj" uniqKey="Castelle C">CJ Castelle</name>
</author>
<author>
<name sortKey="Verberkmoes, Nc" uniqKey="Verberkmoes N">NC VerBerkmoes</name>
</author>
<author>
<name sortKey="Wilkins, Mj" uniqKey="Wilkins M">MJ Wilkins</name>
</author>
<author>
<name sortKey="Hettich, Rl" uniqKey="Hettich R">RL Hettich</name>
</author>
<author>
<name sortKey="Lipton, Ms" uniqKey="Lipton M">MS Lipton</name>
</author>
<author>
<name sortKey="Williams, Kh" uniqKey="Williams K">KH Williams</name>
</author>
<author>
<name sortKey="Long, Pe" uniqKey="Long P">PE Long</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yoon, Hs" uniqKey="Yoon H">HS Yoon</name>
</author>
<author>
<name sortKey="Price, Dc" uniqKey="Price D">DC Price</name>
</author>
<author>
<name sortKey="Stepanauskas, R" uniqKey="Stepanauskas R">R Stepanauskas</name>
</author>
<author>
<name sortKey="Rajah, Vd" uniqKey="Rajah V">VD Rajah</name>
</author>
<author>
<name sortKey="Sieracki, Me" uniqKey="Sieracki M">ME Sieracki</name>
</author>
<author>
<name sortKey="Wilson, Wh" uniqKey="Wilson W">WH Wilson</name>
</author>
<author>
<name sortKey="Yang, Ec" uniqKey="Yang E">EC Yang</name>
</author>
<author>
<name sortKey="Duffy, S" uniqKey="Duffy S">S Duffy</name>
</author>
<author>
<name sortKey="Bhattacharya, D" uniqKey="Bhattacharya D">D Bhattacharya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Youle, M" uniqKey="Youle M">M Youle</name>
</author>
<author>
<name sortKey="Haynes, M" uniqKey="Haynes M">M Haynes</name>
</author>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F Rohwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Y" uniqKey="Zhou Y">Y Zhou</name>
</author>
<author>
<name sortKey="Liang, Y" uniqKey="Liang Y">Y Liang</name>
</author>
<author>
<name sortKey="Lynch, Kh" uniqKey="Lynch K">KH Lynch</name>
</author>
<author>
<name sortKey="Dennis, Jj" uniqKey="Dennis J">JJ Dennis</name>
</author>
<author>
<name sortKey="Wishart, Ds" uniqKey="Wishart D">DS Wishart</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">eLife</journal-id>
<journal-id journal-id-type="hwp">eLife</journal-id>
<journal-id journal-id-type="publisher-id">eLife</journal-id>
<journal-title-group>
<journal-title>eLife</journal-title>
</journal-title-group>
<issn pub-type="ppub">2050-084X</issn>
<issn pub-type="epub">2050-084X</issn>
<publisher>
<publisher-name>eLife Sciences Publications, Ltd</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26200428</article-id>
<article-id pub-id-type="pmc">4533152</article-id>
<article-id pub-id-type="publisher-id">08490</article-id>
<article-id pub-id-type="doi">10.7554/eLife.08490</article-id>
<article-categories>
<subj-group subj-group-type="display-channel">
<subject>Tools and Resources</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Ecology</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Genomics and Evolutionary Biology</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Viral dark matter and virus–host interactions resolved from publicly available microbial genomes</article-title>
</title-group>
<contrib-group>
<contrib id="author-13616" contrib-type="author">
<name>
<surname>Roux</surname>
<given-names>Simon</given-names>
</name>
<xref ref-type="aff" rid="aff1">1</xref>
<xref ref-type="author-notes" rid="pa1"></xref>
<xref ref-type="fn" rid="con1"></xref>
<xref ref-type="fn" rid="conf1"></xref>
</contrib>
<contrib id="author-13631" contrib-type="author">
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
<xref ref-type="aff" rid="aff2">2</xref>
<xref ref-type="aff" rid="aff3">3</xref>
<xref ref-type="other" rid="par-2"></xref>
<xref ref-type="other" rid="par-3"></xref>
<xref ref-type="other" rid="par-4"></xref>
<xref ref-type="other" rid="par-5"></xref>
<xref ref-type="other" rid="par-6"></xref>
<xref ref-type="other" rid="par-7"></xref>
<xref ref-type="fn" rid="con3"></xref>
<xref ref-type="fn" rid="conf1"></xref>
</contrib>
<contrib id="author-13630" contrib-type="author">
<name>
<surname>Woyke</surname>
<given-names>Tanja</given-names>
</name>
<xref ref-type="aff" rid="aff4">4</xref>
<xref ref-type="other" rid="par-8"></xref>
<xref ref-type="fn" rid="con4"></xref>
<xref ref-type="fn" rid="conf1"></xref>
</contrib>
<contrib id="author-13273" contrib-type="author">
<name>
<surname>Sullivan</surname>
<given-names>Matthew B</given-names>
</name>
<xref ref-type="aff" rid="aff1">1</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
<xref ref-type="author-notes" rid="pa1"></xref>
<xref ref-type="author-notes" rid="pa2"></xref>
<xref ref-type="other" rid="par-1"></xref>
<xref ref-type="fn" rid="con2"></xref>
<xref ref-type="fn" rid="conf1"></xref>
</contrib>
<aff id="aff1">
<label>1</label>
<institution content-type="dept">Department of Ecology and Evolutionary Biology</institution>
,
<institution>University of Arizona</institution>
,
<addr-line>Tucson</addr-line>
,
<country>United States</country>
</aff>
<aff id="aff2">
<label>2</label>
<institution content-type="dept">Department of Microbiology and Immunology</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</aff>
<aff id="aff3">
<label>3</label>
<institution content-type="dept">Graduate Program in Bioinformatics</institution>
,
<institution>University of British Columbia</institution>
,
<addr-line>Vancouver</addr-line>
,
<country>Canada</country>
</aff>
<aff id="aff4">
<label>4</label>
<institution>U.S. Department of Energy Joint Genome Institute</institution>
,
<addr-line>Walnut Creek</addr-line>
,
<country>United States</country>
</aff>
</contrib-group>
<contrib-group>
<contrib id="author-1701" contrib-type="editor">
<name>
<surname>Neher</surname>
<given-names>Richard A</given-names>
</name>
<role>Reviewing editor</role>
<aff>
<institution>Max Planck Institute for Developmental Biology</institution>
,
<country>Germany</country>
</aff>
</contrib>
</contrib-group>
<author-notes>
<corresp id="cor1">
<label>*</label>
For correspondence:
<email>mbsulli@gmail.com</email>
</corresp>
<fn fn-type="present-address" id="pa1">
<label></label>
<p>Department of Microbiology, The Ohio State University, Columbus, United States.</p>
</fn>
<fn fn-type="present-address" id="pa2">
<label></label>
<p>Department of Civil, Environmental, and Geodetic Engineering, Columbus, United States.</p>
</fn>
</author-notes>
<pub-date publication-format="electronic" date-type="pub">
<day>22</day>
<month>7</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>4</volume>
<elocation-id>e08490</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>5</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>7</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© 2015, Roux et al</copyright-statement>
<copyright-year>2015</copyright-year>
<copyright-holder>Roux et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This article is distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="elife08490.pdf"></self-uri>
<abstract>
<p>The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.</p>
<p>
<bold>DOI:</bold>
<ext-link ext-link-type="doi" xlink:href="10.7554/eLife.08490.001">http://dx.doi.org/10.7554/eLife.08490.001</ext-link>
</p>
</abstract>
<abstract abstract-type="executive-summary">
<title>eLife digest</title>
<p>Viruses are infectious particles that can only multiply inside the cells of microbes and other organisms. Little is known about the genetic differences between virus particles (so-called ‘genetic diversity’), especially compared to what we know about the diversity of bacteria, archaea, and other single-celled microbes. This lack of knowledge hampers our understanding of the role viruses play in the evolution of microbial communities and their associated ecosystems.</p>
<p>Studying the genetics of the viruses in these communities is challenging. There is no single ‘marker’ gene that can be used to identify all viruses in environmental samples. Also, many of the fragments of viral genomes that have been identified have not yet been linked to their host microbes. Many viruses integrate their genome into the DNA of their host cell, and there are computational tools available that exploit this ability to identify viruses and link them to their host. However, other viruses can live and multiply inside cells without integrating their genome into the host's DNA.</p>
<p>Earlier in 2015, researchers developed a new computational tool called VirSorter that can predict virus genome sequences within the DNA extracted from microbes. VirSorter identifies viral genome sequences based on the presence of ‘hallmark’ genes that encode components found in many virus particles, together with a reference database of genomes from many viruses.</p>
<p>Now, Roux et al., including some of the researchers from the earlier work, use VirSorter to predict viral DNA from publicly available bacterial and archaeal genome data. The study identifies over 12,000 viral genomes and links them to their microbial hosts. These data increase the number of viral genome sequences that are publicly available by a factor of ten and identify the first viruses associated with 13 new types of bacteria, which include species that are abundant in particular environments.</p>
<p>It is possible for several different viruses to infect a single cell at the same time. Some viruses are known to be able to exchange DNA, and if this happens frequently in other viruses, it could have a big impact on how viruses evolve. Roux et al.'s findings suggest that although it is common for several different viruses to infect the same cell, it is relatively rare for these viruses to exchange genetic material.</p>
<p>Roux et al.'s findings demonstrate the value of searching publicly available microbial genome data for fragments of viral genomes. These new viral genomes will serve as a useful resource for researchers as they explore the communities of viruses and microbes in natural environments, the human body and in industrial processes.</p>
<p>
<bold>DOI:</bold>
<ext-link ext-link-type="doi" xlink:href="10.7554/eLife.08490.002">http://dx.doi.org/10.7554/eLife.08490.002</ext-link>
</p>
</abstract>
<kwd-group kwd-group-type="author-keywords">
<title>Author keywords</title>
<kwd>virus</kwd>
<kwd>phage</kwd>
<kwd>prophage</kwd>
<kwd>virus-host adaptation</kwd>
</kwd-group>
<kwd-group kwd-group-type="research-organism">
<title>Research organism</title>
<kwd>none</kwd>
</kwd-group>
<funding-group>
<award-group id="par-1">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000936</institution-id>
<institution>Gordon and Betty Moore Foundation</institution>
</institution-wrap>
</funding-source>
<award-id>3790</award-id>
<principal-award-recipient>
<name>
<surname>Sullivan</surname>
<given-names>Matthew B</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-2">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000038</institution-id>
<institution>Natural Sciences and Engineering Research Council of Canada (Conseil de Recherches en Sciences Naturelles et en Génie du Canada)</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-3">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000196</institution-id>
<institution>Canada Foundation for Innovation (Fondation canadienne pour l'innovation)</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-4">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100007631</institution-id>
<institution>Canadian Institute for Advanced Research (L'Institut Canadien de Recherches Avancées)</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-5">
<funding-source>
<institution-wrap>
<institution>Tula Foundation</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-6">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100001246</institution-id>
<institution>Ambrose Monell Foundation</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-7">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100001372</institution-id>
<institution>G. Unger Vetlesen Foundation</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Hallam</surname>
<given-names>Steven J</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="par-8">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000015</institution-id>
<institution>U.S. Department of Energy (Department of Energy)</institution>
</institution-wrap>
</funding-source>
<award-id>Joint Genome Institute (DE-AC02-05CH11231)</award-id>
<principal-award-recipient>
<name>
<surname>Woyke</surname>
<given-names>Tanja</given-names>
</name>
</principal-award-recipient>
</award-group>
<funding-statement>The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>elife-xml-version</meta-name>
<meta-value>2.3</meta-value>
</custom-meta>
<custom-meta specific-use="meta-only">
<meta-name>Author impact statement</meta-name>
<meta-value>From public microbial genomes, VirSorter revealed 12,498 viral genome sequences that expand the map of the global virosphere and whose analyses improve understanding of viral taxonomy, evolution and virus-host interactions.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<sub-article id="SA1" article-type="article-commentary">
<front-stub>
<article-id pub-id-type="doi">10.7554/eLife.08490.022</article-id>
<title-group>
<article-title>Decision letter</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Neher</surname>
<given-names>Richard A</given-names>
</name>
<role>Reviewing editor</role>
<aff>
<institution>Max Planck Institute for Developmental Biology</institution>
,
<country>Germany</country>
</aff>
</contrib>
</contrib-group>
</front-stub>
<body>
<boxed-text position="float" orientation="portrait">
<p>eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see
<ext-link ext-link-type="uri" xlink:href="http://elifesciences.org/review-process">review process</ext-link>
). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.</p>
</boxed-text>
<p>[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]</p>
<p>Thank you for choosing to send your work entitled “Viral dark matter and virus-host interactions resolved from publicly available microbial genomes” for consideration at
<italic>eLife</italic>
. Your full submission has been evaluated by Diethard Tautz (Senior editor) and three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the decision was reached after discussions between the reviewers. One of the three reviewers, Ken Stedmann, has agreed to share his identity.</p>
<p>All reviewers agreed that virus diversity and ecology are very important topics that deserve more attention and new approaches. The large set of putative viral sequences is impressive and the patterns of host association are intriguing, but we felt that the analysis didn't deliver much novel insight into the evolution of “viral dark matter”. A more in-depth analysis covering multiple scales of evolution would be necessary to make significant progress in this direction. As it stands, the manuscript describes in broad terms a data set generated with a tool that is supposed to be published elsewhere. The usefulness of the data set as a resource for others remains limited without a feature-rich database that allows convenient exploration and access. Hence we don't feel that your manuscript is appropriate for publication as a Research article in
<italic>eLife</italic>
.</p>
<p>Reviewer #1:</p>
<p>Roux and colleagues analyze a large set of putative viral sequences mined from published bacterial and archaeal genomes using a software that is described in a manuscript that is currently under review elsewhere (and provided to the reviewers). The method seems sound and I think the majority of the reported viral sequences are genuine. The authors use this large set (10 fold larger than existing data bases) to investigate patterns of host adaptation, host range, and virus taxonomy.</p>
<p>The main results reported are:</p>
<p>(i) > 12000 sequences fall into ∼600 clusters, half of which contain known viruses;</p>
<p>(ii) Viruses are well adapted to their hosts, across virus types and proteins;</p>
<p>(iii) Viruses are mostly host specific and virus/hosts define modules.</p>
<p>These results make sense and are mostly expected, the novel element here is to be able to do it on a massive scale. I have a couple of other comments/criticisms:</p>
<p>1) Only the more reliable predictions were used. How do things change when less stringent criteria are used?</p>
<p>2) Is there a sense of saturation: if one had done the study a few years ago with fewer genomes, how many clusters would have been found? Are some parts of the bacterial world exhausted, where do the new sequences come from?</p>
<p>3) What can be learned from this for future efforts to detect viruses? How should sequencing be targeted?</p>
<p>4) Coinfection: the number of viruses per genome seems compatible with random. What can really be learned from this? Does this reflect the number of concomitant infections, or the number of genomes deposited in this genome in the past (like endogenous retroviruses in mammals)?</p>
<p>5) How are others supposed to use this? A data set of this size needs tools to analyze. Are the authors going to develop a data base with interactive views etc?</p>
<p>6) This is a computational study. I expect the scripts and code to be deposited.</p>
<p>Reviewer #2:</p>
<p>In this manuscript the authors describe an approach to increase our knowledge about prokaryotic viruses (phages) by mining prokaryotic genomes available in public repositories. It is an excellent idea that makes a lot of sense and is badly needed to increase the knowledge about the sequence space of phages presently very biased and incomplete. It can also provide a great contribution into one of the conundrums of phage biology, the infection range without the bias of culture. However, I have mixed feelings about this manuscript. On the one hand it is very comprehensive including all genomes in repositories (including many draft genomes) but the results are a bit disappointing and provide very little novelty. That the pattern of infection at large phylogenetic scale will be modular was largely expected from classical work with cultures. But the most relevant question, whether the pattern is nested at short phylogenetic distances, is left unanswered. Maybe a problem that is general to these “big data” analyses is the gross level of detail. I wonder why the authors do not provide analysis at the fine resolution level, i.e. phages detected within a single species or genus. At the broad level analysed here most of the results are very predictable from classic approaches. The use of draft genomes and the possibility of discriminating plasmids from phages is another question that is left untouched in both this manuscript and the previous submission. There is a gradient in nature between infective phages and conjugative elements and establishing the borderline might be risky.</p>
<p>In summary I missed some more fine grained analysis of examples in this big data approach.</p>
<p>Reviewer #3:</p>
<p>I find this to be a very well-written report of the application of a new bioinformatic tool (VirSorter) developed by some of the authors. This tool has been applied to data mining of the available and rapidly growing genomic datasets and thereby has increased the number of putative (mostly partial) viral genomes by ten-fold. Due to association with both known genomes and SAG genomes from known sources, the analysis allowed identification of potential viruses in hosts for which no known viruses are currently available. This is clearly a boon to researchers working on these organisms. I find the tetranucleotide analysis of viral genomes in order to possibly identify hosts for these viruses to be particularly attractive and plan on using it in my own research.</p>
<p>I am not convinced of the premise stated in the title and Abstract that this analysis provides much insight into viral dark matter or virus-host interactions. This tool enables further investigations that would allow that insight, which would otherwise be extremely difficult if not unfeasible.</p>
<p>I wonder if the manuscript describing the development and testing of the tool (which was submitted together with this manuscript) could be combined into one manuscript.</p>
<p>[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]</p>
<p>Thank you for resubmitting your work entitled “Viral dark matter and virus-host interactions resolved from publicly available microbial genomes” for further consideration as a Tools and resources article at
<italic>eLife</italic>
. Your revised article has been favorably evaluated by Diethard Tautz (Senior editor) and a Reviewing editor. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:</p>
<p>1) Data availability: given that this is now being considered as a Tools and resources article, we feel that the data availability section should be more prominent. We suggest moving the availability section up (possibly before the conclusions) and providing a little more detail. iVirus.us itself seems like a rather hollow shell – clicking on data access yields a 502 Bad Gateway error. As far as I can tell, everything happens within the discovery environment of iPlant, for which registration is necessary. Please elaborate a little. The MetaVir environment seems useful, but some of the MetaVir analyses haven't completed yet. In addition, we think that the “richly annotated genbank files” (promised in the rebuttal letter) should be made available not only on the authors' website but also uploaded to a big-data repository such as Dryad.</p>
<p>2) It seems the authors have misunderstood the request for making the scripts available. We were not asking for the VirSorter scripts, but the scripts that analyze the VirSorter data set to produce figures and results of the paper. Those scripts provide the most accurate description of the methods, and in the interest of reproducibility, they should – whenever possible – be made available. The preferred place would be a separate GitHub repository.</p>
<p>3) Host association figure: We continue to be underwhelmed by this figure. There are lots of lines which clearly fall into a handful of modules, but within these modules it is pretty hard to see what is going on. Maybe a two-way clustering would be more insightful. Consider a distance matrix d_ij, where d_ij is the fraction of sequences in viral cluster (VC) i that come from genomes of host phylum j, maybe normalized for the abundance of genomes from phylum j. Then cluster this matrix both by VC and host, similar to RNA-seq being clustered by gene and tissue. The modules should show up as blocks on the diagonal, while promiscuous affiliations are off-diagonal terms. What exactly the distance matrix d_ij should be requires some thought and there are probably better choices than this proposal. But if something like this would work out, it could be more informative than the current figure. Keeping one as a supplement of the other could be a good solution.</p>
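<p>The two-way clustering suggested in point 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the function names are hypothetical, the input is assumed to be one (viral cluster, host phylum) pair per viral sequence, rows are compared with a Manhattan distance, and ordering uses a simple average-linkage agglomeration. Any of these choices could reasonably be replaced.</p>
<preformat>
```python
from collections import Counter
from itertools import combinations


def fraction_matrix(records, phylum_genomes=None):
    """Build d[i][j]: share of sequences in viral cluster i found in host phylum j.

    records: iterable of (viral_cluster, host_phylum) pairs, one per sequence.
    phylum_genomes: optional {phylum: genome count}; when given, raw counts are
    divided by it before row normalisation to offset uneven genome sampling.
    """
    counts = Counter(records)
    clusters = sorted({vc for vc, _ in counts})
    phyla = sorted({ph for _, ph in counts})
    matrix = []
    for vc in clusters:
        row = [counts[(vc, ph)] / (phylum_genomes[ph] if phylum_genomes else 1)
               for ph in phyla]
        total = sum(row)
        matrix.append([value / total for value in row])
    return clusters, phyla, matrix


def leaf_order(vectors):
    """Average-linkage agglomerative clustering; returns the final leaf order."""
    def dist(a, b):
        # Manhattan distance between two profiles (an assumption of this sketch).
        return sum(abs(x - y) for x, y in zip(vectors[a], vectors[b]))

    groups = [[i] for i in range(len(vectors))]
    while len(groups) > 1:
        def linkage(pair):
            g, h = groups[pair[0]], groups[pair[1]]
            return sum(dist(a, b) for a in g for b in h) / (len(g) * len(h))

        i, j = min(combinations(range(len(groups)), 2), key=linkage)
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups[0] if groups else []


def two_way_cluster(records, phylum_genomes=None):
    """Reorder the VC-by-phylum matrix along both axes so similar profiles sit together."""
    clusters, phyla, matrix = fraction_matrix(records, phylum_genomes)
    rows = leaf_order(matrix)
    cols = leaf_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[r][c] for c in cols] for r in rows]
    return [clusters[r] for r in rows], [phyla[c] for c in cols], reordered
```
</preformat>
<p>Plotting the reordered matrix as a heat map would then place viral clusters with similar host profiles next to each other, so that modules appear as blocks and promiscuous clusters as rows with several non-zero entries.</p>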
</body>
</sub-article>
<sub-article id="SA2" article-type="reply">
<front-stub>
<article-id pub-id-type="doi">10.7554/eLife.08490.023</article-id>
<title-group>
<article-title>Author response</article-title>
</title-group>
</front-stub>
<body>
<p>
<italic>All reviewers agreed that virus diversity and ecology are very important topics that deserve more attention and new approaches. The large set of putative viral sequences is impressive and the patterns of host association are intriguing, but we felt that the analysis didn't deliver much novel insight into the evolution of “viral dark matter”. A more in-depth analysis covering multiple scales of evolution would be necessary to make significant progress in this direction. As it stands, the manuscript describes in broad terms a data set generated with a tool that is supposed to be published elsewhere. The usefulness of the data set as a resource for others remains limited without a feature-rich database that allows convenient exploration and access. Hence we don't feel that your manuscript is appropriate for publication as a Research article in</italic>
eLife.</p>
<p>We can appreciate that a manuscript introducing 12,498 new phage genomes (whole and large fragments) leaves a feeling of unfinished business no matter how it is written. Seeing these reviews also helps us see that we failed to start out with a quantitative metric of “impact”, as to us the scale alone (augmenting available phage genome sequences by an order of magnitude) was a closed case. This is because the last decade has seen microbial ecology transformed by large-scale datasets including the Global Ocean Sampling microbial metagenomics dataset (Rusch et al. PLoS Biology, 2007) and the first viral metagenomic dataset (Angly et al. PLoS Biology, 2006) – papers which have 1383 and 613 citations, respectively. At the same time, viral ecology is paralyzed by the dominance of “unknowns” in metagenomic studies, as commonly 63–93% of viral metagenomic reads are new to science, presumably because we have only just over a thousand phage genomes and they derive largely (85%) from only 3 of 45 bacterial phyla.</p>
<p>How much will our 12,498 host-associated phage genomes improve the situation? A new analysis we include here shows that they as much as double the number of affiliated proteins for some environmental viromes (increases of ∼35% for seawater viromes vs ∼100% for the human gut virome; see
<xref ref-type="fig" rid="fig7">Author response image 1</xref>
). Thus we hope this more clearly emphasizes how this single study’s dataset alone will be foundational for future ecology studies seeking to “see” viruses in microbial datasets and to affiliate viruses in viral datasets. These new results were added to the revised manuscript (text and new
<xref ref-type="fig" rid="fig3">Figure 3B</xref>
).
<fig id="fig7" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7554/eLife.08490.024</object-id>
<label>Author response image 1.</label>
<caption>
<title>Improvement in the proportion of affiliated genes from viromes with VirSorter dataset.</title>
<p>Predicted genes from the Pacific Ocean Viromes (
<xref rid="bib36" ref-type="bibr">Hurwitz and Sullivan, 2013</xref>
), Tara Ocean Viromes (
<xref rid="bib8" ref-type="bibr">Brum et al., 2015</xref>
) and Human Gut Viromes (Minot et al., 2013) were compared to RefSeqVirus (May 2015) and the 12.5k VirSorter dataset (BLASTp, threshold of 50 on bit score and 0.001 on e-value). Predicted proteins affiliated to VirSorter (in blue) did not display any significant similarity to a RefSeq virus, but can now be associated with a phage and a host through the VirSorter database.</p>
<p>
<bold>DOI:</bold>
<ext-link ext-link-type="doi" xlink:href="10.7554/eLife.08490.024">http://dx.doi.org/10.7554/eLife.08490.024</ext-link>
</p>
</caption>
<graphic xlink:href="elife08490f007"></graphic>
</fig>
</p>
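<p>For readers who wish to apply the same thresholds as in Author response image 1 to their own BLASTp results, the following Python sketch shows one way to do so. It is an illustration only and not the script used for the figure: the function name is hypothetical, the input is assumed to be standard BLAST tabular output (-outfmt 6, with e-value and bit score as the 11th and 12th columns), and both thresholds are assumed to be inclusive.</p>
<preformat>
```python
import csv
import io


def affiliated_queries(blast_tsv, min_bitscore=50.0, max_evalue=0.001):
    """Return the query IDs with at least one hit passing both thresholds.

    blast_tsv: text in BLAST tabular format (-outfmt 6), whose default columns
    end with evalue (11th column) and bitscore (12th column).
    """
    passed = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        evalue, bitscore = float(row[10]), float(row[11])
        if bitscore >= min_bitscore and max_evalue >= evalue:
            passed.add(row[0])
    return passed
```
</preformat>
<p>Running this once against hits to RefSeqVirus and once against hits to the VirSorter dataset, then taking the set difference, would give the genes that are affiliated only through the new sequences.</p>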
<p>We can also see that we failed to clearly articulate to the reviewers the specific biological advances made in this manuscript. We list the major advances here, any one of which, we would argue, could be the sole focus of a strong, top-tier manuscript.</p>
<p>1) The amount of viral signal in publicly deposited genomes (12.5k high-confidence viral sequences in 15k bacterial and archaeal genomes) is unexpectedly high given that we focused our analysis on “active” infections by excluding fragmented genomes likely to be defective or decayed prophages.</p>
<p>2) This study is the first to attempt to quantify the lesser-studied types of viruses and finds viral genomes not integrated in the host genome to be rather abundant (>1k sequences were identifiable, subsection “New viruses detected in public microbial genomic datasets with VirSorter”). These could represent extrachromosomal prophages, chronic, or “cryptic” lytic viruses (i.e. lytic viruses that go unnoticed in a culture), all infection types that are understudied and with unknown and likely underestimated ecological impacts.</p>
<p>3) Genome-based clustering analyses revealed that approximately half of the observed viral clusters in the VirSorter dataset lacked known reference genomes (subsection “264 new putative viral genera identified through genome-based network clustering”, last paragraph). Obtaining complete or near-complete genomes and documenting the host range for these new groups is critical for mapping the virosphere, especially because while other approaches (e.g., viral metagenomics) can help identify non-cultivated viral diversity, these lack this host association. Highlightable “firsts” here include the first viral genomes for 9 bacterial phyla (subsection “Long-term evolutionary patterns of bacterial and archaeal virus genomes” and
<xref ref-type="table" rid="tbl1">Table 1</xref>
, see also
<xref ref-type="fig" rid="fig8">Author response image 2</xref>
), which is about as many as in all published literature to date, as well as a new
<italic>Bacteroides</italic>
virus, unrelated to any virus previously described that likely represents a new viral order (subsection “New viruses detected in public microbial genomic datasets with VirSorter”, third paragraph).
<fig id="fig8" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7554/eLife.08490.025</object-id>
<label>Author response image 2.</label>
<caption>
<title>Viral sequences distribution of RefSeq and VirSorter dataset.</title>
<p>For each host group, a circle proportional to the number of viral genomes available is noted in red for RefSeq and blue for VirSorter. Hosts for which no RefSeq references were available are highlighted in bold.</p>
<p>
<bold>DOI:</bold>
<ext-link ext-link-type="doi" xlink:href="10.7554/eLife.08490.025">http://dx.doi.org/10.7554/eLife.08490.025</ext-link>
</p>
</caption>
<graphic xlink:href="elife08490f008"></graphic>
</fig>
</p>
<p>4) The fraction of microbes that are co-infected by more than one virus remains a fundamental, yet largely (completely?) unknown number in any environment. Here we show that co-infection is common (∼50% of cells are co-infected, l. 242) and many of these co-infections are by more than one type of virus. Such co-infections likely have far-reaching implications for viral genome evolution, as they provide opportunity for gene exchange, so quantifying the co-infection frequency across viral groups offers insight into how genomically promiscuous one viral group might be relative to another.</p>
<p>5) While not completely novel observations, we also perform analyses that confirm, with a much larger dataset, prior work in key areas that are desperately needed in viral taxonomy and ecology. First, genome network analysis helps classify new viruses in a robust genome-based taxonomic framework that is largely consistent with accepted ICTV taxonomy (subsection “264 new putative viral genera identified through genome-based network clustering”). Second, leveraging this unprecedentedly large-scale, host-associated viral genomic dataset, we show that tetranucleotide frequency distance is a surprisingly robust predictor of the host of most viruses. Again, while this is not novel knowledge, working at this scale allowed us to quantify the probabilistic value of these predictions across multiple host phyla, and to compare the performance of tetranucleotide frequency to 4 other sequence composition-based metrics to provide strong guidance to researchers on using this metric. Perhaps this is why, in spite of the idea being in the literature for some time, reviewer #3 notes: “I find the tetranucleotide analysis of viral genomes in order to possibly identify hosts for these viruses to be particularly attractive and plan on using it in my own research
<italic>.</italic>
” Third, the modular pattern of a global virus-host network was indeed predicted by theoretical models, although the only study of comparable size observing a modular virus-host network (Flores et al., ISMEJ, 2011) was based on plaque formation on host cultures where genetic diversity was unknown. Here, we validate the modularity with an unprecedentedly large-scale dataset that includes microbes spanning 18 phyla, and add information about the level of taxonomy at which the virus-host network is modular (as we expect it to become nested at some point, near the “tip of the tree”). Fourth, the dominance of Caudovirales in the dataset, as well as the clear separation between DNA and RNA viruses and between archaeal and bacterial viruses, were all expected based on previous knowledge of viral diversity, and so these findings are largely confirmatory.</p>
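<p>To illustrate the idea behind host prediction from tetranucleotide frequencies, a minimal Python sketch follows. It is not the implementation used in the manuscript: the function names are hypothetical, the distance is assumed here to be the mean absolute difference between frequency vectors, and only one strand is counted for brevity (real analyses often also count the reverse complement).</p>
<preformat>
```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Relative frequency of each tetranucleotide in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total if total else 0.0 for k in KMERS]


def tetra_distance(seq_a, seq_b):
    """Mean absolute difference between two tetranucleotide frequency vectors."""
    fa, fb = tetra_freq(seq_a), tetra_freq(seq_b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(KMERS)


def closest_host(virus_seq, host_seqs):
    """Return the name of the host genome closest to the virus in composition.

    host_seqs: {host name: genome sequence}.
    """
    return min(host_seqs, key=lambda name: tetra_distance(virus_seq, host_seqs[name]))
```
</preformat>
<p>In practice the distance to the closest host would also need to be turned into a confidence estimate, since a nearest neighbour always exists even when no candidate genome is a plausible host.</p>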
<p>To better emphasize these results, we added a new figure displaying more clearly how the curated VirSorter dataset expands the range of known viruses (
<xref ref-type="fig" rid="fig1">Figure 1</xref>
), and reorganized the manuscript so that the first three subparts of the Results section are now entirely dedicated to the exploration of this new diversity (subsection “New viruses detected in public microbial genomic datasets with VirSorter”). The questions of viral classification through genome-based network, virus-host interactions and adaptations are then addressed in three more subsections (“Long-term evolutionary patterns of bacterial and archaeal virus genomes”, “Global virus–host network is confirmed as modular” and “Virus–host adaptation signals detected at the genome composition and codon usage level”). We hope that this new organization brings more balance to the manuscript and helps to better introduce the dataset before switching to secondary analyses.</p>
<p>Another issue raised by the reviewers was a perceived lack of depth in the manuscript. We acknowledge that the format of the manuscript follows a style in which only global patterns are presented. This is similar to how Rinke et al
<italic>.</italic>
(2013 Nature) handled their explorations of microbial dark matter – and is a common strategy for getting such big datasets out to specialists for follow-on analyses. Such follow-up studies will undoubtedly be extremely interesting and critical for the field. However, we chose to play to the strengths of the data and consider only global-scale stories.</p>
<p>Notably,
<italic>eLife</italic>
recently published a manuscript describing comparative phage genomics of 627 mycobacteriophage genomes (Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity, Pope et al., 2015.
<italic>eLife</italic>
). The major findings in the manuscript (e.g., phage genomes are part of a genetic continuum) are consistent with and redundant to the findings from at least 5 previously published papers by the same group (Hendrix et al., PNAS, 1998; Pedulla et al., Cell, 2003; Jacobs-Sera et al., Virology, 2012; Cresawn et al., PLoS One, 2015) yet the value of such a large-scale dataset is paramount currently for future studies of the ecology, evolution and genomics of phages, which is presumably why the study was accepted at
<italic>eLife</italic>
.</p>
<p>Finally, the last two criticisms in the editor’s summary were as follows:</p>
<p>1)
<italic>“the manuscript describes in broad terms a data set generated with a tool that is supposed to be published elsewhere.”</italic>
The manuscript describing the methodological details of the tool is now published: “VirSorter: mining viral signal from microbial genomic data”, S Roux, F Enault, BL Hurwitz, MB Sullivan, PeerJ 3, e985.</p>
<p>2)
<italic>“The usefulness of the data set as a resource for others remains limited without a feature rich data base that allows convenient exploration and access.”</italic>
We strongly agree that providing convenient and enriched access to data and tools is crucial for researchers, as can be seen from the previous projects of the lead author, who developed and maintains one of the only viral metagenomic databases and analysis tools publicly available (MetaVir), and of the senior author’s laboratory, which has been building iVirus on the back of the NSF-funded iPlant Cyberinfrastructure in spite of a lack of funding for the project. In fact, both projects are unfunded, yet we maintain and/or develop them as they represent our commitment to getting the data and tools into researchers’ hands to enable them to better “see” the viruses in their datasets. Although we did not emphasize this in the manuscript, a mistake we would correct in a revised manuscript, we intend to make the dataset available on these two complementary websites (MetaVir and iVirus). MetaVir provides an automatic annotation of each sequence, with multiple visualization tools to explore and compare genome maps, as well as multiple ways of searching the data (by host, by phage affiliation, by gene taxonomic or functional affiliation, by size, etc.) and extracting a specific subset of interest. iVirus offers optimized data repository features as well as numerous analytical tools for comparative genome analyses and metagenomic fragment recruitment analyses (BLAST, bwa and bowtie2 read aligners, multiple flavors of functional gene annotation, phylogenetic tree building pipelines, etc.). Finally, a summary of the sequences and clusters is provided as supplementary files, and both the raw data and richly annotated sequences (genbank file format, including taxonomic and functional affiliation of all genes) will be available to download on Sullivan’s publications webpage should the paper be accepted – just as we do for other datasets of community interest (e.g., the Pacific Ocean Viromes and the Tara Oceans Viromes datasets).
The information about the dataset availability was added to the Material and methods (subsection “Dataset and script availability”).</p>
<p>In summary, we would argue that there is no greater challenge to exploring the ecology and evolution of viral communities in diverse ecosystems (e.g., oceans, soils, humans) than the lack of reference genomes that cause dominance by ‘viral dark matter’. Above we have tried to more carefully articulate the major advances this study makes, and we emphasize that the reviewers also noted the quality of the work and its relevance for the field (Reviewer 2:
<italic>“It is an excellent idea that makes a lot of sense and is badly needed to increase the knowledge about the sequence space of phages presently very biased and incomplete.”</italic>
, Reviewer 3:
<italic>“This is clearly a boon to researchers working on these organisms.”</italic>
). We hope that the new figures (
<xref ref-type="fig" rid="fig1 fig2 fig3">Figures 1, 2 and 3</xref>
), the added results and new organization of the manuscript help to bring out how valuable the VirSorter curated dataset is, and what insights into virus-host interactions were obtained. Please find below a point-by-point response to the reviewers’ comments.</p>
<p>Reviewer #1:</p>
<p>
<italic>Roux and colleagues analyze a large set of putative viral sequences mined from published bacterial and archaeal genomes using a software that is described in a manuscript that is currently under review elsewhere (and provided to the reviewers). The method seems sound and I think the majority of the reported viral sequences are genuine. The authors use this large set (10 fold larger than existing data bases) to investigate patterns of host adaptation, host range, and virus taxonomy</italic>
.</p>
<p><italic>The main results reported are:</italic></p>
<p>
<italic>(i) > 12000 sequences fall into ∼600 clusters, half of which contain known viruses</italic>
;</p>
<p>
<italic>(ii) Viruses are well adapted to their hosts, across virus types and proteins</italic>
;</p>
<p>
<italic>(iii) Viruses are mostly host specific and virus/hosts define modules</italic>
.</p>
<p>
<italic>These results make sense and are mostly expected, the novel element here is to be able to do it on a massive scale</italic>
.</p>
<p>We acknowledge that some results mostly confirm what could be predicted based on smaller scale studies, but we feel that we have made significant new discoveries and phenomenological observations in exploring the global scale patterns in this dataset. We hope that our efforts above (response to editor’s summary) now better articulate the specific advances made in this manuscript.</p>
<p>
<italic>I have a couple of other comments/criticisms</italic>
:</p>
<p>
<italic>1) Only the more reliable predictions were used</italic>
.
<italic>How do things change when less stringent criteria are used?</italic>
</p>
<p>We appreciate the suggestion of including the category 3 predictions (∼90K sequences) in our analyses. During the data exploration phase of preparing this manuscript, we examined the category 3 predictions but found them to be of mixed use since we were focused on viral sequence space in this manuscript. We went into some detail about this in the “tool” manuscript in PeerJ
<italic>,</italic>
but also here explicitly caution the reader about the value of category 3 predictions. Specifically, “we discarded all predictions lacking a viral hallmark gene or a viral gene enrichment […] as these are likely defective prophages for which boundaries are difficult to predict in silico and that often include bacterial genes” (subsection “Selection of a relevant subset of viral sequences: the VirSorter dataset”). While we were focused here on the higher confidence viral genome sequences (category 1 and 2 predictions), the category 3 predictions are of great value to specialists interested in defective prophages, mobile elements or microbial genomic islands. We hope with this added context that you can appreciate our decision to focus in this way and yet also make the category 3 predictions available through this study since they could be of value for follow-on work.</p>
<p><italic>2) Is there a sense of saturation: if one had done the study a few years ago with fewer genomes, how many clusters would have been found? Are some parts of the bacterial world exhausted, where do the new sequences come from?</italic></p>
<p>Because this study leverages 15K publicly available microbial genomes, it is not an ideal dataset from which to draw conclusions about saturation. Notably, however, we saw that even well studied groups, such as
<italic>Gammaproteobacteria</italic>
and
<italic>Bacilli,</italic>
do not appear saturated as new VCs (i.e., those lacking a RefSeq reference) were detected here too (new
<xref ref-type="fig" rid="fig2">Figure 2B</xref>
). We do see this as an ideal question to approach in the future using VirSorter, once the flood of SAG data is available, as these sequences will better span the microbial tree of life and provide context for both lytically infecting and cell-associated (prophage, extrachromosomal, chronic, etc.) infecting viruses.</p>
<p><italic>3) What can be learned from this for future efforts to detect viruses? How should sequencing be targeted?</italic></p>
<p>This is a question each individual researcher will need to answer based upon their particular research question: are they interested in capturing sequence breadth or depth? It’s an age-old trade-off and one we do not answer well here: because we leveraged public data rather than developing an explicit experimental sampling strategy, even this scale of data is not yet very deep in any one category.</p>
<p>
<italic>4) Coinfection: the number of viruses per genome seems compatible with random. What can really be learned from this? Does this reflect the number of concomitant infections</italic>
,
<italic>or the number of genomes deposited in this genome in the past (like endogenous retroviruses in mammals)?</italic>
</p>
<p>Unfortunately, discerning genomes previously deposited in the host genome from active viral infections in silico is nearly impossible. We conservatively focus on sequences that are likely to be active and not past infections as we only consider prophages that included the capsid-associated genes (viral hallmark genes). Thus, the 12.5k sequences likely underestimate the total number of viruses in the dataset (since we miss those with unrecognizable capsid genes), but should conservatively identify active infections that represent some combination of lytic infections, prophages and chronic infections. We added a discussion about active viruses (subsection “VirSorter curated dataset includes extrachromosomal genomes and improves virome affiliation”).</p>
<p>
<italic>5) How are others supposed to use this? A data set of this size needs tools to analyze. Are the authors going to develop a data base with interactive views etc</italic>
.
<italic>?</italic>
</p>
<p>We carefully considered for some time how best to make these data available and in the end chose to make them available through the iPlant Cyberinfrastructure, which allows large sequence datasets to be shared easily, and the MetaVir web server, which generates automatically annotated contig maps searchable by function, taxonomy, or host taxonomy (details in the response to editor's summary above). Notably, all viral genomes are made available in a fully-annotated genbank format that could be utilized by researchers in any number of genome browsers (e.g., Artemis) for follow-on analytics using the tool of choice. It is beyond the scope of this manuscript or our lab to develop interactive data interrogation tools for these data as these efforts are often large-scale projects (e.g., iPlant is a $100M NSF Center, KBase is a $100M DOE Center).</p>
<p>
<italic>6) This is a computational study. I expect the scripts and code to be deposited</italic>
.</p>
<p>We agree, and apologize for not displaying this clearly in the manuscript. All of our code was previously made publicly available through GitHub and a community-available version is implemented through the iPlant Cyberinfrastructure. These details were in the prior, PeerJ publication that describes the VirSorter tool, but we now also point readers to these details in the current manuscript (subsection “Dataset and script availability”).</p>
<p>Reviewer #2:</p>
<p>
<italic>In this manuscript the authors describe an approach to increase our knowledge about prokaryotic viruses (phages) by mining prokaryotic genomes available in public repositories. It is an excellent idea that makes a lot of sense and is badly needed to increase the knowledge about the sequence space of phages presently very biased and incomplete. It can also provide a great contribution into one of the conundrums of phage biology, the infection range without the bias of culture. However, I have mixed feelings about this manuscript. On the one hand it is very comprehensive including all genomes in repositories (including many draft genomes) but the results are a bit disappointing and provide very little novelty. That the pattern of infection at large phylogenetic scale will be modular was largely expected from classical work with cultures. But the most relevant question, whether the pattern is nested at short phylogenetic distances, is left unanswered. Maybe a problem that is general to these “big data” analyses is the gross level of detail. I wonder why the authors do not provide analysis at the fine resolution level, i.e. phages detected within a single species or genus. At the broad level analysed here most of the results are very predictable from classic approaches. The use of draft genomes and the possibility of discriminating plasmids from phages is another question that is left untouched in both this manuscript and the previous submission. There is a gradient in nature between infective phages and conjugative elements and establishing the borderline might be risky</italic>
.</p>
<p>In summary, I missed some more fine-grained analysis of examples in this big data approach.</p>
<p>We thank the reviewer for these kind words. Indeed, these more detailed analyses would likely be extremely interesting; however, the density of the current manuscript (see the reply to editor's summary) hardly allows for the addition of more results, which would also require additional Introduction and Discussion text. In the response to the editor’s summary above, we describe our rationale for why we hope to keep the manuscript focused on the big picture or global-scale analyses.</p>
<p>Reviewer #3:</p>
<p>I find this to be a very well-written report of the application of a new bioinformatic tool (VirSorter) developed by some of the authors. This tool has been applied to data mining of the available and rapidly growing genomic datasets and thereby has increased the number of putative (mostly partial) viral genomes by ten-fold. Due to association with both known genomes and SAG genomes from known sources, the analysis allowed identification of potential viruses in hosts for which no known viruses are currently available. This is clearly a boon to researchers working on these organisms. I find the tetranucleotide analysis of viral genomes in order to possibly identify hosts for these viruses to be particularly attractive and plan on using it in my own research.</p>
<p>
<italic>I am not convinced of the premise stated in the title and Abstract that this analysis provides much insight into viral dark matter or virus-host interactions. This tool enables further investigations that would allow that insight, which would otherwise be extremely difficult if not unfeasible</italic>
.</p>
<p>We thank the reviewer for the kind words. Although we agree that the tool enables exciting potential follow-up investigations, we still consider that the description of such a vast dataset, which includes (as noted by the reviewer) potential viruses for host groups with no currently isolated virus, is akin to taking one (giant) step into the viral dark matter. Notably, using these host-associated viral sequences as a complementary database doubled the ratio of affiliated genes from human gut viromes (see
<xref ref-type="fig" rid="fig7">Author response image 1</xref>
). The description of the different viral clusters linked to these new viruses and associated with specific host groups is what we consider to be new insight into viral dark matter and virus–host interactions. Clearly we failed to articulate those advances in the submitted manuscript, but we hope that our response to the editor’s summary above helps make our case more clearly. We hope this revised manuscript is better at bringing these points out.</p>
<p>
<italic>I wonder if the manuscript describing the development and testing of the tool (which was submitted together with this manuscript) could be combined into one manuscript</italic>
.</p>
<p>We had felt similarly and previously prepared a manuscript combining the tool and the findings presented in this current manuscript. Unfortunately, 18 months ago such a “merged” manuscript did not review well, as it frustrated both informaticists and biologists, each desiring more detail. Thus we chose to publish the tool separately (VirSorter: mining viral signal from microbial genomic data, S Roux, F Enault, BL Hurwitz, MB Sullivan, PeerJ 3, e985) and here present its first application to ∼15K publicly available bacterial and archaeal genomes (this study).</p>
<p>[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]</p>
<p>
<italic>1) Data availability: given that this is a Tools and Resources article, we feel that the data availability section should be more prominent. We suggest moving the availability section up (possibly before the conclusions) and providing a little bit more detail. iVirus.us itself seems like a rather hollow shell – clicking on data access yields a 502 bad gateway error. As far as I can tell, everything happens within the discovery environment of iPlant for which registration is necessary. Please elaborate a little bit. The MetaVir environment seems useful, but some of the MetaVir analyses haven't completed yet. In addition, we think that the</italic>
<italic>richly annotated genbank files</italic>
<italic>(promised in the rebuttal letter) should be made available not only on the author's website but uploaded to a big-data repository such as data dryad</italic>
.</p>
<p>We agree with the idea of placing more emphasis on data availability and appreciate the suggestions for how to do so. To this end, we have:</p>
<p>A) Created a “Dataset availability” section. This section is located at the end of the manuscript just before the conclusions, and now details the different places where the VirSorter Curated Dataset and the associated results are available.</p>
<p>B) Created a direct iVirus link for the datasets. As noted by the reviewers, iVirus is very young and still largely in development. However, we leverage here a new iVirus feature that allows direct access to a set of files linked to a publication without the need for registration. This link provides direct access:
<ext-link ext-link-type="uri" xlink:href="http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/ivirus/VirSorter_curated_dataset">http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/ivirus/VirSorter_curated_dataset</ext-link>
. We added this link to the manuscript in this new section (“Dataset availability”).</p>
<p>C) Made the annotated GenBank files available via DataDryad. These are organized by host and provided as a zip package now uploaded to DataDryad (DataDryad package
<ext-link ext-link-type="uri" xlink:href="https://datadryad.org/resource/doi:10.5061/dryad.b8226">dryad.b8226</ext-link>
). We added this information to the subsection “Dataset availability” of the revised manuscript.</p>
<p>
<italic>2) It seems the authors have misunderstood the request for making the scripts available. We were not asking for the VirSorter scripts, but the scripts that analyze the VirSorter data set to produce figures and results of the paper. Those scripts provide the most accurate description of the methods, and in the interest of reproducibility, they should – whenever possible – be made available. The preferred place would be a separate GitHub repository</italic>
.</p>
<p>Indeed, we misunderstood the former request from the reviewers. To rectify this, we have now prepared the scripts used to produce the results in this manuscript for public release on our lab wiki (the corresponding link:
<ext-link ext-link-type="uri" xlink:href="http://tmpl.arizona.edu/dokuwiki/doku.php?id=bioinformatics:scripts:vsb">http://tmpl.arizona.edu/dokuwiki/doku.php?id=bioinformatics:scripts:vsb</ext-link>
) and a GitHub repository (
<ext-link ext-link-type="uri" xlink:href="https://github.com/simroux/virsorter-curated-dataset-scripts-package">https://github.com/simroux/virsorter-curated-dataset-scripts-package</ext-link>
).</p>
<p>
<italic>3) Host association figure: We continue to be underwhelmed by this figure. There are lots of lines which clearly fall into a handful of modules, but within these modules it is pretty hard to see what is going on. Maybe a two-way clustering would be more insightful. Consider a distance matrix d_ij, where d_ij is the fraction of sequences in viral cluster (VC) i that come from genomes of host phylum j, maybe normalized for the abundance of genomes from phylum j. Then cluster this matrix both by VC and host, similar to RNA-seq being clustered by gene and tissue. The modules should show up as blocks on the diagonal, while promiscuous affiliations are off-diagonal terms. What exactly the distance matrix d_ij should be requires some thought and there are probably better choices than this proposal. But if something like this would work out, it could be more informative than the current figure. Keeping one as a supplement of the other could be a good solution</italic>
.</p>
<p>We thank you for helping us see the issues with this figure better. To clarify these results, we modified the figure and now represent the same network (and the same modules, identified through lp-brim) in a matrix form as suggested by the reviewers. Although we find that the overall “shape” of the network is not as apparent as in the “network” visualization, the “matrix” plot indeed makes it easier to identify the connections between virus clusters and host groups. The new figure (“matrix” visualization) was thus added as
<xref ref-type="fig" rid="fig5">Figure 5</xref>
, and the former
<xref ref-type="fig" rid="fig5">Figure 5</xref>
is now displayed as
<xref ref-type="fig" rid="fig5s1">Figure 5–figure supplement 1</xref>
. We hope that these two representations together help present the findings in a manner that is most informative.</p>
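<p>The matrix form suggested by the reviewers can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis scripts released with the manuscript: the toy records, the function names (fraction_matrix, leaf_order, two_way_cluster), the Euclidean distance, and the average-linkage rule are all hypothetical choices made for the example, whereas the modules in the manuscript were identified through lp-brim.</p>

```python
from collections import Counter, defaultdict
from itertools import combinations


def fraction_matrix(records, genomes_per_phylum=None):
    """Build d_ij: fraction of sequences in viral cluster i from host phylum j.

    records: iterable of (viral_cluster, host_phylum) pairs, one per sequence.
    genomes_per_phylum: optional dict used to normalize for phylum abundance.
    """
    counts = defaultdict(Counter)
    for vc, phylum in records:
        counts[vc][phylum] += 1
    rows = sorted(counts)
    cols = sorted({p for c in counts.values() for p in c})
    matrix = []
    for vc in rows:
        weights = [counts[vc][p] for p in cols]
        if genomes_per_phylum:
            weights = [w / genomes_per_phylum[p] for w, p in zip(weights, cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return rows, cols, matrix


def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def leaf_order(vectors):
    """Average-linkage agglomerative clustering; returns indices in merge order."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters and append the result.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0] if clusters else []


def two_way_cluster(rows, cols, matrix):
    """Reorder rows (viral clusters) and columns (host phyla) by similarity."""
    row_idx = leaf_order(matrix)
    col_idx = leaf_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[i][j] for j in col_idx] for i in row_idx]
    return [rows[i] for i in row_idx], [cols[j] for j in col_idx], reordered


# Toy, made-up records: one (viral cluster, host phylum) pair per sequence.
records = (
    [("VC_1", "Firmicutes")] * 4 + [("VC_1", "Proteobacteria")]
    + [("VC_2", "Proteobacteria")] * 4
    + [("VC_3", "Firmicutes")] * 2
    + [("VC_4", "Proteobacteria")] * 3 + [("VC_4", "Firmicutes")]
)

rows, cols, matrix = fraction_matrix(records)
ordered_rows, ordered_cols, ordered = two_way_cluster(rows, cols, matrix)

for label, values in zip(ordered_rows, ordered):
    print(label, [round(v, 2) for v in values])
```

<p>With real data, records would hold one (viral cluster, host phylum) pair per viral sequence, and a genomes_per_phylum dictionary could be passed to down-weight heavily sampled phyla. Rows that share a dominant host phylum end up adjacent, so modules appear as blocks and promiscuous affiliations as off-block entries.</p>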
</body>
</sub-article>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000092 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000092 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4533152
   |texte=   Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:26200428" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024