MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 0001069 (Pmc/Corpus); previous: 0001068; next: 0001070


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">HostPhinder: A Phage Host Prediction Tool</title>
<author>
<name sortKey="Villarroel, Julia" sort="Villarroel, Julia" uniqKey="Villarroel J" first="Julia" last="Villarroel">Julia Villarroel</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kleinheinz, Kortine Annina" sort="Kleinheinz, Kortine Annina" uniqKey="Kleinheinz K" first="Kortine Annina" last="Kleinheinz">Kortine Annina Kleinheinz</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jurtz, Vanessa Isabell" sort="Jurtz, Vanessa Isabell" uniqKey="Jurtz V" first="Vanessa Isabell" last="Jurtz">Vanessa Isabell Jurtz</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zschach, Henrike" sort="Zschach, Henrike" uniqKey="Zschach H" first="Henrike" last="Zschach">Henrike Zschach</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lund, Ole" sort="Lund, Ole" uniqKey="Lund O" first="Ole" last="Lund">Ole Lund</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nielsen, Morten" sort="Nielsen, Morten" uniqKey="Nielsen M" first="Morten" last="Nielsen">Morten Nielsen</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="af2-viruses-08-00116">Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, CP(1650) San Martín, Prov. de Buenos Aires, Argentina</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Larsen, Mette Voldby" sort="Larsen, Mette Voldby" uniqKey="Larsen M" first="Mette Voldby" last="Larsen">Mette Voldby Larsen</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27153081</idno>
<idno type="pmc">4885074</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4885074</idno>
<idno type="RBID">PMC:4885074</idno>
<idno type="doi">10.3390/v8050116</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000106</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000106</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">HostPhinder: A Phage Host Prediction Tool</title>
<author>
<name sortKey="Villarroel, Julia" sort="Villarroel, Julia" uniqKey="Villarroel J" first="Julia" last="Villarroel">Julia Villarroel</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kleinheinz, Kortine Annina" sort="Kleinheinz, Kortine Annina" uniqKey="Kleinheinz K" first="Kortine Annina" last="Kleinheinz">Kortine Annina Kleinheinz</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jurtz, Vanessa Isabell" sort="Jurtz, Vanessa Isabell" uniqKey="Jurtz V" first="Vanessa Isabell" last="Jurtz">Vanessa Isabell Jurtz</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zschach, Henrike" sort="Zschach, Henrike" uniqKey="Zschach H" first="Henrike" last="Zschach">Henrike Zschach</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lund, Ole" sort="Lund, Ole" uniqKey="Lund O" first="Ole" last="Lund">Ole Lund</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nielsen, Morten" sort="Nielsen, Morten" uniqKey="Nielsen M" first="Morten" last="Nielsen">Morten Nielsen</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="af2-viruses-08-00116">Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, CP(1650) San Martín, Prov. de Buenos Aires, Argentina</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Larsen, Mette Voldby" sort="Larsen, Mette Voldby" uniqKey="Larsen M" first="Mette Voldby" last="Larsen">Mette Voldby Larsen</name>
<affiliation>
<nlm:aff id="af1-viruses-08-00116">Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Viruses</title>
<idno type="eISSN">1999-4915</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The current dramatic increase in antibiotic-resistant bacteria has revitalised interest in bacteriophages as an alternative antibacterial treatment. Meanwhile, the development of bioinformatics methods for analysing genomic data places high-throughput approaches for phage characterisation within reach. Here, we present HostPhinder, a tool aimed at predicting the bacterial host of phages by examining the phage genome sequence. Using a reference database of 2196 phages with known hosts, HostPhinder predicts the host species of a query phage as the host of the most genomically similar reference phages. The number of co-occurring k-mers (DNA sequences of length k) is used as the measure of genomic similarity. On an independent evaluation set, HostPhinder correctly predicted the host genus and species for 81% and 74% of the phages, respectively, giving predictions for more phages than BLAST and significantly outperforming BLAST on phages for which both had predictions. HostPhinder predictions on phage draft genomes from the INTESTI phage cocktail corresponded well with the advertised targets of the cocktail. Our study indicates that, for most phages, genomic similarity correlates well with related bacterial hosts. HostPhinder is available as an interactive web service [
<xref rid="B1-viruses-08-00116" ref-type="bibr">1</xref>
] and as a standalone download from the Docker registry [
<xref rid="B2-viruses-08-00116" ref-type="bibr">2</xref>
].</p>
</div>
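The prediction principle described in the abstract (score each reference phage by the number of k-mers it shares with the query, then report the host of the best-scoring references) can be sketched as follows. This is a minimal illustration under stated assumptions, not HostPhinder's actual implementation: the function names, the (name, genome, host) reference format, the top_n majority vote, the default k = 16 and the toy sequences are all hypothetical, and reverse-complement k-mers are ignored for brevity.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_host(query: str, references: list[tuple[str, str, str]],
                 k: int = 16, top_n: int = 1) -> str:
    """Predict the host of a query phage from its most similar reference phages.

    references: hypothetical (phage_name, genome_sequence, host) tuples.
    Similarity: number of distinct k-mers shared by query and reference.
    k=16 is an assumed default; only one strand is considered here.
    """
    query_kmers = kmers(query, k)
    scored = []
    for name, genome, host in references:
        shared = len(query_kmers.intersection(kmers(genome, k)))
        scored.append((shared, name, host))
    # Rank references by shared k-mer count, highest first (stable for ties).
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep the top_n references that share at least one k-mer with the query.
    best = [entry for entry in scored[:top_n] if entry[0] > 0]
    if not best:
        return "no prediction"
    # Assumed rule: majority vote over the top_n hosts; on a tie,
    # Counter.most_common returns the host seen first, i.e. the best-ranked one.
    votes = Counter(host for _, _, host in best)
    return votes.most_common(1)[0][0]


# Toy reference set (made-up sequences, for illustration only).
refs = [
    ("phageA", "ATGCGTACGTTAGC", "Escherichia coli"),
    ("phageB", "TTTTGGGGCCCCAAAA", "Staphylococcus aureus"),
]
print(predict_host("ATGCGTACGTTAGG", refs, k=4))  # prints: Escherichia coli
```

With top_n=1 this reduces to nearest-neighbour assignment; raising top_n lets several close references vote, which is one plausible reading of "the most genomically similar reference phages" rather than a confirmed detail of the tool.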
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kapi, A" uniqKey="Kapi A">A. Kapi</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harper, D" uniqKey="Harper D">D. Harper</name>
</author>
<author>
<name sortKey="Anderson, J" uniqKey="Anderson J">J. Anderson</name>
</author>
<author>
<name sortKey="Enright, M" uniqKey="Enright M">M. Enright</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kutateladze, M" uniqKey="Kutateladze M">M. Kutateladze</name>
</author>
<author>
<name sortKey="Adamia, R" uniqKey="Adamia R">R. Adamia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kutateladze, M" uniqKey="Kutateladze M">M. Kutateladze</name>
</author>
<author>
<name sortKey="Adamia, R" uniqKey="Adamia R">R. Adamia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miedzybrodzki, R" uniqKey="Miedzybrodzki R">R. Miedzybrodzki</name>
</author>
<author>
<name sortKey="Borysowski, J" uniqKey="Borysowski J">J. Borysowski</name>
</author>
<author>
<name sortKey="Weber Dabrowska, B" uniqKey="Weber Dabrowska B">B. Weber-Dabrowska</name>
</author>
<author>
<name sortKey="Fortuna, W" uniqKey="Fortuna W">W. Fortuna</name>
</author>
<author>
<name sortKey="Letkiewicz, S" uniqKey="Letkiewicz S">S. Letkiewicz</name>
</author>
<author>
<name sortKey="Szufnarowski, K" uniqKey="Szufnarowski K">K. Szufnarowski</name>
</author>
<author>
<name sortKey="Pawelczyk, Z" uniqKey="Pawelczyk Z">Z. Pawelczyk</name>
</author>
<author>
<name sortKey="Rog Z, P" uniqKey="Rog Z P">P. Rogóz</name>
</author>
<author>
<name sortKey="Klak, M" uniqKey="Klak M">M. Klak</name>
</author>
<author>
<name sortKey="Wojtasik, E" uniqKey="Wojtasik E">E. Wojtasik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weber D Browska, B" uniqKey="Weber D Browska B">B. Weber-Dąbrowska</name>
</author>
<author>
<name sortKey="Mulczyk, M" uniqKey="Mulczyk M">M. Mulczyk</name>
</author>
<author>
<name sortKey="G Rski, A" uniqKey="G Rski A">A. Górski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Biswas, B" uniqKey="Biswas B">B. Biswas</name>
</author>
<author>
<name sortKey="Adhya, S" uniqKey="Adhya S">S. Adhya</name>
</author>
<author>
<name sortKey="Washart, P" uniqKey="Washart P">P. Washart</name>
</author>
<author>
<name sortKey="Paul, B" uniqKey="Paul B">B. Paul</name>
</author>
<author>
<name sortKey="Trostel, A N" uniqKey="Trostel A">A.N. Trostel</name>
</author>
<author>
<name sortKey="Powell, B" uniqKey="Powell B">B. Powell</name>
</author>
<author>
<name sortKey="Carlton, R" uniqKey="Carlton R">R. Carlton</name>
</author>
<author>
<name sortKey="Merril, C R" uniqKey="Merril C">C.R. Merril</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capparelli, R" uniqKey="Capparelli R">R. Capparelli</name>
</author>
<author>
<name sortKey="Parlato, M" uniqKey="Parlato M">M. Parlato</name>
</author>
<author>
<name sortKey="Borriello, G" uniqKey="Borriello G">G. Borriello</name>
</author>
<author>
<name sortKey="Salvatore, P" uniqKey="Salvatore P">P. Salvatore</name>
</author>
<author>
<name sortKey="Iannelli, D" uniqKey="Iannelli D">D. Iannelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, H W" uniqKey="Smith H">H.W. Smith</name>
</author>
<author>
<name sortKey="Huggins, M" uniqKey="Huggins M">M. Huggins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wright, A" uniqKey="Wright A">A. Wright</name>
</author>
<author>
<name sortKey="Hawkins, C" uniqKey="Hawkins C">C. Hawkins</name>
</author>
<author>
<name sortKey=" Ngg Rd, E" uniqKey=" Ngg Rd E">E. Änggård</name>
</author>
<author>
<name sortKey="Harper, D" uniqKey="Harper D">D. Harper</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matsuzaki, S" uniqKey="Matsuzaki S">S. Matsuzaki</name>
</author>
<author>
<name sortKey="Uchiyama, J" uniqKey="Uchiyama J">J. Uchiyama</name>
</author>
<author>
<name sortKey="Takemura Uchiyama, I" uniqKey="Takemura Uchiyama I">I. Takemura-Uchiyama</name>
</author>
<author>
<name sortKey="Daibata, M" uniqKey="Daibata M">M. Daibata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reardon, S" uniqKey="Reardon S">S. Reardon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sulakvelidze, A" uniqKey="Sulakvelidze A">A. Sulakvelidze</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guenther, S" uniqKey="Guenther S">S. Guenther</name>
</author>
<author>
<name sortKey="Huwyler, D" uniqKey="Huwyler D">D. Huwyler</name>
</author>
<author>
<name sortKey="Richard, S" uniqKey="Richard S">S. Richard</name>
</author>
<author>
<name sortKey="Loessner, M J" uniqKey="Loessner M">M.J. Loessner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carrillo, C L" uniqKey="Carrillo C">C.L. Carrillo</name>
</author>
<author>
<name sortKey="Atterbury, R" uniqKey="Atterbury R">R. Atterbury</name>
</author>
<author>
<name sortKey="El Shibiny, A" uniqKey="El Shibiny A">A. El-Shibiny</name>
</author>
<author>
<name sortKey="Connerton, P" uniqKey="Connerton P">P. Connerton</name>
</author>
<author>
<name sortKey="Dillon, E" uniqKey="Dillon E">E. Dillon</name>
</author>
<author>
<name sortKey="Scott, A" uniqKey="Scott A">A. Scott</name>
</author>
<author>
<name sortKey="Connerton, I" uniqKey="Connerton I">I. Connerton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mclean, S K" uniqKey="Mclean S">S.K. McLean</name>
</author>
<author>
<name sortKey="Dunn, L A" uniqKey="Dunn L">L.A. Dunn</name>
</author>
<author>
<name sortKey="Palombo, E A" uniqKey="Palombo E">E.A. Palombo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stern, A" uniqKey="Stern A">A. Stern</name>
</author>
<author>
<name sortKey="Sorek, R" uniqKey="Sorek R">R. Sorek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deveau, H" uniqKey="Deveau H">H. Deveau</name>
</author>
<author>
<name sortKey="Garneau, J E" uniqKey="Garneau J">J.E. Garneau</name>
</author>
<author>
<name sortKey="Moineau, S" uniqKey="Moineau S">S. Moineau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fineran, P C" uniqKey="Fineran P">P.C. Fineran</name>
</author>
<author>
<name sortKey="Blower, T R" uniqKey="Blower T">T.R. Blower</name>
</author>
<author>
<name sortKey="Foulds, I J" uniqKey="Foulds I">I.J. Foulds</name>
</author>
<author>
<name sortKey="Humphreys, D P" uniqKey="Humphreys D">D.P. Humphreys</name>
</author>
<author>
<name sortKey="Lilley, K S" uniqKey="Lilley K">K.S. Lilley</name>
</author>
<author>
<name sortKey="Salmond, G P" uniqKey="Salmond G">G.P. Salmond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carbone, A" uniqKey="Carbone A">A. Carbone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blower, T R" uniqKey="Blower T">T.R. Blower</name>
</author>
<author>
<name sortKey="Evans, T J" uniqKey="Evans T">T.J. Evans</name>
</author>
<author>
<name sortKey="Przybilski, R" uniqKey="Przybilski R">R. Przybilski</name>
</author>
<author>
<name sortKey="Fineran, P C" uniqKey="Fineran P">P.C. Fineran</name>
</author>
<author>
<name sortKey="Salmond, G P" uniqKey="Salmond G">G.P. Salmond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Labrie, S J" uniqKey="Labrie S">S.J. Labrie</name>
</author>
<author>
<name sortKey="Samson, J E" uniqKey="Samson J">J.E. Samson</name>
</author>
<author>
<name sortKey="Moineau, S" uniqKey="Moineau S">S. Moineau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weitz, J S" uniqKey="Weitz J">J.S. Weitz</name>
</author>
<author>
<name sortKey="Hartman, H" uniqKey="Hartman H">H. Hartman</name>
</author>
<author>
<name sortKey="Levin, S A" uniqKey="Levin S">S.A. Levin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Duffy, S" uniqKey="Duffy S">S. Duffy</name>
</author>
<author>
<name sortKey="Turner, P E" uniqKey="Turner P">P.E. Turner</name>
</author>
<author>
<name sortKey="Burch, C L" uniqKey="Burch C">C.L. Burch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amarillas, L" uniqKey="Amarillas L">L. Amarillas</name>
</author>
<author>
<name sortKey="Chaidez Quiroz, C" uniqKey="Chaidez Quiroz C">C. Cháidez-Quiroz</name>
</author>
<author>
<name sortKey="Sa Udo Barajas, A" uniqKey="Sa Udo Barajas A">A. Sañudo-Barajas</name>
</author>
<author>
<name sortKey="Le N Felix, J" uniqKey="Le N Felix J">J. León-Félix</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loessner, M J" uniqKey="Loessner M">M.J. Loessner</name>
</author>
<author>
<name sortKey="Neugirg, E" uniqKey="Neugirg E">E. Neugirg</name>
</author>
<author>
<name sortKey="Zink, R" uniqKey="Zink R">R. Zink</name>
</author>
<author>
<name sortKey="Scherer, S" uniqKey="Scherer S">S. Scherer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koskella, B" uniqKey="Koskella B">B. Koskella</name>
</author>
<author>
<name sortKey="Meaden, S" uniqKey="Meaden S">S. Meaden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Casjens, S R" uniqKey="Casjens S">S.R. Casjens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohwer, F" uniqKey="Rohwer F">F. Rohwer</name>
</author>
<author>
<name sortKey="Edwards, R" uniqKey="Edwards R">R. Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jacobs Sera, D" uniqKey="Jacobs Sera D">D. Jacobs-Sera</name>
</author>
<author>
<name sortKey="Marinelli, L J" uniqKey="Marinelli L">L.J. Marinelli</name>
</author>
<author>
<name sortKey="Bowman, C" uniqKey="Bowman C">C. Bowman</name>
</author>
<author>
<name sortKey="Broussard, G W" uniqKey="Broussard G">G.W. Broussard</name>
</author>
<author>
<name sortKey="Bustamante, C G" uniqKey="Bustamante C">C.G. Bustamante</name>
</author>
<author>
<name sortKey="Boyle, M M" uniqKey="Boyle M">M.M. Boyle</name>
</author>
<author>
<name sortKey="Petrova, Z O" uniqKey="Petrova Z">Z.O. Petrova</name>
</author>
<author>
<name sortKey="Dedrick, R M" uniqKey="Dedrick R">R.M. Dedrick</name>
</author>
<author>
<name sortKey="Pope, W H" uniqKey="Pope W">W.H. Pope</name>
</author>
<author>
<name sortKey="Advancing, S E A P H" uniqKey="Advancing S">S.E.A.P.H. Advancing</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woese, C R" uniqKey="Woese C">C.R. Woese</name>
</author>
<author>
<name sortKey="Fox, G E" uniqKey="Fox G">G.E. Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Larsen, M V" uniqKey="Larsen M">M.V. Larsen</name>
</author>
<author>
<name sortKey="Cosentino, S" uniqKey="Cosentino S">S. Cosentino</name>
</author>
<author>
<name sortKey="Lukjancenko, O" uniqKey="Lukjancenko O">O. Lukjancenko</name>
</author>
<author>
<name sortKey="Saputra, D" uniqKey="Saputra D">D. Saputra</name>
</author>
<author>
<name sortKey="Rasmussen, S" uniqKey="Rasmussen S">S. Rasmussen</name>
</author>
<author>
<name sortKey="Hasman, H" uniqKey="Hasman H">H. Hasman</name>
</author>
<author>
<name sortKey="Sicheritz Ponten, T" uniqKey="Sicheritz Ponten T">T. Sicheritz-Pontén</name>
</author>
<author>
<name sortKey="Aarestrup, F M" uniqKey="Aarestrup F">F.M. Aarestrup</name>
</author>
<author>
<name sortKey="Ussery, D W" uniqKey="Ussery D">D.W. Ussery</name>
</author>
<author>
<name sortKey="Lund, O" uniqKey="Lund O">O. Lund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hendrix, R W" uniqKey="Hendrix R">R.W. Hendrix</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lawrence, J G" uniqKey="Lawrence J">J.G. Lawrence</name>
</author>
<author>
<name sortKey="Hatfull, G F" uniqKey="Hatfull G">G.F. Hatfull</name>
</author>
<author>
<name sortKey="Hendrix, R W" uniqKey="Hendrix R">R.W. Hendrix</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zschach, H" uniqKey="Zschach H">H. Zschach</name>
</author>
<author>
<name sortKey="Joensen, K G" uniqKey="Joensen K">K.G. Joensen</name>
</author>
<author>
<name sortKey="Lindhard, B" uniqKey="Lindhard B">B. Lindhard</name>
</author>
<author>
<name sortKey="Lund, O" uniqKey="Lund O">O. Lund</name>
</author>
<author>
<name sortKey="Goderdzishvili, M" uniqKey="Goderdzishvili M">M. Goderdzishvili</name>
</author>
<author>
<name sortKey="Chkonia, I" uniqKey="Chkonia I">I. Chkonia</name>
</author>
<author>
<name sortKey="Jgenti, G" uniqKey="Jgenti G">G. Jgenti</name>
</author>
<author>
<name sortKey="Kvatadze, N" uniqKey="Kvatadze N">N. Kvatadze</name>
</author>
<author>
<name sortKey="Alavidze, Z" uniqKey="Alavidze Z">Z. Alavidze</name>
</author>
<author>
<name sortKey="Kutter, E M" uniqKey="Kutter E">E.M. Kutter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nielsen, H B" uniqKey="Nielsen H">H.B. Nielsen</name>
</author>
<author>
<name sortKey="Almeida, M" uniqKey="Almeida M">M. Almeida</name>
</author>
<author>
<name sortKey="Juncker, A S" uniqKey="Juncker A">A.S. Juncker</name>
</author>
<author>
<name sortKey="Rasmussen, S" uniqKey="Rasmussen S">S. Rasmussen</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J. Li</name>
</author>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S. Sunagawa</name>
</author>
<author>
<name sortKey="Plichta, D R" uniqKey="Plichta D">D.R. Plichta</name>
</author>
<author>
<name sortKey="Gautier, L" uniqKey="Gautier L">L. Gautier</name>
</author>
<author>
<name sortKey="Pedersen, A G" uniqKey="Pedersen A">A.G. Pedersen</name>
</author>
<author>
<name sortKey="Le Chatelier, E" uniqKey="Le Chatelier E">E. le Chatelier</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Euzeby, J P" uniqKey="Euzeby J">J.P. Euzéby</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hobohm, U" uniqKey="Hobohm U">U. Hobohm</name>
</author>
<author>
<name sortKey="Scharf, M" uniqKey="Scharf M">M. Scharf</name>
</author>
<author>
<name sortKey="Schneider, R" uniqKey="Schneider R">R. Schneider</name>
</author>
<author>
<name sortKey="Sander, C" uniqKey="Sander C">C. Sander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bonferroni, C E" uniqKey="Bonferroni C">C.E. Bonferroni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, S F" uniqKey="Altschul S">S.F. Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W. Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W. Miller</name>
</author>
<author>
<name sortKey="Myers, E W" uniqKey="Myers E">E.W. Myers</name>
</author>
<author>
<name sortKey="Lipman, D J" uniqKey="Lipman D">D.J. Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Akhter, S" uniqKey="Akhter S">S. Akhter</name>
</author>
<author>
<name sortKey="Aziz, R K" uniqKey="Aziz R">R.K. Aziz</name>
</author>
<author>
<name sortKey="Edwards, R A" uniqKey="Edwards R">R.A. Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dang, V T" uniqKey="Dang V">V.T. Dang</name>
</author>
<author>
<name sortKey="Sullivan, M B" uniqKey="Sullivan M">M.B. Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martinez Garcia, M" uniqKey="Martinez Garcia M">M. Martínez-García</name>
</author>
<author>
<name sortKey="Santos, F" uniqKey="Santos F">F. Santos</name>
</author>
<author>
<name sortKey="Moreno Paz, M" uniqKey="Moreno Paz M">M. Moreno-Paz</name>
</author>
<author>
<name sortKey="Parro, V" uniqKey="Parro V">V. Parro</name>
</author>
<author>
<name sortKey="Ant N, J" uniqKey="Ant N J">J. Antón</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S. Roux</name>
</author>
<author>
<name sortKey="Enault, F" uniqKey="Enault F">F. Enault</name>
</author>
<author>
<name sortKey="Hurwitz, B L" uniqKey="Hurwitz B">B.L. Hurwitz</name>
</author>
<author>
<name sortKey="Sullivan, M B" uniqKey="Sullivan M">M.B. Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roux, S" uniqKey="Roux S">S. Roux</name>
</author>
<author>
<name sortKey="Hallam, S J" uniqKey="Hallam S">S.J. Hallam</name>
</author>
<author>
<name sortKey="Woyke, T" uniqKey="Woyke T">T. Woyke</name>
</author>
<author>
<name sortKey="Sullivan, M B" uniqKey="Sullivan M">M.B. Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Williamson, S J" uniqKey="Williamson S">S.J. Williamson</name>
</author>
<author>
<name sortKey="Allen, L Z" uniqKey="Allen L">L.Z. Allen</name>
</author>
<author>
<name sortKey="Lorenzi, H A" uniqKey="Lorenzi H">H.A. Lorenzi</name>
</author>
<author>
<name sortKey="Fadrosh, D W" uniqKey="Fadrosh D">D.W. Fadrosh</name>
</author>
<author>
<name sortKey="Brami, D" uniqKey="Brami D">D. Brami</name>
</author>
<author>
<name sortKey="Thiagarajan, M" uniqKey="Thiagarajan M">M. Thiagarajan</name>
</author>
<author>
<name sortKey="Mccrow, J P" uniqKey="Mccrow J">J.P. McCrow</name>
</author>
<author>
<name sortKey="Tovchigrechko, A" uniqKey="Tovchigrechko A">A. Tovchigrechko</name>
</author>
<author>
<name sortKey="Yooseph, S" uniqKey="Yooseph S">S. Yooseph</name>
</author>
<author>
<name sortKey="Venter, J C" uniqKey="Venter J">J.C. Venter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, R A" uniqKey="Edwards R">R.A. Edwards</name>
</author>
<author>
<name sortKey="Mcnair, K" uniqKey="Mcnair K">K. McNair</name>
</author>
<author>
<name sortKey="Faust, K" uniqKey="Faust K">K. Faust</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J. Raes</name>
</author>
<author>
<name sortKey="Dutilh, B E" uniqKey="Dutilh B">B.E. Dutilh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G. Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C. Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kawulok, J" uniqKey="Kawulok J">J. Kawulok</name>
</author>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S. Deorowicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, D E" uniqKey="Wood D">D.E. Wood</name>
</author>
<author>
<name sortKey="Salzberg, S L" uniqKey="Salzberg S">S.L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edwards, R A" uniqKey="Edwards R">R.A. Edwards</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R. Olson</name>
</author>
<author>
<name sortKey="Disz, T" uniqKey="Disz T">T. Disz</name>
</author>
<author>
<name sortKey="Pusch, G D" uniqKey="Pusch G">G.D. Pusch</name>
</author>
<author>
<name sortKey="Vonstein, V" uniqKey="Vonstein V">V. Vonstein</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R. Stevens</name>
</author>
<author>
<name sortKey="Overbeek, R" uniqKey="Overbeek R">R. Overbeek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marinelli, L J" uniqKey="Marinelli L">L.J. Marinelli</name>
</author>
<author>
<name sortKey="Fitz Gibbon, S" uniqKey="Fitz Gibbon S">S. Fitz-Gibbon</name>
</author>
<author>
<name sortKey="Hayes, C" uniqKey="Hayes C">C. Hayes</name>
</author>
<author>
<name sortKey="Bowman, C" uniqKey="Bowman C">C. Bowman</name>
</author>
<author>
<name sortKey="Inkeles, M" uniqKey="Inkeles M">M. Inkeles</name>
</author>
<author>
<name sortKey="Loncaric, A" uniqKey="Loncaric A">A. Loncaric</name>
</author>
<author>
<name sortKey="Russell, D A" uniqKey="Russell D">D.A. Russell</name>
</author>
<author>
<name sortKey="Jacobs Sera, D" uniqKey="Jacobs Sera D">D. Jacobs-Sera</name>
</author>
<author>
<name sortKey="Cokus, S" uniqKey="Cokus S">S. Cokus</name>
</author>
<author>
<name sortKey="Pellegrini, M" uniqKey="Pellegrini M">M. Pellegrini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, J" uniqKey="Liu J">J. Liu</name>
</author>
<author>
<name sortKey="Yan, R" uniqKey="Yan R">R. Yan</name>
</author>
<author>
<name sortKey="Zhong, Q" uniqKey="Zhong Q">Q. Zhong</name>
</author>
<author>
<name sortKey="Ngo, S" uniqKey="Ngo S">S. Ngo</name>
</author>
<author>
<name sortKey="Bangayan, N J" uniqKey="Bangayan N">N.J. Bangayan</name>
</author>
<author>
<name sortKey="Nguyen, L" uniqKey="Nguyen L">L. Nguyen</name>
</author>
<author>
<name sortKey="Lui, T" uniqKey="Lui T">T. Lui</name>
</author>
<author>
<name sortKey="Liu, M" uniqKey="Liu M">M. Liu</name>
</author>
<author>
<name sortKey="Erfe, M C" uniqKey="Erfe M">M.C. Erfe</name>
</author>
<author>
<name sortKey="Craft, N" uniqKey="Craft N">N. Craft</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Farrar, M D" uniqKey="Farrar M">M.D. Farrar</name>
</author>
<author>
<name sortKey="Howson, K M" uniqKey="Howson K">K.M. Howson</name>
</author>
<author>
<name sortKey="Bojar, R A" uniqKey="Bojar R">R.A. Bojar</name>
</author>
<author>
<name sortKey="West, D" uniqKey="West D">D. West</name>
</author>
<author>
<name sortKey="Towler, J C" uniqKey="Towler J">J.C. Towler</name>
</author>
<author>
<name sortKey="Parry, J" uniqKey="Parry J">J. Parry</name>
</author>
<author>
<name sortKey="Pelton, K" uniqKey="Pelton K">K. Pelton</name>
</author>
<author>
<name sortKey="Holland, K T" uniqKey="Holland K">K.T. Holland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuhn, J" uniqKey="Kuhn J">J. Kuhn</name>
</author>
<author>
<name sortKey="Suissa, M" uniqKey="Suissa M">M. Suissa</name>
</author>
<author>
<name sortKey="Chiswell, D" uniqKey="Chiswell D">D. Chiswell</name>
</author>
<author>
<name sortKey="Azriel, A" uniqKey="Azriel A">A. Azriel</name>
</author>
<author>
<name sortKey="Berman, B" uniqKey="Berman B">B. Berman</name>
</author>
<author>
<name sortKey="Shahar, D" uniqKey="Shahar D">D. Shahar</name>
</author>
<author>
<name sortKey="Reznick, S" uniqKey="Reznick S">S. Reznick</name>
</author>
<author>
<name sortKey="Sharf, R" uniqKey="Sharf R">R. Sharf</name>
</author>
<author>
<name sortKey="Wyse, J" uniqKey="Wyse J">J. Wyse</name>
</author>
<author>
<name sortKey="Bar On, T" uniqKey="Bar On T">T. Bar-On</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ford, M E" uniqKey="Ford M">M.E. Ford</name>
</author>
<author>
<name sortKey="Sarkis, G J" uniqKey="Sarkis G">G.J. Sarkis</name>
</author>
<author>
<name sortKey="Belanger, A E" uniqKey="Belanger A">A.E. Belanger</name>
</author>
<author>
<name sortKey="Hendrix, R W" uniqKey="Hendrix R">R.W. Hendrix</name>
</author>
<author>
<name sortKey="Hatfull, G F" uniqKey="Hatfull G">G.F. Hatfull</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwudke, D" uniqKey="Schwudke D">D. Schwudke</name>
</author>
<author>
<name sortKey="Ergin, A" uniqKey="Ergin A">A. Ergin</name>
</author>
<author>
<name sortKey="Michael, K" uniqKey="Michael K">K. Michael</name>
</author>
<author>
<name sortKey="Volkmar, S" uniqKey="Volkmar S">S. Volkmar</name>
</author>
<author>
<name sortKey="Appel, B" uniqKey="Appel B">B. Appel</name>
</author>
<author>
<name sortKey="Knabner, D" uniqKey="Knabner D">D. Knabner</name>
</author>
<author>
<name sortKey="Konietzny, A" uniqKey="Konietzny A">A. Konietzny</name>
</author>
<author>
<name sortKey="Strauch, E" uniqKey="Strauch E">E. Strauch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garcia, E" uniqKey="Garcia E">E. Garcia</name>
</author>
<author>
<name sortKey="Elliott, J M" uniqKey="Elliott J">J.M. Elliott</name>
</author>
<author>
<name sortKey="Ramanculov, E" uniqKey="Ramanculov E">E. Ramanculov</name>
</author>
<author>
<name sortKey="Chain, P S" uniqKey="Chain P">P.S. Chain</name>
</author>
<author>
<name sortKey="Chu, M C" uniqKey="Chu M">M.C. Chu</name>
</author>
<author>
<name sortKey="Molineux, I J" uniqKey="Molineux I">I.J. Molineux</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, X" uniqKey="Zhao X">X. Zhao</name>
</author>
<author>
<name sortKey="Cui, Y" uniqKey="Cui Y">Y. Cui</name>
</author>
<author>
<name sortKey="Yan, Y" uniqKey="Yan Y">Y. Yan</name>
</author>
<author>
<name sortKey="Du, Z" uniqKey="Du Z">Z. Du</name>
</author>
<author>
<name sortKey="Tan, Y" uniqKey="Tan Y">Y. Tan</name>
</author>
<author>
<name sortKey="Yang, H" uniqKey="Yang H">H. Yang</name>
</author>
<author>
<name sortKey="Bi, Y" uniqKey="Bi Y">Y. Bi</name>
</author>
<author>
<name sortKey="Zhang, P" uniqKey="Zhang P">P. Zhang</name>
</author>
<author>
<name sortKey="Zhou, L" uniqKey="Zhou L">L. Zhou</name>
</author>
<author>
<name sortKey="Zhou, D" uniqKey="Zhou D">D. Zhou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chang, H W" uniqKey="Chang H">H.W. Chang</name>
</author>
<author>
<name sortKey="Kim, K H" uniqKey="Kim K">K.H. Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Lappe, N" uniqKey="De Lappe N">N. De Lappe</name>
</author>
<author>
<name sortKey="Doran, G" uniqKey="Doran G">G. Doran</name>
</author>
<author>
<name sortKey="O Onnor, J" uniqKey="O Onnor J">J. O’Connor</name>
</author>
<author>
<name sortKey="O Are, C" uniqKey="O Are C">C. O’Hare</name>
</author>
<author>
<name sortKey="Cormican, M" uniqKey="Cormican M">M. Cormican</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hood, A" uniqKey="Hood A">A. Hood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bielke, L" uniqKey="Bielke L">L. Bielke</name>
</author>
<author>
<name sortKey="Higgins, S" uniqKey="Higgins S">S. Higgins</name>
</author>
<author>
<name sortKey="Donoghue, A" uniqKey="Donoghue A">A. Donoghue</name>
</author>
<author>
<name sortKey="Donoghue, D" uniqKey="Donoghue D">D. Donoghue</name>
</author>
<author>
<name sortKey="Hargis, B" uniqKey="Hargis B">B. Hargis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jensen, E C" uniqKey="Jensen E">E.C. Jensen</name>
</author>
<author>
<name sortKey="Schrader, H S" uniqKey="Schrader H">H.S. Schrader</name>
</author>
<author>
<name sortKey="Rieland, B" uniqKey="Rieland B">B. Rieland</name>
</author>
<author>
<name sortKey="Thompson, T L" uniqKey="Thompson T">T.L. Thompson</name>
</author>
<author>
<name sortKey="Lee, K W" uniqKey="Lee K">K.W. Lee</name>
</author>
<author>
<name sortKey="Nickerson, K W" uniqKey="Nickerson K">K.W. Nickerson</name>
</author>
<author>
<name sortKey="Kokjohn, T A" uniqKey="Kokjohn T">T.A. Kokjohn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Olsen, R H" uniqKey="Olsen R">R.H. Olsen</name>
</author>
<author>
<name sortKey="Siak, J S" uniqKey="Siak J">J.S. Siak</name>
</author>
<author>
<name sortKey="Gray, R H" uniqKey="Gray R">R.H. Gray</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carson, L" uniqKey="Carson L">L. Carson</name>
</author>
<author>
<name sortKey="Gorman, S P" uniqKey="Gorman S">S.P. Gorman</name>
</author>
<author>
<name sortKey="Gilmore, B F" uniqKey="Gilmore B">B.F. Gilmore</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Viruses</journal-id>
<journal-id journal-id-type="iso-abbrev">Viruses</journal-id>
<journal-id journal-id-type="publisher-id">viruses</journal-id>
<journal-title-group>
<journal-title>Viruses</journal-title>
</journal-title-group>
<issn pub-type="epub">1999-4915</issn>
<publisher>
<publisher-name>MDPI</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27153081</article-id>
<article-id pub-id-type="pmc">4885074</article-id>
<article-id pub-id-type="doi">10.3390/v8050116</article-id>
<article-id pub-id-type="publisher-id">viruses-08-00116</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>HostPhinder: A Phage Host Prediction Tool</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Villarroel</surname>
<given-names>Julia</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
<xref rid="c1-viruses-08-00116" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kleinheinz</surname>
<given-names>Kortine Annina</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jurtz</surname>
<given-names>Vanessa Isabell</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zschach</surname>
<given-names>Henrike</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lund</surname>
<given-names>Ole</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nielsen</surname>
<given-names>Morten</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
<xref ref-type="aff" rid="af2-viruses-08-00116">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Larsen</surname>
<given-names>Mette Voldby</given-names>
</name>
<xref ref-type="aff" rid="af1-viruses-08-00116">1</xref>
<xref rid="c1-viruses-08-00116" ref-type="corresp">*</xref>
</contrib>
</contrib-group>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Lavigne</surname>
<given-names>Rob</given-names>
</name>
<role>Academic Editor</role>
</contrib>
</contrib-group>
<aff id="af1-viruses-08-00116">
<label>1</label>
Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark;
<email>kortinekleinheinz@gmx.de</email>
(K.A.K.);
<email>vanessa@cbs.dtu.dk</email>
(V.I.J.);
<email>henrike@cbs.dtu.dk</email>
(H.Z.);
<email>lund@cbs.dtu.dk</email>
(O.L.);
<email>mniel@cbs.dtu.dk</email>
(M.N.)</aff>
<aff id="af2-viruses-08-00116">
<label>2</label>
Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, CP(1650) San Martín, Prov. de Buenos Aires, Argentina</aff>
<author-notes>
<corresp id="c1-viruses-08-00116">
<label>*</label>
Correspondence:
<email>juliavi@cbs.dtu.dk</email>
(J.V.);
<email>metteb@cbs.dtu.dk</email>
(M.V.L.); Tel.: +45-4525-2425 (M.V.L.); Fax: +45-4593-1585 (M.V.L.)</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>5</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<month>5</month>
<year>2016</year>
</pub-date>
<volume>8</volume>
<issue>5</issue>
<elocation-id>116</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>12</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>4</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>© 2016 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2016</copyright-year>
<license>
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
).</license-p>
</license>
</permissions>
<abstract>
<p>The current dramatic increase in antibiotic-resistant bacteria has revitalised interest in bacteriophages as an alternative antibacterial treatment. Meanwhile, the development of bioinformatics methods for analysing genomic data places high-throughput approaches for phage characterization within reach. Here, we present HostPhinder, a tool aimed at predicting the bacterial host of phages by examining the phage genome sequence. Using a reference database of 2196 phages with known hosts, HostPhinder predicts the host species of a query phage as the host of the most genomically similar reference phages. The number of co-occurring k-mers (DNA sequences of length k) is used as a measure of genomic similarity. On an independent evaluation set, HostPhinder correctly predicted the host genus and species for 81% and 74% of the phages, respectively. It gave predictions for more phages than BLAST and significantly outperformed BLAST on phages for which both methods gave predictions. HostPhinder predictions on phage draft genomes from the INTESTI phage cocktail corresponded well with the advertised targets of the cocktail. Our study indicates that, for most phages, genomic similarity correlates well with related bacterial hosts. HostPhinder is available as an interactive web service [
<xref rid="B1-viruses-08-00116" ref-type="bibr">1</xref>
] and as a standalone download from the Docker registry [
<xref rid="B2-viruses-08-00116" ref-type="bibr">2</xref>
].</p>
</abstract>
<kwd-group>
<kwd>“host specificity”</kwd>
<kwd>prediction</kwd>
<kwd>genome</kwd>
<kwd>k-mers</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="sec1-viruses-08-00116">
<title>1. Introduction</title>
<p>In 2012, the World Health Organization (WHO) announced the beginning of the end of the antibiotic era, and the possible return to a time when even trivial bacterial infections could turn out to be fatal [
<xref rid="B3-viruses-08-00116" ref-type="bibr">3</xref>
]. Since then, the problem of antimicrobial resistance has continued to grow. The foreword to the WHO report “Antimicrobial resistance: global report on surveillance 2014” states that “A post-antibiotic era – in which common infections and minor injuries can kill – far from being an apocalyptic fantasy, is instead a very real possibility for the 21st century” [
<xref rid="B4-viruses-08-00116" ref-type="bibr">4</xref>
]. As emphasized by the WHO, there is an urgent need for alternative treatments, such as bacteriophages (phages). The idea of using phages to treat bacterial infections dates back to 1919, when the French-Canadian microbiologist Félix d’Herelle used them to treat a patient with severe bacillary dysentery [
<xref rid="B5-viruses-08-00116" ref-type="bibr">5</xref>
]. For a number of historical reasons, phage therapy never became general practice in the West, although it has been used extensively in countries from the former Eastern bloc [
<xref rid="B6-viruses-08-00116" ref-type="bibr">6</xref>
,
<xref rid="B7-viruses-08-00116" ref-type="bibr">7</xref>
,
<xref rid="B8-viruses-08-00116" ref-type="bibr">8</xref>
,
<xref rid="B9-viruses-08-00116" ref-type="bibr">9</xref>
]. Several recent studies from the West have also demonstrated the effectiveness of phages as antibacterial treatment [
<xref rid="B10-viruses-08-00116" ref-type="bibr">10</xref>
,
<xref rid="B11-viruses-08-00116" ref-type="bibr">11</xref>
,
<xref rid="B12-viruses-08-00116" ref-type="bibr">12</xref>
,
<xref rid="B13-viruses-08-00116" ref-type="bibr">13</xref>
], and more countries are currently revisiting phage therapy [
<xref rid="B14-viruses-08-00116" ref-type="bibr">14</xref>
,
<xref rid="B15-viruses-08-00116" ref-type="bibr">15</xref>
]. Phages have furthermore been suggested for use in the agriculture and food industries [
<xref rid="B16-viruses-08-00116" ref-type="bibr">16</xref>
,
<xref rid="B17-viruses-08-00116" ref-type="bibr">17</xref>
]. Examples include their use for reducing
<italic>Campylobacter jejuni</italic>
colonisation of broiler chickens [
<xref rid="B18-viruses-08-00116" ref-type="bibr">18</xref>
] and the growth of
<italic>E. coli</italic>
in milk [
<xref rid="B19-viruses-08-00116" ref-type="bibr">19</xref>
].</p>
<p>For a phage to successfully infect a bacterial host, it must adsorb to the bacterial surface by recognizing specific host receptors, e.g., proteins, lipopolysaccharides (LPS), or cell wall polysaccharides. Adsorption to an appropriate surface receptor is, however, only the first step required for successful infection, because several host defence mechanisms must also be overcome. Restriction-modification (RM) systems, for example, have been shown to be present in more than 90% of sequenced bacterial genomes [
<xref rid="B20-viruses-08-00116" ref-type="bibr">20</xref>
]. These systems include restriction enzymes that degrade incoming phage DNA containing the appropriate target sequences. Some bacteria contain Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) loci. Together with the CRISPR-associated (cas) genes, these loci encode an adaptive anti-phage immune system [
<xref rid="B21-viruses-08-00116" ref-type="bibr">21</xref>
]. Phage abortive infection (Abi) systems allow infected bacteria to commit “altruistic suicide”, thereby preventing the spread of the phage within the bacterial community [
<xref rid="B22-viruses-08-00116" ref-type="bibr">22</xref>
]. Other factors, such as successful transcription and translation of phage genes given the availability of amino acids and tRNAs, further limit the host range [
<xref rid="B23-viruses-08-00116" ref-type="bibr">23</xref>
]. Since the outset of their coexistence, bacteria and phages have been engaged in an arms race that has led to intricate coevolutionary processes. For each of the defence mechanisms mentioned above, there are examples of phages that have evolved to circumvent it [
<xref rid="B24-viruses-08-00116" ref-type="bibr">24</xref>
,
<xref rid="B25-viruses-08-00116" ref-type="bibr">25</xref>
]. The arms race has contributed to bacterial as well as phage diversity [
<xref rid="B26-viruses-08-00116" ref-type="bibr">26</xref>
]. As a result, phage host specificity is influenced by multiple genes and genomic features distributed across the phage genome. Although there are examples of phages that have extended their host range through only a few mutations [
<xref rid="B27-viruses-08-00116" ref-type="bibr">27</xref>
], such extended host ranges are typically limited to different strains of the same species. Apart from polyvalent enterobacteria phages, which are able to infect members of phylogenetically related genera within the
<italic>Enterobacteriaceae</italic>
family, e.g.,
<italic>Escherichia</italic>
,
<italic>Shigella</italic>
, and
<italic>Klebsiella</italic>
[
<xref rid="B28-viruses-08-00116" ref-type="bibr">28</xref>
,
<xref rid="B29-viruses-08-00116" ref-type="bibr">29</xref>
], most phages have been found to be specific to a particular genus [
<xref rid="B30-viruses-08-00116" ref-type="bibr">30</xref>
]. This genus specificity has been indicated by studies examining individual proteins rather than entire proteomes [
<xref rid="B31-viruses-08-00116" ref-type="bibr">31</xref>
], by the “Phage Proteomic Tree”, which is based on completely sequenced phage genomes [
<xref rid="B32-viruses-08-00116" ref-type="bibr">32</xref>
], and by an analysis of genome types and host preferences in mycobacteriophages [
<xref rid="B33-viruses-08-00116" ref-type="bibr">33</xref>
].</p>
<p>In this study, we extend the observation that genetically similar phages often share the same bacterial host species. We hypothesize that it should be possible to predict the host species of a phage by searching for the most genetically similar phages in a database of reference phages with known hosts. In the developed method, called HostPhinder, genetic similarity is defined as the number of co-occurring k-mers between the query phage and phages in the reference database. K-mers are stretches of DNA of length k. Their use as a measure of genetic relatedness dates back to the groundbreaking 1977 paper by Woese and Fox, which identified the Archaea as a separate branch of the tree of life [
<xref rid="B34-viruses-08-00116" ref-type="bibr">34</xref>
]. Woese and Fox limited their analysis to k-mers (they used the term oligonucleotides) in 16S (18S) ribosomal RNA. However, phages do not have 16S rRNA genes or any other genes that are common to all phages [
<xref rid="B32-viruses-08-00116" ref-type="bibr">32</xref>
], and high-throughput sequencing has made complete phage genomes readily available. For these reasons, HostPhinder examines the complete genome. Furthermore, we have previously shown for bacteria that the co-occurrence of k-mers across the entire genome outperforms other whole-genome or single-locus approaches for inferring genetic relatedness [
<xref rid="B35-viruses-08-00116" ref-type="bibr">35</xref>
]. Splitting entire phage genomes into overlapping k-mers may also be advantageous given the highly mosaic structure of phage genomes [
<xref rid="B36-viruses-08-00116" ref-type="bibr">36</xref>
,
<xref rid="B37-viruses-08-00116" ref-type="bibr">37</xref>
].</p>
<p>We believe that a method for predicting the bacterial hosts of phages will be useful for several reasons. Firstly, phages have for many years been used to treat bacterial infections in countries of the former Eastern bloc. The Eliava Institute in Tbilisi, Georgia, has been particularly prominent in this regard and produces cocktails containing mixtures of phages against a range of bacterial infections. One step towards adopting phage therapy in the West is likely to be a full characterization of the contents of these cocktails. Owing to the way the cocktails are manufactured, their contents are not known [
<xref rid="B38-viruses-08-00116" ref-type="bibr">38</xref>
]. Secondly, many ecological niches are currently explored by untargeted sequencing of samples isolated directly from the environment, so-called metagenomics. This enables the identification of phage and bacterial sequences without knowledge of the link between them. Importantly, it also enables the identification of bacteria, and hence phages, that cannot be cultured. HostPhinder could help establish the link between phages and bacteria. This might be an important step towards understanding, e.g., the human gut microbiome and possibly associations between the microbiome and clinical parameters of the human host [
<xref rid="B39-viruses-08-00116" ref-type="bibr">39</xref>
].</p>
</sec>
<sec id="sec2-viruses-08-00116">
<title>2. Materials and Methods</title>
<sec id="sec2dot1-viruses-08-00116">
<title>2.1. Whole Genome Phage Sequences from Public Databases</title>
<p>A set of public phage Whole Genome Sequences (WGS) was collected in August 2014. First, lists of phage WGS IDs were obtained from the Phages.ids–VBI mirrors page [
<xref rid="B40-viruses-08-00116" ref-type="bibr">40</xref>
], the NCBI viral Genome Resource [
<xref rid="B41-viruses-08-00116" ref-type="bibr">41</xref>
], the EMBL EBI phage genomes list [
<xref rid="B42-viruses-08-00116" ref-type="bibr">42</xref>
], and the phagesdb databases for Mycobacteriophages [
<xref rid="B43-viruses-08-00116" ref-type="bibr">43</xref>
], <italic>Arthrobacter</italic> [
<xref rid="B44-viruses-08-00116" ref-type="bibr">44</xref>
], <italic>Bacillus</italic> [
<xref rid="B45-viruses-08-00116" ref-type="bibr">45</xref>
], and <italic>Streptomyces</italic> [
<xref rid="B46-viruses-08-00116" ref-type="bibr">46</xref>
]. The resulting list of unique IDs was submitted to the NCBI Batch Entrez service to retrieve the corresponding WGS. In addition, genome sequences were downloaded from the PhAnToMe genomes database and from NCBI by searching for “(phage [Title]) AND complete genome”.</p>
<p>Only entries that had “complete genome” in the DEFINITION field of the GenBank file and whose host taxonomy was specified at least at the genus level were included. Entries annotated as “prophage” in the DEFINITION field were removed. Hosts annotated as
<italic>Salmonella Typhimurium</italic>
were re-annotated as
<italic>Salmonella enterica</italic>
according to current nomenclature [
<xref rid="B47-viruses-08-00116" ref-type="bibr">47</xref>
]. Finally, for hosts whose species was given as “sp.” followed by an alphanumeric code, only the genus was taken into account; for example,
<italic>Synechococcus</italic>
sp.
<italic>WH7803</italic>
was re-annotated as
<italic>Synechococcus</italic>
. In total, 2196 phages had an annotated host genus; these constitute the
<inline-formula>
<mml:math id="mm1">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>genus</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
dataset. Of these, 1871 also had an annotated host species and constitute the
<inline-formula>
<mml:math id="mm2">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>species</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
dataset. A total of 209 different host species and 129 different genera were represented among the phages (these data are available in HostPhinder’s repository [
<xref rid="B48-viruses-08-00116" ref-type="bibr">48</xref>
]).
<xref ref-type="fig" rid="viruses-08-00116-f001">Figure 1</xref>
shows the distribution of hosts in the dataset.</p>
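<p>The curation rules above can be illustrated with a minimal Python sketch. It assumes that the GenBank DEFINITION line and the annotated host name are already available as plain strings. The helper names keep_entry and normalize_host and the RENAMED_SPECIES mapping are hypothetical. The sketch does not reproduce HostPhinder’s actual curation code.</p>
<preformat preformat-type="code">
```python
from typing import Optional

# Hypothetical renaming table reflecting current nomenclature (Section 2.1).
RENAMED_SPECIES = {"Salmonella Typhimurium": "Salmonella enterica"}


def keep_entry(definition: str) -> bool:
    """Keep entries described as complete genomes, but drop prophages."""
    text = definition.lower()
    return "complete genome" in text and "prophage" not in text


def normalize_host(host: str) -> Optional[str]:
    """Return the curated host name, or None if not even a genus is given."""
    parts = host.split()
    if not parts:
        return None  # no host annotation at genus level: entry is excluded
    # "Genus sp. CODE" -> keep only the genus
    if len(parts) >= 2 and parts[1] == "sp.":
        return parts[0]
    name = " ".join(parts)
    return RENAMED_SPECIES.get(name, name)
```
</preformat>
<p>For example, under these assumptions normalize_host("Synechococcus sp. WH7803") yields "Synechococcus", matching the re-annotation described above.</p>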
</sec>
<sec id="sec2dot2-viruses-08-00116">
<title>2.2. Data Partitioning and Clustering</title>
<p>In this study, a 4-fold cross-validation setup was used to assess the ability of the host prediction method to generalize to previously unseen data. Five data partitions were made, and one partition,
<inline-formula>
<mml:math id="mm3">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
was left aside during the entire process of parameter optimization. Once the parameters were optimized, the prediction accuracy was evaluated on this
<inline-formula>
<mml:math id="mm4">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set, using the entire
<inline-formula>
<mml:math id="mm5">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set as the reference database (
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figure S1</xref>
). In this setup, the performance on the evaluation set is therefore completely unbiased with respect to the optimization of the model parameters.</p>
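<p>The partitioning scheme can be sketched in Python as follows, assuming each partition is simply a list of phage identifiers. The function name cv_folds and the default choice of held-out partition (eval_index) are illustrative assumptions. The text does not specify which of the five partitions served as the evaluation set.</p>
<preformat preformat-type="code">
```python
from typing import List, Tuple

Fold = Tuple[List[str], List[str]]  # (training ids, test ids)


def cv_folds(partitions: List[List[str]], eval_index: int = 4) -> Tuple[List[Fold], List[str], List[str]]:
    """Set one partition aside and build 4-fold cross-validation from the rest.

    Returns (folds, reference, held_out): the four train/test folds used for
    parameter optimization, the pooled train-test set used as the reference
    database in the final evaluation, and the untouched evaluation partition.
    """
    held_out = partitions[eval_index]
    train_test = [p for i, p in enumerate(partitions) if i != eval_index]
    folds: List[Fold] = []
    for k, test in enumerate(train_test):
        # Every other train-test partition is pooled into the training set.
        train = [g for j, p in enumerate(train_test) if j != k for g in p]
        folds.append((train, test))
    reference = [g for p in train_test for g in p]
    return folds, reference, held_out
```
</preformat>
<p>The held-out partition never enters a fold, so it plays no role in parameter optimization. This is what keeps the final evaluation unbiased.</p>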
<p>A reliable evaluation, i.e., one not inflated by overfitting, can only be made if the phage genomes in the train-test and evaluation sets are not too similar to each other. If a phage genome in the training set were almost identical to a genome in the evaluation set, predicting its host would be a trivial task for HostPhinder. This would lead to an overestimation of the method’s ability to generalize to previously unseen data. To avoid such a bias, we clustered the genomes according to 16-mer similarity using a Hobohm 1 approach [
<xref rid="B49-viruses-08-00116" ref-type="bibr">49</xref>
]. The Hobohm approach builds a list of representative phage genomes, here called seeds. The first sequence in a randomly sorted list enters the seed list as the first seed. Each subsequent sequence is then compared, by the number of shared 16-mers, with every seed already in the list. A sequence is added to the seed list only if it is sufficiently different from all existing seeds. Otherwise, it is assigned to the cluster of its most similar seed. The similarity between two genomes was measured in terms of
<inline-formula>
<mml:math id="mm6">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mi>q</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
(see Equation (
<xref ref-type="disp-formula" rid="FD4-viruses-08-00116">4</xref>
) in Section 2.3, “K-mer-Based Resemblance Measures”), using a threshold
<inline-formula>
<mml:math id="mm7">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mi>q</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
> 0.7. This threshold was chosen because the resulting clustering was the most similar (93%) to the clustering obtained with a BLAST-based Hobohm 1 approach using a similarity threshold of >90% genome-wide identity (data not shown). The k-mer-based Hobohm 1 clustering resulted in 293 clusters with at least two sequences and 1121 singletons. This gives a total of 1414 seeds, each representing a cluster of 1 to 97 sequences. To split the clustered phages into train-test and evaluation sets, the 1414 seeds were sorted first alphabetically by host and then by size, and distributed alternately among five partitions. This ensured that hosts and genome sizes were equally represented across partitions. Finally, the remaining members of each cluster were added to the partition of their respective seed. Because sequences within the same cluster shared the same host, the balanced host distribution was maintained after adding the cluster members to each partition (see
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figure S2</xref>
). From each partition, a subset comprising all phages with an annotated host species was extracted. Together, these subsets constitute the
<inline-formula>
<mml:math id="mm8">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>species</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
dataset. The host and size distributions across partitions were preserved in these subsets (see
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figures S2–S4</xref>
). As stated above, one partition was then set aside for final evaluation,
<inline-formula>
<mml:math id="mm9">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, and the remaining four formed the train-test set,
<inline-formula>
<mml:math id="mm10">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
. The final
<inline-formula>
<mml:math id="mm11">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set contained 1818 phages (115 genera and 190 species), the
<inline-formula>
<mml:math id="mm12">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set contained 378 phages (72 genera and 96 species), while the
<inline-formula>
<mml:math id="mm13">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set consisted of 1546 phages and the
<inline-formula>
<mml:math id="mm14">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set consisted of 325 phages (data available in HostPhinder’s repository [
<xref rid="B48-viruses-08-00116" ref-type="bibr">48</xref>
]).</p>
</sec>
<sec id="sec2dot3-viruses-08-00116">
<title>2.3. K-mer-Based Resemblance Measures</title>
<p>If phages infecting the same bacterial host share genomic features, the host of a query phage should be predictable by searching a reference database of phages with annotated hosts for the most genomically similar phages. The reference database was built by splitting each phage genome sequence and its reverse complement into k-mers, sliding a window of length k along each sequence with a step size of 1.</p>
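<p>The database construction can be illustrated with a minimal Python sketch. It assumes plain nucleotide strings and k = 16, the length chosen in Section 3.1. The function names (<monospace>kmers</monospace>, <monospace>reverse_complement</monospace>, <monospace>build_database</monospace>) are hypothetical and are not taken from the HostPhinder code base.</p>
<preformat preformat-type="code">
```python
# Hypothetical sketch of the k-mer reference database (not HostPhinder's own code).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 16) -> list:
    """Slide a window of length k along seq with step size 1 (repeats kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reference_kmers(genome: str, k: int = 16) -> list:
    """k-mers of a reference genome followed by those of its reverse complement."""
    return kmers(genome, k) + kmers(reverse_complement(genome), k)


def build_database(references: dict, k: int = 16) -> dict:
    """Map each reference phage ID to the k-mers of both of its strands."""
    return {ref_id: reference_kmers(genome, k) for ref_id, genome in references.items()}


if __name__ == "__main__":
    print(kmers("ACGTAC", k=3))        # ['ACG', 'CGT', 'GTA', 'TAC']
    print(reverse_complement("AACG"))  # CGTT
```
</preformat>
<p>Repeated k-mers are kept in these lists because the coverage measure defined below counts k-mers with multiplicity. The unique-k-mer counts can be obtained with <monospace>set()</monospace>.</p>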
<p>Query sequences were likewise split into k-mers, and for each reference sequence
<italic>i</italic>
having at least one k-mer in common with the query, a score,
<inline-formula>
<mml:math id="mm15">
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
, was defined as the number of unique k-mers shared by the query and reference <italic>i</italic>. This score was then used to compute the expectation value
<inline-formula>
<mml:math id="mm16">
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
:
<disp-formula id="FD1-viruses-08-00116">
<label>(1)</label>
<mml:math id="mm17">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>Hits</mml:mi>
</mml:msub>
<mml:mfrac>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>tot</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="mm18">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>Hits</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
is the sum of scores over all references,
<inline-formula>
<mml:math id="mm19">
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
is the total number of unique k-mers found in reference sequence
<italic>i</italic>
and its reverse complement, and
<inline-formula>
<mml:math id="mm20">
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>tot</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
is the sum of unique k-mers over all references in the database. This expectation value was used to obtain a z-score:
<disp-formula id="FD2-viruses-08-00116">
<label>(2)</label>
<mml:math id="mm21">
<mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>η</mml:mi>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
with
<inline-formula>
<mml:math id="mm22">
<mml:mrow>
<mml:mi>η</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
being a pseudocount used to avoid division by zero. Using SciPy, a two-sided
<italic>p</italic>
-value was generated from the z-score. All
<italic>p</italic>
-values were corrected using the Bonferroni method [
<xref rid="B50-viruses-08-00116" ref-type="bibr">50</xref>
] by multiplying each
<italic>p</italic>
-value by the number of reference phages in the database:
<disp-formula id="FD3-viruses-08-00116">
<label>(3)</label>
<mml:math id="mm23">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>corr</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>ref</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="mm24">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>ref</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
is the number of reference sequences in the database. HostPhinder outputs only significant hits,
<italic>i.e.</italic>
,
<inline-formula>
<mml:math id="mm25">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>corr</mml:mi>
</mml:msub>
<mml:mo>&lt;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
. Additionally, the values
<inline-formula>
<mml:math id="mm26">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="mm27">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
were computed. They are the ratios of the score to the number of unique k-mers in the query and in the reference sequence, respectively:
<disp-formula id="FD4-viruses-08-00116">
<label>(4)</label>
<mml:math id="mm28">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>η</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="mm29">
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
is the number of unique query k-mers and
<inline-formula>
<mml:math id="mm30">
<mml:mrow>
<mml:mi>η</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
avoids division by zero. The value of
<inline-formula>
<mml:math id="mm31">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, which lies between 0 and 1, directly indicates what fraction of the query's unique k-mers matched the reference phage.
<disp-formula id="FD5-viruses-08-00116">
<label>(5)</label>
<mml:math id="mm32">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>η</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="mm33">
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
is the number of unique k-mers in the reference sequence and in its reverse complement. Therefore,
<inline-formula>
<mml:math id="mm34">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
falls between 0.5 and 1 when query and reference are identical, depending on how many additional unique k-mers the reverse complement contributes. The two measures are therefore not directly comparable. Finally, coverage was computed. It measures how much of the reference sequence is covered by the total number of query k-mers that match the reference:
<disp-formula id="FD6-viruses-08-00116">
<label>(6)</label>
<mml:math id="mm35">
<mml:mrow>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi>matched</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>η</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="mm36">
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi>matched</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
is the total number of k-mers in the query that were matched to reference
<italic>i</italic>
, and
<inline-formula>
<mml:math id="mm37">
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
is the total number of k-mers in the reference. Both values count repeated k-mers as well as unique ones. The factor 2 accounts for the reverse complement of the reference, which is also included when computing
<inline-formula>
<mml:math id="mm38">
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
. Coverage can exceed 1 if the query contains k-mers that are matched multiple times.</p>
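<p>The measures above can be combined into one scoring routine, shown here as a minimal Python sketch. Several points are assumptions: only the forward strand of the query is split into k-mers, and the two-sided p-value comes from a standard normal distribution. The sketch uses <monospace>math.erfc</monospace>, which gives the same tail probability as <monospace>2 * scipy.stats.norm.sf(abs(z))</monospace> without requiring SciPy. Coverage follows the textual definitions of the matched query k-mer count and <italic>l</italic><sub><italic>i</italic></sub>. The names <monospace>score_query</monospace> and <monospace>significant</monospace> are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter

ETA = 0.001  # pseudocount η used in Equations (2) and (4)-(6)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """All k-mers of seq, step size 1, repeats kept."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def score_query(query, references, k=16):
    """Similarity measures of one query against every reference sharing a k-mer.

    references: dict mapping a reference ID to its genome sequence.
    Assumption: only the forward strand of the query is split into k-mers.
    """
    q_counts = Counter(kmers(query, k))     # query k-mers with multiplicity
    q_unique = set(q_counts)                # unique query k-mers
    ref_kmers = {
        ref_id: kmers(g, k) + kmers(g.translate(COMPLEMENT)[::-1], k)
        for ref_id, g in references.items()
    }
    l_u = {r: len(set(km)) for r, km in ref_kmers.items()}   # l_{u,i}
    L_u_tot = sum(l_u.values())                              # L_{u,tot}
    n_ref = len(references)                                  # N_ref
    scores = {r: len(q_unique.intersection(km)) for r, km in ref_kmers.items()}
    scores = {r: s for r, s in scores.items() if s}          # S_i, at least one shared k-mer
    n_hits = sum(scores.values())                            # N_Hits

    results = {}
    for r, s in scores.items():
        e = n_hits * l_u[r] / L_u_tot                        # Equation (1)
        z = (s - e) / math.sqrt(s + e + ETA)                 # Equation (2)
        p = math.erfc(abs(z) / math.sqrt(2))                 # equals 2 * scipy.stats.norm.sf(abs(z))
        ref_set = set(ref_kmers[r])
        q_matched = sum(n for km, n in q_counts.items() if km in ref_set)
        results[r] = {
            "score": s,
            "z": z,
            "p_corr": p * n_ref,                             # Equation (3), Bonferroni
            "frac_q": s / (len(q_unique) + ETA),             # Equation (4)
            "frac_d": s / (l_u[r] + ETA),                    # Equation (5)
            # coverage: matched query k-mers and l_i (all reference k-mers, both strands)
            "coverage": 2 * q_matched / (len(ref_kmers[r]) + ETA),
        }
    return results


def significant(results, alpha=0.05):
    """Keep only hits whose Bonferroni-corrected p-value is below alpha."""
    return {r: m for r, m in results.items() if alpha > m["p_corr"]}
```
</preformat>
<p>In this sketch, a query identical to a reference gets frac<sub>q</sub> close to 1. Its frac<sub>d</sub> lies between 0.5 and 1, depending on how many extra unique k-mers the reverse complement adds, which matches the ranges stated above.</p>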
</sec>
<sec id="sec2dot4-viruses-08-00116">
<title>2.4. Determining the Measure and Selection Criteria for Final Prediction</title>
<p>As described above, 5 measures were calculated for the similarity of a query phage to each phage in the reference database: score, z-score,
<inline-formula>
<mml:math id="mm39">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mi>q</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
,
<inline-formula>
<mml:math id="mm40">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, and coverage. The optimal measure was determined in a simple 4-fold cross-validation setup. In turn, 3 of the 4 partitions were used as the reference database to predict the host of each query phage in the left-out test set (see
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figure S1</xref>
, left). The predicted host was the host of the reference phage with the highest value of the similarity measure. This was repeated 4 times so that each partition served once as the test set. An overall performance for the given measure was then calculated by concatenating the predictions of the 4 test sets. For each measure, the average accuracy and its confidence interval were estimated by resampling each test set 100 times with replacement and calculating the overall accuracy. In a pairwise comparison based on 1000 bootstrap resamplings, coverage outperformed the other measures and was therefore chosen for further analysis. Several selection criteria can be used for the final prediction of a query phage's host. We tested and compared the efficacy of 4 selection criteria, each described in detail below.</p>
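<p>The bootstrap estimate of overall accuracy can be sketched in Python. The sketch assumes that each test set is represented as a list of booleans (correct or incorrect host) and that the interval is a 95% percentile interval. Neither detail is specified in the text, and the function name is hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def bootstrap_accuracy(test_sets, n_boot=100, seed=0):
    """Mean and 95% percentile interval of the overall accuracy.

    test_sets: one list per cross-validation test set, holding True for a
    correctly predicted host and False otherwise. In every bootstrap round,
    each test set is resampled with replacement and the resamples are pooled.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_boot):
        correct = total = 0
        for outcomes in test_sets:
            sample = rng.choices(outcomes, k=len(outcomes))
            correct += sum(sample)
            total += len(sample)
        accuracies.append(correct / total)
    accuracies.sort()
    mean = sum(accuracies) / n_boot
    low = accuracies[int(0.025 * n_boot)]
    high = accuracies[min(n_boot - 1, int(0.975 * n_boot))]
    return mean, (low, high)
```
</preformat>
<p>Resampling each test set separately, rather than the pooled predictions, preserves the size of every fold in each bootstrap round.</p>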
<sec id="sec2dot4dot1-viruses-08-00116">
<title>2.4.1. Criterion 1: Host of Best-Matching Reference Phage</title>
<p>The host of the reference phage with the highest coverage value was selected as predicted host. This is the selection criterion used above to define the optimal similarity measure.</p>
</sec>
<sec id="sec2dot4dot2-viruses-08-00116">
<title>2.4.2. Criterion 2: Majority Host among Top-10 Reference Phages</title>
<p>The predicted host was the most abundant host among the 10 reference phages with the highest coverage values. In case of a tie, the tied host with the highest coverage was selected.</p>
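<p>Criteria 1 and 2 can be sketched in Python as shown below. The sketch assumes that the significant hits are given as (host, coverage) pairs; the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def criterion1(hits):
    """Host of the reference phage with the highest coverage.

    hits: list of (host, coverage) pairs for the significant reference phages.
    """
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[0]


def criterion2(hits, top_n=10):
    """Most abundant host among the top_n hits ranked by coverage.

    Ties go to the tied host whose best hit has the highest coverage.
    """
    if not hits:
        return None
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    counts = Counter(host for host, _ in top)
    best_count = max(counts.values())
    # top is ordered by decreasing coverage, so the first tied host found
    # is the one with the highest coverage
    return next(host for host, _ in top if counts[host] == best_count)
```
</preformat>
<p>For example, with hits [("A", 0.9), ("B", 0.8), ("B", 0.7)], criterion 1 returns A and criterion 2 returns B.</p>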
<p>When the coverage of lower-ranked reference phages is far below that of the top reference phage, it may not be advantageous to include them in the selection. To accommodate this, two additional criteria (3 and 4) were developed.</p>
</sec>
<sec id="sec2dot4dot3-viruses-08-00116">
<title>2.4.3. Criterion 3: Majority Host among Reference Phages above Coverage Threshold</title>
<p>The predicted host was the most abundant host among the reference phages with a coverage value above a given threshold. The threshold was defined as a fraction of the highest coverage:
<disp-formula id="FD7-viruses-08-00116">
<label>(7)</label>
<mml:math id="mm41">
<mml:mrow>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mi>threshold</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>f</mml:mi>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
where
<italic>f</italic>
(fraction) is a number in the range 0.0–1.0. Note that
<inline-formula>
<mml:math id="mm42">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
means considering all significant predictions, whilst
<inline-formula>
<mml:math id="mm43">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
corresponds to selecting the host of the reference phage with the highest coverage (criterion 1). The optimal value of
<italic>f</italic>
was determined through a nested 3-fold cross-validation. This avoids the biased performance estimates that would result from reusing the cross-validation employed to select the optimal criterion. In turn, 3 of the 4 data partitions formed a tripartite train-test set for an inner cross-validation. Within each tripartite set, 2 partitions were used as the reference database to predict the hosts of the left-out partition using Equation (7) for a given value of
<italic>f</italic>
. This was repeated 3 times within each tripartite set so that each partition served once as the test set, and an overall performance for the given
<italic>f</italic>
value was calculated (see
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figure S5</xref>
). For each
<italic>f</italic>
value, the average accuracy was assessed through 100 bootstrap resamplings with replacement in each inner cross-validation loop. The procedure was repeated for each of the 4 tripartite combinations, yielding 4 estimates of the optimal
<italic>f</italic>
value. The accuracy
<italic>vs.</italic>
<italic>f</italic>
curves are shown in
<xref ref-type="fig" rid="viruses-08-00116-f002">Figure 2</xref>
for prediction of species and genus. The horizontal bars span
<italic>f</italic>
values that yield at least 99% of the highest accuracy in the corresponding tripartite combination. Based on these performance curves, an
<italic>f</italic>
value of 0.8 was chosen from within the highest-performance range (see
<xref ref-type="fig" rid="viruses-08-00116-f002">Figure 2</xref>
).</p>
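<p>Criterion 3 can be sketched in Python with the same (host, coverage) input. Two details are assumptions. "Above the threshold" is read inclusively, so that f = 1.0 keeps the best hit, as stated for criterion 1. Ties are broken as in criterion 2. The function name is hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def criterion3(hits, f=0.8):
    """Most abundant host among hits with coverage of at least f * coverage_1 (Equation (7)).

    hits: list of (host, coverage) pairs for the significant reference phages.
    """
    if not hits:
        return None
    threshold = f * max(cov for _, cov in hits)
    # ">=" keeps the top hit itself, so f = 1.0 keeps only the best hit(s), as in criterion 1
    kept = sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)
    counts = Counter(host for host, _ in kept)
    best_count = max(counts.values())
    # tie-break (assumption): same rule as criterion 2, highest coverage wins
    return next(host for host, _ in kept if counts[host] == best_count)
```
</preformat>
<p>With f = 0.0, every significant hit votes. With f = 1.0, only the top-coverage hit remains.</p>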
</sec>
<sec id="sec2dot4dot4-viruses-08-00116">
<title>2.4.4. Criterion 4: Summing up Normalized Coverage Values of Phages with Same Host</title>
<p>In this scoring method, the coverage values of all significant reference phages were normalised by dividing them by the highest coverage,
<inline-formula>
<mml:math id="mm44">
<mml:mrow>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, and raised to the power of a tunable exponent,
<inline-formula>
<mml:math id="mm45">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
.
<disp-formula id="FD8-viruses-08-00116">
<label>(8)</label>
<mml:math id="mm46">
<mml:mrow>
<mml:msub>
<mml:mi>score</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mfenced separators="" open="(" close=")">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>coverage</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
<mml:mi>α</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Next, the scores of hits with the same host were summed, and the host with the highest summed score was predicted. The higher the value of
<italic>α</italic>
, the more the top hit dominates the summed scores, and the closer this method is to criterion 1. Values of
<italic>α</italic>
in the range 0.0–10.0 were tested. As for criterion 3, the optimal
<italic>α</italic>
was determined through a nested 3 fold cross-validation setup (see
<xref ref-type="supplementary-material" rid="viruses-08-00116-s001">Supplementary Materials Figure S5</xref>
) and led to the selection of
<inline-formula>
<mml:math id="mm47">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
within the range that yielded the highest accuracy in the 4 tripartite train-test sets (see
<xref ref-type="fig" rid="viruses-08-00116-f003">Figure 3</xref>
).</p>
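<p>Criterion 4 amounts to a weighted vote, sketched below in Python. The sketch assumes (host, coverage) pairs with positive coverage, which holds for significant hits because they share at least one k-mer with the query. The function name is hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict


def criterion4(hits, alpha=6.0):
    """Host with the largest sum of normalised coverages (Equation (8)).

    hits: list of (host, coverage) pairs for the significant reference phages;
    coverage values are assumed to be positive.
    """
    if not hits:
        return None
    coverage_1 = max(cov for _, cov in hits)
    totals = defaultdict(float)
    for host, cov in hits:
        totals[host] += (cov / coverage_1) ** alpha
    return max(totals, key=totals.get)
```
</preformat>
<p>Take hits [("A", 1.0), ("B", 0.9), ("B", 0.9)]. With α = 6.0, B wins (2 × 0.9<sup>6</sup> ≈ 1.06 against 1.0). With α = 10.0, A wins (2 × 0.9<sup>10</sup> ≈ 0.70). This shows how a larger α pushes the method towards criterion 1.</p>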
</sec>
</sec>
<sec id="sec2dot5-viruses-08-00116">
<title>2.5. Programming Language and Speed of Execution</title>
<p>The algorithm was written in Python and Bash.</p>
<p>On an Intel(R) Xeon(R) CPU E5-4610 v2 @ 2.30 GHz computer, using 2 cores and 10 GB RAM, HostPhinder's average running time is 61.1662 s for host species prediction and 109.622 s for genus prediction. The longer runtime for genus prediction is due to the larger database used for genus predictions. These values were calculated on the evaluation set.</p>
</sec>
<sec id="sec2dot6-viruses-08-00116">
<title>2.6. BLAST Evaluation</title>
<p>The accuracy of the HostPhinder k-mer-based approach was compared with that of BLAST, the standard tool for sequence similarity searches [
<xref rid="B51-viruses-08-00116" ref-type="bibr">51</xref>
]. BLAST performance was assessed on the
<inline-formula>
<mml:math id="mm48">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set using the
<inline-formula>
<mml:math id="mm49">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mtext>train-test</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set to create a local nucleotide BLAST database. The prediction was the host associated with the hit with the lowest E-value, with ties broken by the highest bit score.</p>
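<p>The best-hit rule can be sketched in Python. The sketch assumes the hits were written in BLAST+ tabular format (<monospace>-outfmt 6</monospace>) with its default 12 columns, where the E-value and bit score are the 11th and 12th fields. The <monospace>host_of</monospace> mapping from reference IDs to annotated hosts and the function name are hypothetical.</p>
<preformat preformat-type="code">
```python
def blast_best_host(tabular_lines, host_of):
    """Predicted host of one query from BLAST+ tabular (-outfmt 6) hits.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    host_of: dict mapping a subject (reference phage) ID to its annotated host.
    """
    hits = []
    for line in tabular_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12:
            evalue, bitscore = float(cols[10]), float(cols[11])
            # sort key: lowest E-value first, then highest bit score
            hits.append((evalue, -bitscore, cols[1]))
    if not hits:
        return None
    _, _, subject = min(hits)
    return host_of.get(subject)
```
</preformat>
<p>Negating the bit score lets a single <monospace>min</monospace> call apply both ordering rules at once.</p>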
</sec>
<sec id="sec2dot7-viruses-08-00116">
<title>2.7. Establishing an Evaluation Set of Predicted Prophages</title>
<p>The PhiSpy prophage prediction tool [
<xref rid="B52-viruses-08-00116" ref-type="bibr">52</xref>
] was used to predict prophages in 2679 complete bacterial genomes collected from NCBI [
<xref rid="B53-viruses-08-00116" ref-type="bibr">53</xref>
]. PhiSpy was run once on each genome, resulting in a total of 7559 predicted bacterial prophages in 2074 genomes. Of these, 2796 were from bacterial species that were also included in the HostPhinder reference database. In the following, these predicted prophages are referred to as the
<inline-formula>
<mml:math id="mm50">
<mml:mrow>
<mml:msub>
<mml:mi>prophages</mml:mi>
<mml:mi>species</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set. A total of 4639 predicted prophages were from genera that were included in the reference database of HostPhinder. They will be referred to as the
<inline-formula>
<mml:math id="mm51">
<mml:mrow>
<mml:msub>
<mml:mi>prophages</mml:mi>
<mml:mi>genus</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set.</p>
<p>Furthermore, 261 manually verified prophages were downloaded from the PhiSpy and phage_finder directories of Phantome [
<xref rid="B54-viruses-08-00116" ref-type="bibr">54</xref>
], and HostPhinder's predictions were tested on them.</p>
</sec>
<sec id="sec2dot8-viruses-08-00116">
<title>2.8. Host Prediction of INTESTI Bacteriophage Cocktail</title>
<p>The George Eliava Institute of Bacteriophages, Microbiology and Virology in Georgia has been developing phage cocktails (mixtures of phages) since the 1950s. One of these, the INTESTI bacteriophage cocktail, is claimed to contain sterile filtrates of phage lysates effective against
<italic>Staphylococcus</italic>
,
<italic>Enterococcus</italic>
,
<italic>Proteus</italic>
,
<italic>Shigella</italic>
,
<italic>Salmonella</italic>
,
<italic>Escherichia coli</italic>
, and
<italic>Pseudomonas aeruginosa</italic>
for the treatment of intestinal bacterial infections. The cocktail was sequenced directly on an Illumina MiSeq platform and assembled de novo into contigs. These were further grouped into 19 draft genomes, each hypothesized to represent a nearly complete phage genome, and 4 smaller groups hypothesized to represent fragments of phage genomes, as previously described [
<xref rid="B38-viruses-08-00116" ref-type="bibr">38</xref>
]. The host genus and species of each of these 23 groups were predicted by the final HostPhinder method using criterion 4 with
<inline-formula>
<mml:math id="mm52">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
.</p>
</sec>
</sec>
<sec id="sec3-viruses-08-00116">
<title>3. Results</title>
<p>In this study, we developed and benchmarked HostPhinder, a bioinformatics tool for predicting the bacterial host species of phages. The method is based on the assumption that genetically similar phages are likely to share bacterial hosts. To make its predictions, HostPhinder relies on a reference database in which whole-genome sequencing (WGS) data from phages with annotated hosts have been split into k-mers. The genomes of the query phages whose hosts are to be predicted are likewise split into k-mers. The number of k-mers shared between the query phage and the phages in the reference database is then used as a measure of genetic similarity.</p>
<sec id="sec3dot1-viruses-08-00116">
<title>3.1. Developing and Benchmarking the HostPhinder Method</title>
<p>Initial analysis on a small dataset indicated that k-mers of length 15–20 nt gave comparable predictive performance. In contrast, shorter k-mers were too unspecific and led to lower final accuracy, while longer k-mers were too specific and left more query phages without any prediction (data not shown). Based on these results and a previous study showing 16-mers to be optimal for k-mer-based bacterial species identification [
<xref rid="B35-viruses-08-00116" ref-type="bibr">35</xref>
], 16 was chosen as the k-mer length in the following.</p>
<p>In the initial testing of the basic genetic similarity assumption of HostPhinder, 5 measures were evaluated for estimating the similarity of the query phage to the reference phages as described in Materials and Methods. For each measure, the query host was inferred from the host of the reference hit with the highest similarity.
<xref ref-type="table" rid="viruses-08-00116-t001">Table 1</xref>
shows the performance of each similarity measure in this initial testing.</p>
<p>The accuracies of the measures in predicting the host species of the query phages in the train-test set were compared pairwise using 1000 bootstrap resamplings with replacement. Coverage performed significantly better than the other measures (
<italic>p</italic>
-value &lt; 0.05), except for
<inline-formula>
<mml:math id="mm66">
<mml:mrow>
<mml:msub>
<mml:mi>frac</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, which in turn did not significantly outperform coverage. Since coverage showed the highest performance in predicting the host species, it was chosen as the measure for further optimizing HostPhinder's predictions at the species level. Next, the performance of 4 scoring methods for host selection was compared (see Materials and Methods for the description of the criteria and parameter optimization). For each selection criterion, only significant hits were considered (
<inline-formula>
<mml:math id="mm67">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>corr</mml:mi>
</mml:msub>
<mml:mo>&lt;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
). As a result, the number of queries with predictions was the same for all criteria, allowing a direct comparison of their efficacy. Using the model parameters determined above, the 4 criteria were compared in terms of overall accuracy in a 4-fold cross-validation. In turn, 3 of the 4 partitions were used as the reference database to predict the hosts of the left-out test set using each criterion. This was repeated 4 times so that each partition served once as the test set, and an overall performance for the given criterion was calculated. For each criterion, the average accuracy and its confidence interval were estimated by resampling each test set 100 times with replacement and calculating the overall accuracy.
<xref ref-type="table" rid="viruses-08-00116-t002">Table 2</xref>
shows the overall accuracy on
<inline-formula>
<mml:math id="mm68">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="mm69">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
sets for each criterion at the genus and species level, respectively. Bacterial host genera and species could not be predicted for 5.8% of the phages in the
<inline-formula>
<mml:math id="mm70">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set and 5.6% of the phages in the
<inline-formula>
<mml:math id="mm71">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set, respectively.</p>
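<p>A pairwise bootstrap comparison of two measures can be carried out as in the Python sketch below. The sketch assumes that both measures are scored on the same queries. It reports the fraction of resamples in which measure A is not more accurate than measure B, used as an empirical one-sided p-value. The exact statistic used in this study is not specified, so this is an illustrative assumption, and the function name is hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def paired_bootstrap_p(correct_a, correct_b, n_boot=1000, seed=0):
    """Fraction of paired resamples in which measure A does not beat measure B.

    correct_a, correct_b: per-query booleans (True = correct host) for the
    same queries, in the same order.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both measures must be evaluated on the same queries")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # one resample shared by A and B
        acc_a = sum(correct_a[i] for i in idx)
        acc_b = sum(correct_b[i] for i in idx)
        if not acc_a > acc_b:
            not_better += 1
    return not_better / n_boot
```
</preformat>
<p>Drawing one set of indices per round keeps the comparison paired, so each resample evaluates both measures on exactly the same queries.</p>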
<p>Criterion 4 with
<inline-formula>
<mml:math id="mm83">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
had the highest predictive performance, with an accuracy of 79% and 84% for species and genus, respectively, although it significantly outperformed only criterion 2.</p>
<p>Some hosts are substantially more frequent than others in the data set. This could bias the predictions and result in suboptimal predictive performance. To investigate this, we tested modified versions of criteria 2–4 in which the sequences in the reference database were clustered with the Hobohm 1 algorithm [
<xref rid="B49-viruses-08-00116" ref-type="bibr">49</xref>
], and only the highest-scoring member of each cluster was used in the prediction scheme. This did not, however, improve performance.</p>
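<p>The Hobohm 1 algorithm is a greedy procedure. Sequences are visited in a fixed priority order, and each is either accepted as a new representative or, if it is similar to an already accepted representative, assigned to that representative's cluster. The Python sketch below implements this clustering and the per-cluster filtering described above. The similarity criterion (<monospace>similar</monospace>) and the priority order are left to the caller because neither is specified here, and all names are hypothetical.</p>
<preformat preformat-type="code">
```python
def hobohm1(seq_ids, similar):
    """Greedy Hobohm 1 clustering.

    seq_ids: sequence IDs in priority order.
    similar: function (a, b) -> bool, a hypothetical similarity criterion.
    Returns a dict mapping each ID to the ID of its cluster representative.
    """
    representatives = []
    cluster_of = {}
    for sid in seq_ids:
        rep = next((r for r in representatives if similar(sid, r)), None)
        if rep is None:
            representatives.append(sid)   # not similar to any representative
            cluster_of[sid] = sid
        else:
            cluster_of[sid] = rep
    return cluster_of


def best_hit_per_cluster(hits, cluster_of):
    """Keep only the highest-coverage hit of each cluster.

    hits: list of (ref_id, host, coverage) tuples; result sorted by coverage.
    """
    best = {}
    for ref_id, host, cov in hits:
        c = cluster_of[ref_id]
        if c not in best or cov > best[c][2]:
            best[c] = (ref_id, host, cov)
    return sorted(best.values(), key=lambda h: h[2], reverse=True)
```
</preformat>
<p>The filtered hit list can then be passed to criteria 2–4 in place of the full list, so that a host represented by many near-identical reference phages contributes only one vote per cluster.</p>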
<p>Based on the above benchmarking procedures, the final HostPhinder method was developed. Its reference database was generated by splitting all phage genomes in the entire phage set into 16-mers using a step size of 1. After searching the database, HostPhinder computes the coverage measure and creates a hit list,
<italic>i.e.</italic>
, the phages significantly similar to the query. The final host species and genus are predicted according to criterion 4 with
<inline-formula>
<mml:math id="mm84">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
. HostPhinder is freely available as a web server [
<xref rid="B1-viruses-08-00116" ref-type="bibr">1</xref>
] and as a Docker image [
<xref rid="B2-viruses-08-00116" ref-type="bibr">2</xref>
].</p>
</sec>
<sec id="sec3dot2-viruses-08-00116">
<title>3.2. Evaluating HostPhinder’s Performance on Complete and Partial Genomes</title>
<p>HostPhinder was evaluated on the
<inline-formula>
<mml:math id="mm85">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="mm86">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
sets containing phages from public databases. HostPhinder correctly predicted the bacterial host species and genera of 74.24% ± 0.270% and 81.39% ± 0.206% of the phages, respectively. In the evaluation set, 4.0% (3.44%) of the phages could not be matched to any phage in the database when predicting at the species (genus) level. We hypothesized that HostPhinder's accuracy depends on the coverage value of its prediction, i.e., that higher coverage values yield higher accuracy. To quantify whether this is the case, we show in
<xref ref-type="fig" rid="viruses-08-00116-f004">Figure 4</xref>
the accuracy on the evaluation set for different intervals of the coverage value. At the species level, no hit had a coverage in the range 0.8 &lt; coverage ≤ 0.9. At both the species and the genus level, predictions based on a coverage value below 0.1 were correct for only 47% (species) and 63% (genus) of the phages. At the other end of the scale, predictions based on a coverage value above 0.7 (species) and 0.8 (genus) were correct in all instances.</p>
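<p>Accuracy per coverage interval can be tabulated as in the Python sketch below. It assumes right-closed intervals of width 0.1, consistent with the interval 0.8 &lt; coverage ≤ 0.9 mentioned above. Because coverage can exceed 1, values beyond the last interval are pooled into it. The function name is hypothetical.</p>
<preformat preformat-type="code">
```python
import math
from collections import defaultdict


def accuracy_by_coverage(predictions, width=0.1, n_bins=10):
    """Accuracy per coverage interval.

    predictions: iterable of (coverage, is_correct) pairs.
    Intervals are right-closed, (0, 0.1], (0.1, 0.2], ...; coverage values
    beyond the last interval (coverage can exceed 1) go into the last bin.
    Returns {(lower, upper): (accuracy, n_predictions)} for non-empty bins.
    """
    tallies = defaultdict(lambda: [0, 0])   # bin index -> [n_correct, n_total]
    for coverage, is_correct in predictions:
        # round() guards against float noise such as 0.3 / 0.1 == 2.9999999999999996
        idx = math.ceil(round(coverage / width, 9)) - 1
        idx = max(0, min(n_bins - 1, idx))
        tallies[idx][0] += int(is_correct)
        tallies[idx][1] += 1
    return {
        (round(i * width, 10), round((i + 1) * width, 10)): (n_ok / n, n)
        for i, (n_ok, n) in sorted(tallies.items())
    }
```
</preformat>
<p>Reporting the number of predictions alongside each accuracy matters here: sparsely populated bins, such as the empty 0.8–0.9 species interval, otherwise look as reliable as well-populated ones.</p>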
<p>Assemblies of metagenomic samples often do not yield complete phage genomes. To assess how the completeness of a phage genome affects HostPhinder's performance, we ran the tool on the evaluation set with each genome gradually reduced by 10%, 20%, ..., 90% of its total length.
<xref ref-type="fig" rid="viruses-08-00116-f005">Figure 5</xref>
shows the accuracy and the number of predictions for each percentage of genome length. HostPhinder maintained its prediction accuracy but made gradually fewer predictions as the fraction of the genome given as query decreased.</p>
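<p>The truncated queries for this experiment can be generated along the lines of the Python sketch below. The text does not state which part of each genome was removed, so keeping the 5' prefix is an assumption, as is the helper name <monospace>truncated_versions</monospace>.</p>
<preformat preformat-type="code">
```python
def truncated_versions(genome, percents=range(10, 100, 10)):
    """Yield (percent_removed, truncated_genome) pairs.

    Assumption: the 5' prefix is kept and the removed fraction is cut from the 3' end.
    """
    n = len(genome)
    for pct in percents:
        keep = round(n * (100 - pct) / 100)
        yield pct, genome[:keep]
```
</preformat>
<p>Each truncated genome would then be submitted as a query in place of the complete genome, and accuracy and the number of predictions would be tallied per percentage.</p>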
<p>In general, HostPhinder still returned predictions at 10% genome length for those genomes whose prediction at complete genome length had a higher coverage. The average coverage was 0.023 for predictions made at complete genome length but not at 10% genome length, and 0.36 for genomes predicted at both lengths.</p>
<p>We next examined whether HostPhinder always correctly predicted particular host species or genera (
<xref ref-type="table" rid="viruses-08-00116-t003">Table 3</xref>
). Only hosts occurring at least 3 times in the
<inline-formula>
<mml:math id="mm89">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set are listed. All phages in the
<inline-formula>
<mml:math id="mm90">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set that target the hosts listed in
<xref ref-type="table" rid="viruses-08-00116-t003">Table 3</xref>
were correctly predicted. Additionally, none of these hosts were erroneously predicted as targets of other phages.</p>
<p>HostPhinder also effectively predicted the hosts of phages that, according to the initial clustering, were of different types. The HostPhinder dataset contains 14 different types of
<italic>Enterococcus faecalis</italic>
phages, 13 types of
<italic>Listeria monocytogenes</italic>
phages and 21 types of
<italic>Vibrio cholerae</italic>
phages, and the hosts of all phages known to infect these species were correctly predicted (see
<xref ref-type="table" rid="viruses-08-00116-t003">Table 3</xref>
).
<p>
<xref ref-type="fig" rid="viruses-08-00116-f006">Figure 6</xref>
and
<xref ref-type="fig" rid="viruses-08-00116-f007">Figure 7</xref>
show correct and incorrect predictions for species and genera, respectively. To make the plots easier to read, hosts are grouped by phylum, displayed on the left side of the figures. Rows are alternately shaded, and column names are coloured according to the phylum they belong to. The heatmaps are read from right to left and then downwards. Phages whose annotated host is given by the row name (on the right) were predicted to infect the host given by the column name (at the bottom of the figure), as indicated by the red intensity of the cell. As an example,
<italic>Alteromonas macleodii</italic>
phages, the row enclosed in a horizontal blue box in
<xref ref-type="fig" rid="viruses-08-00116-f006">Figure 6</xref>
, occurred four times in the
<inline-formula>
<mml:math id="mm93">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set, as indicated by the number in parentheses beside the host name, and all of them were wrongly predicted to be
<italic>S. aureus</italic>
phages (vertical blue box), as indicated by the intense red colour of the cell at the intersection of the two blue boxes. Of note, there were 69
<italic>S. aureus</italic>
phages in the
<inline-formula>
<mml:math id="mm94">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
data set and no
<italic>Alteromonas macleodii</italic>
phages.</p>
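<p>The heatmaps can be read as row-normalized confusion matrices. In the Python sketch below, each annotated host maps to the fraction of its phages predicted for each host. It assumes, without confirmation from the text, that cell intensity reflects this row fraction. The function name and toy data are hypothetical.</p>

```python
from collections import Counter, defaultdict


def row_normalized_confusion(annotated, predicted):
    """Map each annotated host (row) to {predicted host (column): fraction of that
    row's phages}; the fractions in each row sum to 1."""
    rows = defaultdict(Counter)
    for a, p in zip(annotated, predicted):
        rows[a][p] += 1
    return {
        a: {p: c / sum(cnt.values()) for p, c in cnt.items()}
        for a, cnt in rows.items()
    }


# Toy data mirroring the pattern described for A. macleodii (4 of 4 predicted as S. aureus).
annotated = ["A. macleodii"] * 4 + ["S. aureus"] * 2
predicted = ["S. aureus"] * 4 + ["S. aureus", "S. epidermidis"]
matrix = row_normalized_confusion(annotated, predicted)
print(matrix["A. macleodii"])  # {'S. aureus': 1.0}
```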
<p>At the species level, phages with mispredicted hosts are often predicted to target a host of the same genus as the annotated host (see the small deviations from the diagonal in
<xref ref-type="fig" rid="viruses-08-00116-f006">Figure 6</xref>
). For example, the three phages annotated to target
<italic>Bacillus subtilis</italic>
are predicted to target either
<italic>B. subtilis</italic>
or
<italic>Bacillus cereus</italic>
. For some phages, however, the mispredicted host belongs to an entirely different genus; e.g., the phage annotated to target
<italic>Yersinia enterocolitica</italic>
and the phage annotated to target
<italic>Yersinia pestis</italic>
are both predicted to target
<italic>E. coli</italic>
. At both the species and genus level, phages with mispredicted hosts tend to be predicted to target the most frequent hosts in the
<inline-formula>
<mml:math id="mm101">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set, e.g.,
<italic>E. coli</italic>
and
<italic>Mycobacterium smegmatis</italic>
at the species level and
<italic>Escherichia</italic>
and
<italic>Mycobacterium</italic>
at the genus level. Importantly, inaccurate predictions still tended to point to related hosts. For example, mispredictions for phages infecting
<italic>Proteobacteria</italic>
(those within the brown region) still fell within the phylum
<italic>Proteobacteria</italic>
. This indicates genome sequence relatedness among phages infecting different hosts within the same phylum.</p>
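<p>Whether mispredictions stay within the annotated host's genus or phylum can be measured as a simple fraction. The Python sketch below illustrates this. The lineage table is a hypothetical toy excerpt, and a real analysis would draw ranks from a complete taxonomy. The example pairs echo the <italic>Bacillus</italic> and <italic>Yersinia</italic> cases above but are not the study's data.</p>

```python
def shared_rank_fraction(pairs, lineage, rank):
    """Among mispredicted (annotated, predicted) host pairs, return the fraction
    whose two hosts share the same taxon at `rank`, or None if none were mispredicted."""
    wrong = [(a, p) for a, p in pairs if a != p]
    if not wrong:
        return None
    shared = sum(1 for a, p in wrong if lineage[a][rank] == lineage[p][rank])
    return shared / len(wrong)


# Hypothetical lineage excerpt for illustration only.
lineage = {
    "Bacillus subtilis": {"genus": "Bacillus", "phylum": "Firmicutes"},
    "Bacillus cereus": {"genus": "Bacillus", "phylum": "Firmicutes"},
    "Yersinia pestis": {"genus": "Yersinia", "phylum": "Proteobacteria"},
    "Escherichia coli": {"genus": "Escherichia", "phylum": "Proteobacteria"},
}
pairs = [
    ("Bacillus subtilis", "Bacillus cereus"),
    ("Yersinia pestis", "Escherichia coli"),
    ("Escherichia coli", "Escherichia coli"),  # correct prediction, ignored
]
print(shared_rank_fraction(pairs, lineage, "genus"))   # 0.5
print(shared_rank_fraction(pairs, lineage, "phylum"))  # 1.0
```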
</sec>
<sec id="sec3dot3-viruses-08-00116">
<title>3.3. Comparing HostPhinder to BLAST</title>
<p>Next, the HostPhinder performance on
<inline-formula>
<mml:math id="mm102">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
was compared with that of BLAST.
<xref ref-type="table" rid="viruses-08-00116-t004">Table 4</xref>
summarises the results.</p>
<p>HostPhinder made host predictions for more phages than the BLAST-based method. For the phages for which both methods made a prediction, HostPhinder outperformed BLAST at both the genus and species level; its better performance at the species level is statistically significant (
<italic>p</italic>
< 0.05). Of the predictions not covered by BLAST, HostPhinder correctly predicted 25% of 24 at the genus level and 10% of 20 at the species level. Moreover, when HostPhinder gave no genus prediction for a phage, the host inferred from the BLAST match to the most closely related phage was wrong.</p>
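<p>The BLAST baseline assigns a query phage the host of its most closely related reference phage. The Python sketch below illustrates only that assignment step. It assumes BLAST was run with tabular output (-outfmt 6: 12 tab-separated columns, bit score last) and that "most closely related" means the hit with the highest bit score. The criterion actually used in this study may differ, and the phage IDs and host mapping are hypothetical.</p>

```python
import csv
import io


def best_hit_hosts(blast_tabular, host_of):
    """Assign each query the host of its highest-bit-score subject.

    blast_tabular: text in BLAST -outfmt 6 (default 12 columns; bit score is column 12).
    host_of: dict mapping reference phage ID -> annotated host."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row:
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: host_of.get(subject) for query, (subject, _) in best.items()}


hits = (
    "q1\tphiA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "q1\tphiB\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t300\n"
)
print(best_hit_hosts(hits, {"phiA": "Escherichia", "phiB": "Yersinia"}))  # {'q1': 'Yersinia'}
```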
</sec>
<sec id="sec3dot4-viruses-08-00116">
<title>3.4. HostPhinder’s Performance on Predicted Prophages and Establishment of a Confidence Threshold</title>
<p>To further evaluate the performance of HostPhinder and to establish a confidence threshold for the predictive value, we examined whether HostPhinder was able to identify the bacterial hosts of predicted prophages, on the premise that a prophage has at some point infected the host in which it is currently found. The predicted prophages provide a dataset diverse enough to define a reliability threshold that can be generalized and applied to previously unseen data. For this purpose, we predicted prophages in 2679 bacterial genomes using PhiSpy [
<xref rid="B52-viruses-08-00116" ref-type="bibr">52</xref>
]. Without a threshold, HostPhinder correctly predicted approximately 45% of host species and 47% of host genera, with accuracy calculated over the phages for which HostPhinder made a prediction.</p>
<p>As for
<inline-formula>
<mml:math id="mm106">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, the results on PhiSpy-predicted prophages were binned into coverage ranges (
<xref ref-type="fig" rid="viruses-08-00116-f008">Figure 8</xref>
, upper panels). The accuracy pattern for prophages generally resembled the one for the evaluation set,
<italic>i.e.</italic>
, it had low accuracy for coverage ≤0.1 and 100% accuracy above a certain threshold, which in this case is 0.8 for species. There is an unexpected drop in genus-level accuracy for coverage values >0.9, which a bootstrap analysis showed to be not significant (
<italic>p</italic>
> 0.05). To further confirm the thresholds, we ran HostPhinder on 261 manually verified prophages downloaded from PhAnToMe.org, which resulted in prediction accuracies of 63.57% ± 0.356% and 78.69% ± 0.262% for species and genus, respectively. The accuracy distribution for this dataset across coverage ranges is shown in
<xref ref-type="fig" rid="viruses-08-00116-f008">Figure 8</xref>
, lower panels. Based on the observations on
<inline-formula>
<mml:math id="mm107">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and on prophages, HostPhinder considers results with a coverage value higher than 0.1 to be reliable, and applies a conservative threshold of 0.8 to identify highly reliable results.</p>
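<p>The thresholds above and the binned accuracy in Figure 8 can be illustrated with the Python sketch below. The cut-offs 0.1 and 0.8 come from the text, and reading "higher than" as a strict inequality is an assumption. The bin edges, function names and toy inputs are hypothetical; the coverages 0.0026 and 0.43 are values quoted for the INTESTI phages. As in the text, phages without a prediction are assumed to be removed before accuracy is computed.</p>

```python
from bisect import bisect_right


def reliability(coverage, reliable=0.1, highly_reliable=0.8):
    """Label a prediction using the coverage thresholds described in the text."""
    if coverage > highly_reliable:
        return "highly reliable"
    if coverage > reliable:
        return "reliable"
    return "unreliable"


def accuracy_by_coverage_bin(results, edges=(0.1, 0.2, 0.4, 0.6, 0.8)):
    """Per-bin accuracy for (coverage, is_correct) pairs of predicted phages only.

    Bin i covers [edges[i-1], edges[i]); bin 0 is below edges[0] and the last bin
    is edges[-1] and above. The default edges are hypothetical."""
    totals, hits = {}, {}
    for coverage, correct in results:
        b = bisect_right(edges, coverage)
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + int(correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


print(reliability(0.0026), reliability(0.43), reliability(0.85))
# unreliable reliable highly reliable
print(accuracy_by_coverage_bin([(0.05, False), (0.07, True), (0.5, True), (0.9, True)]))
# {0: 0.5, 3: 1.0, 5: 1.0}
```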
</sec>
<sec id="sec3dot5-viruses-08-00116">
<title>3.5. Host Analysis of Phages from a Therapeutic Phage Cocktail from the Georgian George Eliava Institute</title>
<p>In a recent study, we examined the content of an INTESTI bacteriophage cocktail from the Georgian George Eliava Institute. According to the packaging, the cocktail is effective against
<italic>Staphylococcus</italic>
,
<italic>Enterococcus</italic>
,
<italic>Proteus</italic>
,
<italic>Shigella</italic>
,
<italic>Salmonella</italic>
,
<italic>Escherichia coli</italic>
, and
<italic>Pseudomonas aeruginosa</italic>
infections [
<xref rid="B38-viruses-08-00116" ref-type="bibr">38</xref>
]. A total of 19 phage draft genomes, hypothesized to represent close-to-complete phage genomes, were identified, together with four sequences representing fragments of phage genomes. Here, we used HostPhinder to predict the host genera and species of these draft genomes and fragments.
<xref ref-type="table" rid="viruses-08-00116-t005">Table 5</xref>
provides an overview.</p>
<p>For six of the seven bacterial targets of the cocktail, HostPhinder predicted at least one phage targeting that bacterium. The only target not found among the predicted hosts was
<italic>Proteus</italic>
. Instead, the phage that was experimentally found to infect
<italic>Proteus</italic>
[
<xref rid="B38-viruses-08-00116" ref-type="bibr">38</xref>
] was predicted to be an
<italic>E. coli</italic>
phage with a coverage of 0.0026. This is not surprising, as the HostPhinder database contains no examples of
<italic>Proteus</italic>
phages. One phage was predicted to target
<italic>Sodalis glossinidius</italic>
, which does not correspond to any of the anticipated targets. This bacterium is an endosymbiont of the tsetse fly [
<xref rid="B50-viruses-08-00116" ref-type="bibr">50</xref>
], and its prediction was based on a coverage value of 0.43; predictions with coverage above 0.2 have an approximately 80% chance of being correct (see
<xref ref-type="fig" rid="viruses-08-00116-f004">Figure 4</xref>
and
<xref ref-type="fig" rid="viruses-08-00116-f008">Figure 8</xref>
). The predicted hosts of the four phage fragments were generally based on lower coverage values than those of the 19 phage draft genomes, indicating that these predictions are less certain.</p>
</sec>
</sec>
<sec id="sec4-viruses-08-00116">
<title>4. Discussion</title>
<p>In the present study, we developed a fast and simple method for predicting phage hosts. Other studies have previously focused on identifying phage-host pairs. Experimental methods for examining phage-host interactions include mining viral signals from single amplified genome (SAG) datasets, microfluidic digital PCR and phageFISH [
<xref rid="B55-viruses-08-00116" ref-type="bibr">55</xref>
]. Recently, M. Martínez-García
<italic>et al.</italic>
combined single-cell genomics and microarray technology to assign viruses to hosts based on hybridization, allowing new virus-host pairs to be discovered directly in metagenomic samples without cultivation or reliance on genomic information [
<xref rid="B56-viruses-08-00116" ref-type="bibr">56</xref>
]. In another study, Roux
<italic>et al.</italic>
developed the bioinformatics tool VirSorter [
<xref rid="B57-viruses-08-00116" ref-type="bibr">57</xref>
], which identified more than 12,000 virus-host linkages in publicly available bacterial and archaeal genomes. In that study, they analysed compositional virus-host adaptation in terms of mono-, di-, tri- and tetranucleotide frequencies and codon usage [
<xref rid="B58-viruses-08-00116" ref-type="bibr">58</xref>
], finding that tetranucleotide frequency (TNF) gave the strongest signal of adaptation to the host genome. Another classification method for phage host prediction, MGTAXA, was developed by Williamson
<italic>et al.</italic>
in their metagenomic study of marine microbes in the Indian Ocean [
<xref rid="B59-viruses-08-00116" ref-type="bibr">59</xref>
]. MGTAXA links viral sequences to the highest-scoring host taxonomic model based on the similarity in polynucleotide composition between phage and bacterial genomes. The software was no longer readily available (as of December 2015), so we could not compare its performance with that of HostPhinder. Finally, a recent publication by Edwards
<italic>et al.</italic>
reviewed the predictive power of several computational tools for predicting the host of a given phage based on genome information [
<xref rid="B60-viruses-08-00116" ref-type="bibr">60</xref>
]. The authors highlighted the importance of such tools for characterizing uncultured viruses from metagenomes, and found that homology-based approaches provided the strongest signal for predicting phage-host interactions.</p>
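<p>The compositional signal mentioned above can be illustrated with tetranucleotide frequencies (TNF). The Python sketch below shows TNF-based matching in general and is not the method of VirSorter, MGTAXA or HostPhinder. It assumes plain nucleotide strings and counts the forward strand only, whereas published analyses often merge reverse complements. Euclidean distance stands in for whatever similarity measure a given tool uses, and the host sequences are toy examples.</p>

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf(seq):
    """Tetranucleotide frequency vector (256 entries, forward strand only;
    windows containing non-ACGT characters are skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in TETRAMERS]


def closest_host_by_tnf(phage_seq, host_genomes):
    """Return the host whose TNF vector is nearest (Euclidean) to the phage's."""
    phage_vec = tnf(phage_seq)
    return min(host_genomes, key=lambda h: math.dist(phage_vec, tnf(host_genomes[h])))


hosts = {"AT-rich host": "ATATATATATATATAT", "GC-rich host": "GCGCGCGCGCGCGCGC"}
print(closest_host_by_tnf("TATATATATATA", hosts))  # AT-rich host
```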
<p>HostPhinder bases its predictions on k-mers shared between the query phage genome and the genomes of reference phages with known hosts. K-mer-based approaches have recently been implemented for genome assembly [
<xref rid="B61-viruses-08-00116" ref-type="bibr">61</xref>
], fast classification [
<xref rid="B62-viruses-08-00116" ref-type="bibr">62</xref>
,
<xref rid="B63-viruses-08-00116" ref-type="bibr">63</xref>
] and annotation [
<xref rid="B64-viruses-08-00116" ref-type="bibr">64</xref>
] of metagenomes. Given the highly mosaic structure of phage genomes, one advantage of using k-mers for phage host prediction is that the outcome depends only on the presence or absence of genetic elements, not on their exact order.</p>
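<p>The Python sketch below illustrates k-mer co-occurrence scoring. It is not the HostPhinder implementation and rests on several assumptions. Coverage is taken to be the fraction of a reference phage's k-mers also present in the query. The host of the single best-covering reference is reported. Reverse complements are ignored, and the default k=16 is chosen only for illustration. The toy example uses k=4 and two sequence blocks in swapped order, showing the order independence noted above: most k-mers survive the rearrangement.</p>

```python
def kmer_set(seq, k):
    """Set of all k-mers in `seq` (forward strand only, for simplicity)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_host(query, references, k=16):
    """Return (reference name, host, coverage) for the reference phage with the
    highest coverage, or None if no k-mers are shared with any reference.

    references: iterable of (name, host, genome) tuples.
    coverage (assumed here): shared k-mers / k-mers of the reference genome."""
    query_kmers = kmer_set(query, k)
    best = None
    for name, host, genome in references:
        ref_kmers = kmer_set(genome, k)
        if not ref_kmers:
            continue
        coverage = len(query_kmers.intersection(ref_kmers)) / len(ref_kmers)
        if best is None or coverage > best[2]:
            best = (name, host, coverage)
    if best is None or best[2] == 0:
        return None
    return best


refs = [
    ("ref1", "Escherichia", "AAAACCCCGGGGTTTT"),
    ("ref2", "Yersinia", "ACACACACACACACAC"),
]
# Query = ref1's two 8-nt blocks in swapped order; 10 of ref1's 13 4-mers are shared.
name, host, cov = predict_host("GGGGTTTTAAAACCCC", refs, k=4)
print(name, host, round(cov, 2))  # ref1 Escherichia 0.77
```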
<p>On an independent evaluation set, HostPhinder performed well at predicting the hosts of phages currently found in public databases, reaching an accuracy of 74% for the host species and 81% for the host genus. Some hosts were consistently easier to predict than others. This was, for example, the case for
<italic>P. acnes</italic>
, where the hosts of all annotated
<italic>P. acnes</italic>
phages in the evaluation set were correctly predicted, while no non-
<italic>P. acnes</italic>
phages were erroneously predicted as such. This observation agrees with previous studies showing that
<italic>P. acnes</italic>
phages constitute a homogeneous group, sharing 85% nucleotide sequence identity and having similar genome lengths [
<xref rid="B65-viruses-08-00116" ref-type="bibr">65</xref>
,
<xref rid="B66-viruses-08-00116" ref-type="bibr">66</xref>
]. Furthermore, the examined
<italic>P. acnes</italic>
phages were not able to infect other members of the
<italic>Propionibacterium</italic>
genus [
<xref rid="B65-viruses-08-00116" ref-type="bibr">65</xref>
,
<xref rid="B67-viruses-08-00116" ref-type="bibr">67</xref>
]. For many of the hosts mispredicted by HostPhinder, the annotated and predicted hosts belonged to the same genus, which may be consistent with the ability of some phages to infect more than one species within a genus. Examples of such broad-host-range phages are
<italic>Salmonella</italic>
Phage Felix O1 [
<xref rid="B68-viruses-08-00116" ref-type="bibr">68</xref>
], Mycobacteriophage D29 [
<xref rid="B69-viruses-08-00116" ref-type="bibr">69</xref>
] and
<italic>Yersinia</italic>
Phage PY100 [
<xref rid="B70-viruses-08-00116" ref-type="bibr">70</xref>
]. It is hence possible that the mispredicted phages are polyvalent,
<italic>i.e.</italic>
, capable of infecting more than one bacterial species. Alternatively, they may represent genuine mispredictions by HostPhinder caused by closely related phages targeting different host species. In some cases, the predicted host did not even belong to the same genus as the annotated host; e.g., the three
<italic>Yersinia</italic>
phages were all predicted to infect
<italic>Escherichia</italic>
with coverage values indicating reliable results, namely 0.57, 0.6 and 0.13. Indeed, the genome sequence of the
<italic>Y. pestis</italic>
phage phiA1122 has been found to be closely related to coliphage T7, sharing 89% nucleotide identity [
<xref rid="B71-viruses-08-00116" ref-type="bibr">71</xref>
]. Despite this high nucleotide identity, phiA1122 cannot infect
<italic>E. coli</italic>
, and has even been used by the United States Centers for Disease Control and Prevention as a diagnostic agent to identify
<italic>Y. pestis</italic>
[
<xref rid="B72-viruses-08-00116" ref-type="bibr">72</xref>
].</p>
<p>When applying HostPhinder to phage draft genomes and fragments from the INTESTI phage cocktail, the predicted hosts corresponded well with the advertised targets of the cocktail. One phage draft genome was, however, predicted to target
<italic>Sodalis glossinidius</italic>
, an endosymbiont of the tsetse fly. Excluding the remote possibility that phages targeting this bacterium have been added to the cocktail, it is likely that either the HostPhinder prediction is incorrect or the phage is able to infect
<italic>S. glossinidius</italic>
as well as one of the targets of the cocktail. A study by Ho-Won and Kyoung-Ho Kim showed, through comparative genomic and phylogenetic analyses, a close relationship between EP23, a phage that infects
<italic>E. coli</italic>
and
<italic>Shigella sonnei</italic>
, and SO-1, which infects
<italic>S. glossinidius</italic>
[
<xref rid="B73-viruses-08-00116" ref-type="bibr">73</xref>
]. However, whether the phages were able to cross-infect the hosts was not examined.</p>
<p>Many phages have a very narrow host range and target only specific strains within a particular species. This feature has been used extensively for typing, e.g.,
<italic>S. enterica</italic>
[
<xref rid="B74-viruses-08-00116" ref-type="bibr">74</xref>
] and
<italic>S. aureus</italic>
[
<xref rid="B75-viruses-08-00116" ref-type="bibr">75</xref>
]. HostPhinder cannot make predictions beyond the species level, partly because the hosts of most phages in public databases are not annotated beyond that level. Furthermore, predicting hosts down to the strain level would likely require factors beyond mere genome similarity to be taken into account, e.g., examining the receptor-binding proteins, counting the restriction sites in the phage genomes or analysing the CRISPR regions of the host genome.</p>
<p>Another limitation of HostPhinder is how accurately the annotations capture the full host range of the reference phages. Most reference phages had only one annotated host, although there are many examples of phages able to infect closely or even distantly related bacteria [
<xref rid="B76-viruses-08-00116" ref-type="bibr">76</xref>
,
<xref rid="B77-viruses-08-00116" ref-type="bibr">77</xref>
,
<xref rid="B78-viruses-08-00116" ref-type="bibr">78</xref>
]. Furthermore, the performance of HostPhinder depends on the size and completeness of the underlying database. For example, when the database for this study was compiled, no
<italic>Proteus</italic>
phage genomes were available in public databases. Hence, it is inherently impossible for HostPhinder to predict any query phage as a
<italic>Proteus</italic>
phage. Indeed, HostPhinder predicted an experimentally identified
<italic>Proteus</italic>
phage from the INTESTI phage cocktail as an
<italic>E. coli</italic>
phage, albeit with a coverage value of 0.003, indicating that the prediction was not reliable. Carson
<italic>et al.</italic>
demonstrated that a coli-proteus phage isolated from a Russian cocktail was equally capable of eradicating
<italic>E. coli</italic>
and
<italic>Proteus mirabilis</italic>
biofilms [
<xref rid="B79-viruses-08-00116" ref-type="bibr">79</xref>
], showing the potential of some phages to infect both species. As more phage genomes become available, we will update the HostPhinder database to ensure its continued high performance.</p>
<p>Despite its limitations, we envision that HostPhinder will be useful for narrowing down the list of potential hosts. With the growing availability of metagenome samples, new approaches are needed to first identify phages and then determine their hosts. Because it can rapidly identify potential phage-host interactions, HostPhinder has potential applications in ecology, studies of the human gut microbiota, and other viral metagenomic analyses where the nature of phages needs to be elucidated.</p>
<p>The current version of HostPhinder is very simple, taking into account only genomic information about the phage. Further development of the tool will expand on this by also taking the genome of the host into account, which we expect will enable predictions beyond the host species level.</p>
</sec>
<sec id="sec5-viruses-08-00116">
<title>5. Conclusions</title>
<p>The current antibiotic resistance crisis warrants new ways to combat bacterial infections. For decades, phage therapy has been used for this purpose in countries of the former Eastern Bloc, and to ensure transfer of the technology to the West, it is important to establish a pool of well-characterized phages. The presented HostPhinder method provides the phage community with an easy-to-use tool for predicting the host genus and species of query phages, which can be used when searching for phages with appropriate host specificity and for correlating phages and hosts in ecological and metagenomic studies. HostPhinder is freely available as a web server [
<xref rid="B1-viruses-08-00116" ref-type="bibr">1</xref>
] and as a Docker image [
<xref rid="B2-viruses-08-00116" ref-type="bibr">2</xref>
].</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>This work was supported by the Center for Genomic Epidemiology at the Technical University of Denmark and funded by grant 09-067103/DSF from the Danish Council for Strategic Research, grant nr. 14-3056 from Oticon Fonden and grant 14-70-0955 from Otto Moensted Fonden.</p>
</ack>
<app-group>
<app>
<title>Supplementary Files</title>
<supplementary-material content-type="local-data" id="viruses-08-00116-s001">
<label>Supplementary File 1</label>
<media xlink:href="viruses-08-00116-s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</app>
</app-group>
<notes>
<title>Author Contributions</title>
<p>Mette Voldby Larsen conceived the method; Ole Lund wrote the script to read the k-mers; Kortine Annina Kleinheinz developed the preliminary version of the method; Morten Nielsen designed the method optimization; Julia Villarroel downloaded whole genome sequence data, performed method optimization, analysed the data and finalized the method; Henrike Zschach predicted the prophages; Vanessa Isabell Jurtz designed the Hobohm experiments and set up HostPhinder web server; Julia Villarroel built HostPhinder Docker image; Julia Villarroel, Mette Voldby Larsen and Morten Nielsen wrote the paper. All authors contributed in reviewing the paper.</p>
</notes>
<notes>
<title>Conflicts of Interest</title>
<p>The authors declare no conflict of interest.</p>
</notes>
<ref-list>
<title>References</title>
<ref id="B1-viruses-08-00116">
<label>1.</label>
<element-citation publication-type="webpage">
<article-title>HostPhinder web service</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://cge.cbs.dtu.dk/services/HostPhinder">http://cge.cbs.dtu.dk/services/HostPhinder</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B2-viruses-08-00116">
<label>2.</label>
<element-citation publication-type="webpage">
<article-title>HostPhinder Docker image</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="https://registry.hub.docker.com/u/julvi/hostphinder">https://registry.hub.docker.com/u/julvi/hostphinder</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B3-viruses-08-00116">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kapi</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>The evolving threat of antimicrobial resistance: Options for action</article-title>
<source>Indian J. Med. Res.</source>
<year>2014</year>
<volume>139</volume>
<fpage>182</fpage>
<lpage>183</lpage>
</element-citation>
</ref>
<ref id="B4-viruses-08-00116">
<label>4.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<collab>WHO</collab>
</person-group>
<source>Antimicrobial Resistance: Global Report on Surveillance</source>
<publisher-name>World Health Organization</publisher-name>
<publisher-loc>Geneva, Switzerland</publisher-loc>
<year>2014</year>
</element-citation>
</ref>
<ref id="B5-viruses-08-00116">
<label>5.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harper</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Enright</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Phage therapy: Delivering on the promise</article-title>
<source>Ther. Deliv.</source>
<year>2011</year>
<volume>2</volume>
<fpage>935</fpage>
<lpage>947</lpage>
<pub-id pub-id-type="doi">10.4155/tde.11.64</pub-id>
<pub-id pub-id-type="pmid">22833904</pub-id>
</element-citation>
</ref>
<ref id="B6-viruses-08-00116">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kutateladze</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Adamia</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Bacteriophages as potential new therapeutics to replace or supplement antibiotics</article-title>
<source>Trends Biotechnol.</source>
<year>2010</year>
<volume>28</volume>
<fpage>591</fpage>
<lpage>595</lpage>
<pub-id pub-id-type="doi">10.1016/j.tibtech.2010.08.001</pub-id>
<pub-id pub-id-type="pmid">20810181</pub-id>
</element-citation>
</ref>
<ref id="B7-viruses-08-00116">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kutateladze</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Adamia</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Phage therapy experience at the Eliava Institute</article-title>
<source>Méd. Mal. Infect.</source>
<year>2008</year>
<volume>38</volume>
<fpage>426</fpage>
<lpage>430</lpage>
<pub-id pub-id-type="doi">10.1016/j.medmal.2008.06.023</pub-id>
<pub-id pub-id-type="pmid">18687542</pub-id>
</element-citation>
</ref>
<ref id="B8-viruses-08-00116">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miedzybrodzki</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Borysowski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Weber-Dabrowska</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Fortuna</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Letkiewicz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Szufnarowski</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pawelczyk</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Rogóz</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Klak</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wojtasik</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Chapter 3: Clinical aspects of phage therapy</article-title>
<source>Adv. Virus Res.</source>
<year>2012</year>
<volume>83</volume>
<fpage>73</fpage>
<lpage>121</lpage>
<pub-id pub-id-type="pmid">22748809</pub-id>
</element-citation>
</ref>
<ref id="B9-viruses-08-00116">
<label>9.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Weber-Dąbrowska</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mulczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Górski</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Bacteriophage therapy of bacterial infections: An update of our institute’s experience</article-title>
<source>Inflammation</source>
<publisher-name>Springer</publisher-name>
<publisher-loc>Netherlands</publisher-loc>
<year>2001</year>
<fpage>201</fpage>
<lpage>209</lpage>
</element-citation>
</ref>
<ref id="B10-viruses-08-00116">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Biswas</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Adhya</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Washart</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Paul</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Trostel</surname>
<given-names>A.N.</given-names>
</name>
<name>
<surname>Powell</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Carlton</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Merril</surname>
<given-names>C.R.</given-names>
</name>
</person-group>
<article-title>Bacteriophage therapy rescues mice bacteremic from a clinical isolate of vancomycin-resistant
<italic>Enterococcus faecium</italic>
</article-title>
<source>Infect. Immun.</source>
<year>2002</year>
<volume>70</volume>
<fpage>204</fpage>
<lpage>210</lpage>
<pub-id pub-id-type="doi">10.1128/IAI.70.1.204-210.2002</pub-id>
<pub-id pub-id-type="pmid">11748184</pub-id>
</element-citation>
</ref>
<ref id="B11-viruses-08-00116">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Capparelli</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Parlato</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Borriello</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Salvatore</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Iannelli</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>Experimental phage therapy against
<italic>Staphylococcus aureus</italic>
in mice</article-title>
<source>Antimicrob. Agents Chemother.</source>
<year>2007</year>
<volume>51</volume>
<fpage>2765</fpage>
<lpage>2773</lpage>
<pub-id pub-id-type="doi">10.1128/AAC.01513-06</pub-id>
<pub-id pub-id-type="pmid">17517843</pub-id>
</element-citation>
</ref>
<ref id="B12-viruses-08-00116">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>H.W.</given-names>
</name>
<name>
<surname>Huggins</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Successful treatment of experimental
<italic>Escherichia coli</italic>
infections in mice using phage: Its general superiority over antibiotics</article-title>
<source>J. Gen. Microbiol.</source>
<year>1982</year>
<volume>128</volume>
<fpage>307</fpage>
<lpage>318</lpage>
<pub-id pub-id-type="doi">10.1099/00221287-128-2-307</pub-id>
<pub-id pub-id-type="pmid">7042903</pub-id>
</element-citation>
</ref>
<ref id="B13-viruses-08-00116">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wright</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hawkins</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Änggård</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Harper</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>A controlled clinical trial of a therapeutic bacteriophage preparation in chronic otitis due to antibiotic-resistant
<italic>Pseudomonas aeruginosa</italic>
; A preliminary report of efficacy</article-title>
<source>Clin. Otolaryngol.</source>
<year>2009</year>
<volume>34</volume>
<fpage>349</fpage>
<lpage>357</lpage>
<pub-id pub-id-type="doi">10.1111/j.1749-4486.2009.01973.x</pub-id>
<pub-id pub-id-type="pmid">19673983</pub-id>
</element-citation>
</ref>
<ref id="B14-viruses-08-00116">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matsuzaki</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Uchiyama</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Takemura-Uchiyama</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Daibata</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Perspective: The age of the phage</article-title>
<source>Nature</source>
<year>2014</year>
<volume>509</volume>
<pub-id pub-id-type="doi">10.1038/509S9a</pub-id>
<pub-id pub-id-type="pmid">24784429</pub-id>
</element-citation>
</ref>
<ref id="B15-viruses-08-00116">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reardon</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Phage therapy gets revitalized</article-title>
<source>Nature</source>
<year>2014</year>
<volume>510</volume>
<pub-id pub-id-type="doi">10.1038/510015a</pub-id>
<pub-id pub-id-type="pmid">24899282</pub-id>
</element-citation>
</ref>
<ref id="B16-viruses-08-00116">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sulakvelidze</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Using lytic bacteriophages to eliminate or significantly reduce contamination of food by foodborne bacterial pathogens</article-title>
<source>J. Sci. Food Agric.</source>
<year>2013</year>
<volume>93</volume>
<fpage>3137</fpage>
<lpage>3146</lpage>
<pub-id pub-id-type="doi">10.1002/jsfa.6222</pub-id>
<pub-id pub-id-type="pmid">23670852</pub-id>
</element-citation>
</ref>
<ref id="B17-viruses-08-00116">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guenther</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Huwyler</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Richard</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Loessner</surname>
<given-names>M.J.</given-names>
</name>
</person-group>
<article-title>Virulent bacteriophage for efficient biocontrol of
<italic>Listeria monocytogenes</italic>
in ready-to-eat foods</article-title>
<source>Appl. Environ. Microbiol.</source>
<year>2009</year>
<volume>75</volume>
<fpage>93</fpage>
<lpage>100</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.01711-08</pub-id>
<pub-id pub-id-type="pmid">19011076</pub-id>
</element-citation>
</ref>
<ref id="B18-viruses-08-00116">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carrillo</surname>
<given-names>C.L.</given-names>
</name>
<name>
<surname>Atterbury</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>El-Shibiny</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Connerton</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Dillon</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Connerton</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>Bacteriophage therapy to reduce
<italic>Campylobacter jejuni</italic>
colonization of broiler chickens</article-title>
<source>Appl. Environ. Microbiol.</source>
<year>2005</year>
<volume>71</volume>
<fpage>6554</fpage>
<lpage>6563</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.71.11.6554-6563.2005</pub-id>
<pub-id pub-id-type="pmid">16269681</pub-id>
</element-citation>
</ref>
<ref id="B19-viruses-08-00116">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McLean</surname>
<given-names>S.K.</given-names>
</name>
<name>
<surname>Dunn</surname>
<given-names>L.A.</given-names>
</name>
<name>
<surname>Palombo</surname>
<given-names>E.A.</given-names>
</name>
</person-group>
<article-title>Phage inhibition of
<italic>Escherichia coli</italic>
in ultrahigh-temperature-treated and raw milk</article-title>
<source>Foodborne Pathog. Dis.</source>
<year>2013</year>
<volume>10</volume>
<fpage>956</fpage>
<lpage>962</lpage>
<pub-id pub-id-type="doi">10.1089/fpd.2012.1473</pub-id>
<pub-id pub-id-type="pmid">23909774</pub-id>
</element-citation>
</ref>
<ref id="B20-viruses-08-00116">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stern</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sorek</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>The phage-host arms race: Shaping the evolution of microbes</article-title>
<source>Bioessays</source>
<year>2011</year>
<volume>33</volume>
<fpage>43</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="doi">10.1002/bies.201000071</pub-id>
<pub-id pub-id-type="pmid">20979102</pub-id>
</element-citation>
</ref>
<ref id="B21-viruses-08-00116">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deveau</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Garneau</surname>
<given-names>J.E.</given-names>
</name>
<name>
<surname>Moineau</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>CRISPR/Cas system and its role in phage-bacteria interactions</article-title>
<source>Annu. Rev. Microbiol.</source>
<year>2010</year>
<volume>64</volume>
<fpage>475</fpage>
<lpage>493</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.micro.112408.134123</pub-id>
<pub-id pub-id-type="pmid">20528693</pub-id>
</element-citation>
</ref>
<ref id="B22-viruses-08-00116">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fineran</surname>
<given-names>P.C.</given-names>
</name>
<name>
<surname>Blower</surname>
<given-names>T.R.</given-names>
</name>
<name>
<surname>Foulds</surname>
<given-names>I.J.</given-names>
</name>
<name>
<surname>Humphreys</surname>
<given-names>D.P.</given-names>
</name>
<name>
<surname>Lilley</surname>
<given-names>K.S.</given-names>
</name>
<name>
<surname>Salmond</surname>
<given-names>G.P.</given-names>
</name>
</person-group>
<article-title>The phage abortive infection system, ToxIN, functions as a protein-RNA toxin-antitoxin pair</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<year>2009</year>
<volume>106</volume>
<fpage>894</fpage>
<lpage>899</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0808832106</pub-id>
<pub-id pub-id-type="pmid">19124776</pub-id>
</element-citation>
</ref>
<ref id="B23-viruses-08-00116">
<label>23.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carbone</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Codon bias is a major factor explaining phage evolution in translationally biased hosts</article-title>
<source>J. Mol. Evol.</source>
<year>2008</year>
<volume>66</volume>
<fpage>210</fpage>
<lpage>223</lpage>
<pub-id pub-id-type="doi">10.1007/s00239-008-9068-6</pub-id>
<pub-id pub-id-type="pmid">18286220</pub-id>
</element-citation>
</ref>
<ref id="B24-viruses-08-00116">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blower</surname>
<given-names>T.R.</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>T.J.</given-names>
</name>
<name>
<surname>Przybilski</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fineran</surname>
<given-names>P.C.</given-names>
</name>
<name>
<surname>Salmond</surname>
<given-names>G.P.</given-names>
</name>
</person-group>
<article-title>Viral evasion of a bacterial suicide system by RNA-based molecular mimicry enables infectious altruism</article-title>
<source>PLoS Genet.</source>
<year>2012</year>
<volume>8</volume>
<elocation-id>e1003023</elocation-id>
<pub-id pub-id-type="doi">10.1371/journal.pgen.1003023</pub-id>
<pub-id pub-id-type="pmid">23109916</pub-id>
</element-citation>
</ref>
<ref id="B25-viruses-08-00116">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Labrie</surname>
<given-names>S.J.</given-names>
</name>
<name>
<surname>Samson</surname>
<given-names>J.E.</given-names>
</name>
<name>
<surname>Moineau</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Bacteriophage resistance mechanisms</article-title>
<source>Nat. Rev. Microbiol.</source>
<year>2010</year>
<volume>8</volume>
<fpage>317</fpage>
<lpage>327</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2315</pub-id>
<pub-id pub-id-type="pmid">20348932</pub-id>
</element-citation>
</ref>
<ref id="B26-viruses-08-00116">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weitz</surname>
<given-names>J.S.</given-names>
</name>
<name>
<surname>Hartman</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Levin</surname>
<given-names>S.A.</given-names>
</name>
</person-group>
<article-title>Coevolutionary arms races between bacteria and bacteriophage</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>9535</fpage>
<lpage>9540</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0504062102</pub-id>
<pub-id pub-id-type="pmid">15976021</pub-id>
</element-citation>
</ref>
<ref id="B27-viruses-08-00116">
<label>27.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duffy</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>P.E.</given-names>
</name>
<name>
<surname>Burch</surname>
<given-names>C.L.</given-names>
</name>
</person-group>
<article-title>Pleiotropic costs of niche expansion in the RNA bacteriophage Φ6</article-title>
<source>Genetics</source>
<year>2006</year>
<volume>172</volume>
<fpage>751</fpage>
<lpage>757</lpage>
<pub-id pub-id-type="doi">10.1534/genetics.105.051136</pub-id>
<pub-id pub-id-type="pmid">16299384</pub-id>
</element-citation>
</ref>
<ref id="B28-viruses-08-00116">
<label>28.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amarillas</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Cháidez-Quiroz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sañudo-Barajas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>León-Félix</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Complete genome sequence of a polyvalent bacteriophage, phiKP26, active on
<italic>Salmonella</italic>
and
<italic>Escherichia coli</italic>
</article-title>
<source>Arch. Virol.</source>
<year>2013</year>
<volume>158</volume>
<fpage>2395</fpage>
<lpage>2398</lpage>
<pub-id pub-id-type="doi">10.1007/s00705-013-1725-4</pub-id>
<pub-id pub-id-type="pmid">23677676</pub-id>
</element-citation>
</ref>
<ref id="B29-viruses-08-00116">
<label>29.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loessner</surname>
<given-names>M.J.</given-names>
</name>
<name>
<surname>Neugirg</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zink</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Isolation, classification and molecular characterization of bacteriophages for
<italic>Enterobacter</italic>
species</article-title>
<source>J. Gen. Microbiol.</source>
<year>1993</year>
<volume>139</volume>
<fpage>2627</fpage>
<lpage>2633</lpage>
<pub-id pub-id-type="doi">10.1099/00221287-139-11-2627</pub-id>
<pub-id pub-id-type="pmid">8277246</pub-id>
</element-citation>
</ref>
<ref id="B30-viruses-08-00116">
<label>30.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koskella</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Meaden</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Understanding bacteriophage specificity in natural microbial communities</article-title>
<source>Viruses</source>
<year>2013</year>
<volume>5</volume>
<fpage>806</fpage>
<lpage>823</lpage>
<pub-id pub-id-type="doi">10.3390/v5030806</pub-id>
<pub-id pub-id-type="pmid">23478639</pub-id>
</element-citation>
</ref>
<ref id="B31-viruses-08-00116">
<label>31.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Casjens</surname>
<given-names>S.R.</given-names>
</name>
</person-group>
<article-title>Diversity among the tailed-bacteriophages that infect the Enterobacteriaceae</article-title>
<source>Res. Microbiol.</source>
<year>2008</year>
<volume>159</volume>
<fpage>340</fpage>
<lpage>348</lpage>
<pub-id pub-id-type="doi">10.1016/j.resmic.2008.04.005</pub-id>
<pub-id pub-id-type="pmid">18550341</pub-id>
</element-citation>
</ref>
<ref id="B32-viruses-08-00116">
<label>32.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohwer</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>The Phage Proteomic Tree: A genome-based taxonomy for phage</article-title>
<source>J. Bacteriol.</source>
<year>2002</year>
<volume>184</volume>
<fpage>4529</fpage>
<lpage>4535</lpage>
<pub-id pub-id-type="doi">10.1128/JB.184.16.4529-4535.2002</pub-id>
<pub-id pub-id-type="pmid">12142423</pub-id>
</element-citation>
</ref>
<ref id="B33-viruses-08-00116">
<label>33.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jacobs-Sera</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Marinelli</surname>
<given-names>L.J.</given-names>
</name>
<name>
<surname>Bowman</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Broussard</surname>
<given-names>G.W.</given-names>
</name>
<name>
<surname>Bustamante</surname>
<given-names>C.G.</given-names>
</name>
<name>
<surname>Boyle</surname>
<given-names>M.M.</given-names>
</name>
<name>
<surname>Petrova</surname>
<given-names>Z.O.</given-names>
</name>
<name>
<surname>Dedrick</surname>
<given-names>R.M.</given-names>
</name>
<name>
<surname>Pope</surname>
<given-names>W.H.</given-names>
</name>
<collab>Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Program</collab>
<etal></etal>
</person-group>
<article-title>On the nature of mycobacteriophage diversity and host preference</article-title>
<source>Virology</source>
<year>2012</year>
<volume>434</volume>
<fpage>187</fpage>
<lpage>201</lpage>
<pub-id pub-id-type="doi">10.1016/j.virol.2012.09.026</pub-id>
<pub-id pub-id-type="pmid">23084079</pub-id>
</element-citation>
</ref>
<ref id="B34-viruses-08-00116">
<label>34.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woese</surname>
<given-names>C.R.</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>G.E.</given-names>
</name>
</person-group>
<article-title>Phylogenetic structure of the prokaryotic domain: The primary kingdoms</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<year>1977</year>
<volume>74</volume>
<fpage>5088</fpage>
<lpage>5090</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.74.11.5088</pub-id>
<pub-id pub-id-type="pmid">270744</pub-id>
</element-citation>
</ref>
<ref id="B35-viruses-08-00116">
<label>35.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Larsen</surname>
<given-names>M.V.</given-names>
</name>
<name>
<surname>Cosentino</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lukjancenko</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Saputra</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Rasmussen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hasman</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Sicheritz-Pontén</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Aarestrup</surname>
<given-names>F.M.</given-names>
</name>
<name>
<surname>Ussery</surname>
<given-names>D.W.</given-names>
</name>
<name>
<surname>Lund</surname>
<given-names>O.</given-names>
</name>
</person-group>
<article-title>Benchmarking of methods for genomic taxonomy</article-title>
<source>J. Clin. Microbiol.</source>
<year>2014</year>
<volume>52</volume>
<fpage>1529</fpage>
<lpage>1539</lpage>
<pub-id pub-id-type="doi">10.1128/JCM.02981-13</pub-id>
<pub-id pub-id-type="pmid">24574292</pub-id>
</element-citation>
</ref>
<ref id="B36-viruses-08-00116">
<label>36.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hendrix</surname>
<given-names>R.W.</given-names>
</name>
</person-group>
<article-title>Bacteriophage genomics</article-title>
<source>Curr. Opin. Microbiol.</source>
<year>2003</year>
<volume>6</volume>
<fpage>506</fpage>
<lpage>511</lpage>
<pub-id pub-id-type="doi">10.1016/j.mib.2003.09.004</pub-id>
<pub-id pub-id-type="pmid">14572544</pub-id>
</element-citation>
</ref>
<ref id="B37-viruses-08-00116">
<label>37.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lawrence</surname>
<given-names>J.G.</given-names>
</name>
<name>
<surname>Hatfull</surname>
<given-names>G.F.</given-names>
</name>
<name>
<surname>Hendrix</surname>
<given-names>R.W.</given-names>
</name>
</person-group>
<article-title>Imbroglios of viral taxonomy: Genetic exchange and failings of phenetic approaches</article-title>
<source>J. Bacteriol.</source>
<year>2002</year>
<volume>184</volume>
<fpage>4891</fpage>
<lpage>4905</lpage>
<pub-id pub-id-type="doi">10.1128/JB.184.17.4891-4905.2002</pub-id>
<pub-id pub-id-type="pmid">12169615</pub-id>
</element-citation>
</ref>
<ref id="B38-viruses-08-00116">
<label>38.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zschach</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Joensen</surname>
<given-names>K.G.</given-names>
</name>
<name>
<surname>Lindhard</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lund</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Goderdzishvili</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chkonia</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jgenti</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kvatadze</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Alavidze</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Kutter</surname>
<given-names>E.M.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>What can we learn from a metagenomic analysis of a Georgian bacteriophage cocktail?</article-title>
<source>Viruses</source>
<year>2015</year>
<volume>7</volume>
<fpage>6570</fpage>
<lpage>6589</lpage>
<pub-id pub-id-type="doi">10.3390/v7122958</pub-id>
<pub-id pub-id-type="pmid">26703713</pub-id>
</element-citation>
</ref>
<ref id="B39-viruses-08-00116">
<label>39.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nielsen</surname>
<given-names>H.B.</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Juncker</surname>
<given-names>A.S.</given-names>
</name>
<name>
<surname>Rasmussen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sunagawa</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Plichta</surname>
<given-names>D.R.</given-names>
</name>
<name>
<surname>Gautier</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>A.G.</given-names>
</name>
<name>
<surname>le Chatelier</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes</article-title>
<source>Nat. Biotechnol.</source>
<year>2014</year>
<volume>32</volume>
<fpage>822</fpage>
<lpage>828</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2939</pub-id>
<pub-id pub-id-type="pmid">24997787</pub-id>
</element-citation>
</ref>
<ref id="B40-viruses-08-00116">
<label>40.</label>
<element-citation publication-type="webpage">
<article-title>Phages.ids - VBI mirrors page</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/genomes/IDS/Phages.ids">http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/genomes/IDS/Phages.ids</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B41-viruses-08-00116">
<label>41.</label>
<element-citation publication-type="webpage">
<article-title>NCBI Viral Genome Resource</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi">http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B42-viruses-08-00116">
<label>42.</label>
<element-citation publication-type="webpage">
<article-title>EMBL EBI phage genomes list</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/genomes/phage.html">http://www.ebi.ac.uk/genomes/phage.html</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B43-viruses-08-00116">
<label>43.</label>
<element-citation publication-type="webpage">
<article-title>phagesdb for Mycobacteriophages</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://phagesdb.org/">http://phagesdb.org/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B44-viruses-08-00116">
<label>44.</label>
<element-citation publication-type="webpage">
<article-title>phagesdb for Arthrobacter</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://arthrobacter.phagesdb.org/">http://arthrobacter.phagesdb.org/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B45-viruses-08-00116">
<label>45.</label>
<element-citation publication-type="webpage">
<article-title>phagesdb for Bacillus</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://bacillus.phagesdb.org/">http://bacillus.phagesdb.org/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B46-viruses-08-00116">
<label>46.</label>
<element-citation publication-type="webpage">
<article-title>phagesdb for Streptomyces</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://streptomyces.phagesdb.org/">http://streptomyces.phagesdb.org/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B47-viruses-08-00116">
<label>47.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Euzéby</surname>
<given-names>J.P.</given-names>
</name>
</person-group>
<article-title>List of Bacterial Names with Standing in Nomenclature: A folder available on the Internet</article-title>
<source>Int. J. Syst. Bacteriol.</source>
<year>1997</year>
<volume>47</volume>
<fpage>590</fpage>
<lpage>592</lpage>
<pub-id pub-id-type="doi">10.1099/00207713-47-2-590</pub-id>
<pub-id pub-id-type="pmid">9103655</pub-id>
</element-citation>
</ref>
<ref id="B48-viruses-08-00116">
<label>48.</label>
<element-citation publication-type="webpage">
<article-title>HostPhinder Github repository</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="https://github.com/julvi/HostPhinder">https://github.com/julvi/HostPhinder</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B49-viruses-08-00116">
<label>49.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hobohm</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Scharf</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>Selection of representative protein data sets</article-title>
<source>Protein Sci.</source>
<year>1992</year>
<volume>1</volume>
<fpage>409</fpage>
<lpage>417</lpage>
<pub-id pub-id-type="doi">10.1002/pro.5560010313</pub-id>
<pub-id pub-id-type="pmid">1304348</pub-id>
</element-citation>
</ref>
<ref id="B50-viruses-08-00116">
<label>50.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bonferroni</surname>
<given-names>C.E.</given-names>
</name>
</person-group>
<source>Teoria Statistica delle Classi e Calcolo delle Probabilità</source>
<publisher-name>Libreria Internazionale Seeber</publisher-name>
<publisher-loc>Firenze, Italy</publisher-loc>
<year>1936</year>
</element-citation>
</ref>
<ref id="B51-viruses-08-00116">
<label>51.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>S.F.</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>E.W.</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D.J.</given-names>
</name>
</person-group>
<article-title>Basic local alignment search tool</article-title>
<source>J. Mol. Biol.</source>
<year>1990</year>
<volume>215</volume>
<fpage>403</fpage>
<lpage>410</lpage>
<pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="B52-viruses-08-00116">
<label>52.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akhter</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Aziz</surname>
<given-names>R.K.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>R.A.</given-names>
</name>
</person-group>
<article-title>PhiSpy: A novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<pub-id pub-id-type="doi">10.1093/nar/gks406</pub-id>
<pub-id pub-id-type="pmid">22584627</pub-id>
</element-citation>
</ref>
<ref id="B53-viruses-08-00116">
<label>53.</label>
<element-citation publication-type="webpage">
<article-title>NCBI complete bacterial genomes</article-title>
<comment>Available online:
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/">ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B54-viruses-08-00116">
<label>54.</label>
<element-citation publication-type="webpage">
<article-title>Phantome manually verified prophages, dated 14 March 2012</article-title>
<comment>Available online:
<ext-link ext-link-type="uri" xlink:href="http://www.phantome.org/Downloads/Prophages/PhiSpy/Manually_Verified/">http://www.phantome.org/Downloads/Prophages/PhiSpy/Manually_Verified/</ext-link>
</comment>
<date-in-citation>(accessed on 1 April 2016)</date-in-citation>
</element-citation>
</ref>
<ref id="B55-viruses-08-00116">
<label>55.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dang</surname>
<given-names>V.T.</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>M.B.</given-names>
</name>
</person-group>
<article-title>Emerging methods to study bacteriophage infection at the single-cell level</article-title>
<source>Front. Microbiol.</source>
<year>2014</year>
<volume>5</volume>
<pub-id pub-id-type="doi">10.3389/fmicb.2014.00724</pub-id>
<pub-id pub-id-type="pmid">25566233</pub-id>
</element-citation>
</ref>
<ref id="B56-viruses-08-00116">
<label>56.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martínez-García</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Moreno-Paz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Parro</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Antón</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Unveiling viral-host interactions within the ‘microbial dark matter’</article-title>
<source>Nat. Commun.</source>
<year>2014</year>
<volume>5</volume>
<pub-id pub-id-type="doi">10.1038/ncomms5542</pub-id>
<pub-id pub-id-type="pmid">25119473</pub-id>
</element-citation>
</ref>
<ref id="B57-viruses-08-00116">
<label>57.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roux</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Enault</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hurwitz</surname>
<given-names>B.L.</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>M.B.</given-names>
</name>
</person-group>
<article-title>VirSorter: Mining viral signal from microbial genomic data</article-title>
<source>PeerJ</source>
<year>2015</year>
<volume>3</volume>
<pub-id pub-id-type="doi">10.7717/peerj.985</pub-id>
<pub-id pub-id-type="pmid">26038737</pub-id>
</element-citation>
</ref>
<ref id="B58-viruses-08-00116">
<label>58.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roux</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>S.J.</given-names>
</name>
<name>
<surname>Woyke</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>M.B.</given-names>
</name>
</person-group>
<article-title>Viral dark matter and virus-host interactions resolved from publicly available microbial genomes</article-title>
<source>eLife</source>
<year>2015</year>
<volume>4</volume>
<pub-id pub-id-type="doi">10.7554/eLife.08490</pub-id>
<pub-id pub-id-type="pmid">26200428</pub-id>
</element-citation>
</ref>
<ref id="B59-viruses-08-00116">
<label>59.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Williamson</surname>
<given-names>S.J.</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>L.Z.</given-names>
</name>
<name>
<surname>Lorenzi</surname>
<given-names>H.A.</given-names>
</name>
<name>
<surname>Fadrosh</surname>
<given-names>D.W.</given-names>
</name>
<name>
<surname>Brami</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Thiagarajan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>McCrow</surname>
<given-names>J.P.</given-names>
</name>
<name>
<surname>Tovchigrechko</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Yooseph</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>J.C.</given-names>
</name>
</person-group>
<article-title>Metagenomic exploration of viruses throughout the Indian Ocean</article-title>
<source>PLoS ONE</source>
<year>2012</year>
<volume>7</volume>
<elocation-id>e42047</elocation-id>
</element-citation>
</ref>
<ref id="B60-viruses-08-00116">
<label>60.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>R.A.</given-names>
</name>
<name>
<surname>McNair</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Faust</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Raes</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dutilh</surname>
<given-names>B.E.</given-names>
</name>
</person-group>
<article-title>Computational approaches to predict bacteriophage-host relationships</article-title>
<source>FEMS Microbiol. Rev.</source>
<year>2016</year>
<volume>40</volume>
<fpage>258</fpage>
<lpage>272</lpage>
<pub-id pub-id-type="pmid">26657537</pub-id>
</element-citation>
</ref>
<ref id="B61-viruses-08-00116">
<label>61.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marçais</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>764</fpage>
<lpage>770</lpage>
<pub-id pub-id-type="pmid">21217122</pub-id>
</element-citation>
</ref>
<ref id="B62-viruses-08-00116">
<label>62.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawulok</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Deorowicz</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>CoMeta: Classification of metagenomes using k-mers</article-title>
<source>PLoS ONE</source>
<year>2015</year>
<volume>10</volume>
<elocation-id>116</elocation-id>
</element-citation>
</ref>
<ref id="B63-viruses-08-00116">
<label>63.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wood</surname>
<given-names>D.E.</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>S.L.</given-names>
</name>
</person-group>
<article-title>Kraken: Ultrafast metagenomic sequence classification using exact alignments</article-title>
<source>Genome Biol.</source>
<year>2014</year>
<volume>15</volume>
<pub-id pub-id-type="doi">10.1186/gb-2014-15-3-r46</pub-id>
</element-citation>
</ref>
<ref id="B64-viruses-08-00116">
<label>64.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edwards</surname>
<given-names>R.A.</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Disz</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Pusch</surname>
<given-names>G.D.</given-names>
</name>
<name>
<surname>Vonstein</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Overbeek</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Real Time Metagenomics: Using k-mers to annotate metagenomes</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>3316</fpage>
<lpage>3317</lpage>
<pub-id pub-id-type="pmid">23047562</pub-id>
</element-citation>
</ref>
<ref id="B65-viruses-08-00116">
<label>65.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marinelli</surname>
<given-names>L.J.</given-names>
</name>
<name>
<surname>Fitz-Gibbon</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bowman</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Inkeles</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Loncaric</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>D.A.</given-names>
</name>
<name>
<surname>Jacobs-Sera</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Cokus</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pellegrini</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>
<italic>Propionibacterium acnes</italic>
bacteriophages display limited genetic diversity and broad killing activity against bacterial skin isolates</article-title>
<source>mBio</source>
<year>2012</year>
<volume>3</volume>
<pub-id pub-id-type="doi">10.1128/mBio.00279-12</pub-id>
</element-citation>
</ref>
<ref id="B66-viruses-08-00116">
<label>66.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Ngo</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bangayan</surname>
<given-names>N.J.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lui</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Erfe</surname>
<given-names>M.C.</given-names>
</name>
<name>
<surname>Craft</surname>
<given-names>N.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The diversity and host interactions of
<italic>Propionibacterium acnes</italic>
bacteriophages on human skin</article-title>
<source>ISME J.</source>
<year>2015</year>
<volume>9</volume>
<fpage>2078</fpage>
<lpage>2093</lpage>
<pub-id pub-id-type="doi">10.1038/ismej.2015.47</pub-id>
<pub-id pub-id-type="pmid">25848871</pub-id>
</element-citation>
</ref>
<ref id="B67-viruses-08-00116">
<label>67.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Farrar</surname>
<given-names>M.D.</given-names>
</name>
<name>
<surname>Howson</surname>
<given-names>K.M.</given-names>
</name>
<name>
<surname>Bojar</surname>
<given-names>R.A.</given-names>
</name>
<name>
<surname>West</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Towler</surname>
<given-names>J.C.</given-names>
</name>
<name>
<surname>Parry</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pelton</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>K.T.</given-names>
</name>
</person-group>
<article-title>Genome sequence and analysis of a
<italic>Propionibacterium acnes</italic>
bacteriophage</article-title>
<source>J. Bacteriol.</source>
<year>2007</year>
<volume>189</volume>
<fpage>4161</fpage>
<lpage>4167</lpage>
<pub-id pub-id-type="doi">10.1128/JB.00106-07</pub-id>
<pub-id pub-id-type="pmid">17400737</pub-id>
</element-citation>
</ref>
<ref id="B68-viruses-08-00116">
<label>68.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuhn</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Suissa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chiswell</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Azriel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Berman</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Shahar</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Reznick</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sharf</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wyse</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bar-On</surname>
<given-names>T.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A bacteriophage reagent for Salmonella: Molecular studies on Felix 01</article-title>
<source>Int. J. Food Microbiol.</source>
<year>2002</year>
<volume>74</volume>
<fpage>217</fpage>
<lpage>227</lpage>
<pub-id pub-id-type="doi">10.1016/S0168-1605(01)00682-1</pub-id>
<pub-id pub-id-type="pmid">11981972</pub-id>
</element-citation>
</ref>
<ref id="B69-viruses-08-00116">
<label>69.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ford</surname>
<given-names>M.E.</given-names>
</name>
<name>
<surname>Sarkis</surname>
<given-names>G.J.</given-names>
</name>
<name>
<surname>Belanger</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Hendrix</surname>
<given-names>R.W.</given-names>
</name>
<name>
<surname>Hatfull</surname>
<given-names>G.F.</given-names>
</name>
</person-group>
<article-title>Genome structure of mycobacteriophage D29: Implications for phage evolution</article-title>
<source>J. Mol. Biol.</source>
<year>1998</year>
<volume>279</volume>
<fpage>143</fpage>
<lpage>164</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1997.1610</pub-id>
<pub-id pub-id-type="pmid">9636706</pub-id>
</element-citation>
</ref>
<ref id="B70-viruses-08-00116">
<label>70.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schwudke</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ergin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Michael</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Volkmar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Appel</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Knabner</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Konietzny</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Strauch</surname>
<given-names>E.</given-names>
</name>
</person-group>
<article-title>Broad-host-range Yersinia phage PY100: Genome sequence, proteome analysis of virions, and DNA packaging strategy</article-title>
<source>J. Bacteriol.</source>
<year>2008</year>
<volume>190</volume>
<fpage>332</fpage>
<lpage>342</lpage>
<pub-id pub-id-type="doi">10.1128/JB.01402-07</pub-id>
<pub-id pub-id-type="pmid">17965162</pub-id>
</element-citation>
</ref>
<ref id="B71-viruses-08-00116">
<label>71.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garcia</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Elliott</surname>
<given-names>J.M.</given-names>
</name>
<name>
<surname>Ramanculov</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Chain</surname>
<given-names>P.S.</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>M.C.</given-names>
</name>
<name>
<surname>Molineux</surname>
<given-names>I.J.</given-names>
</name>
</person-group>
<article-title>The genome sequence of Yersinia pestis bacteriophage
<italic>φ</italic>
A1122 reveals an intimate history with the coliphage T3 and T7 genomes</article-title>
<source>J. Bacteriol.</source>
<year>2003</year>
<volume>185</volume>
<fpage>5248</fpage>
<lpage>5262</lpage>
<pub-id pub-id-type="doi">10.1128/JB.185.17.5248-5262.2003</pub-id>
<pub-id pub-id-type="pmid">12923098</pub-id>
</element-citation>
</ref>
<ref id="B72-viruses-08-00116">
<label>72.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Bi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>D.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Outer membrane proteins Ail and OmpF of Yersinia pestis are involved in the adsorption of T7-related bacteriophage Yep-phi</article-title>
<source>J. Virol.</source>
<year>2013</year>
<volume>87</volume>
<fpage>12260</fpage>
<lpage>12269</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.01948-13</pub-id>
<pub-id pub-id-type="pmid">24006436</pub-id>
</element-citation>
</ref>
<ref id="B73-viruses-08-00116">
<label>73.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>H.W.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>K.H.</given-names>
</name>
</person-group>
<article-title>Comparative genomic analysis of bacteriophage EP23 infecting
<italic>Shigella sonnei</italic>
and
<italic>Escherichia coli</italic>
</article-title>
<source>J. Microbiol.</source>
<year>2011</year>
<volume>49</volume>
<fpage>927</fpage>
<lpage>934</lpage>
<pub-id pub-id-type="doi">10.1007/s12275-011-1577-0</pub-id>
<pub-id pub-id-type="pmid">22203555</pub-id>
</element-citation>
</ref>
<ref id="B74-viruses-08-00116">
<label>74.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Lappe</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Doran</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>O’Connor</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>O’Hare</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Cormican</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Characterization of bacteriophages used in the
<italic>Salmonella enterica</italic>
serovar Enteritidis phage-typing scheme</article-title>
<source>J. Med. Microbiol.</source>
<year>2009</year>
<volume>58</volume>
<fpage>86</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="doi">10.1099/jmm.0.000034-0</pub-id>
<pub-id pub-id-type="pmid">19074657</pub-id>
</element-citation>
</ref>
<ref id="B75-viruses-08-00116">
<label>75.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hood</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Phage typing of Staphylococcus aureus</article-title>
<source>J. Hyg.</source>
<year>1953</year>
<volume>51</volume>
<fpage>1</fpage>
<lpage>15</lpage>
<pub-id pub-id-type="doi">10.1017/S0022172400015448</pub-id>
<pub-id pub-id-type="pmid">13044927</pub-id>
</element-citation>
</ref>
<ref id="B76-viruses-08-00116">
<label>76.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bielke</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Donoghue</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Donoghue</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hargis</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>Salmonella host range of bacteriophages that infect multiple genera</article-title>
<source>Poult. Sci.</source>
<year>2007</year>
<volume>86</volume>
<fpage>2536</fpage>
<lpage>2540</lpage>
<pub-id pub-id-type="doi">10.3382/ps.2007-00250</pub-id>
<pub-id pub-id-type="pmid">18029799</pub-id>
</element-citation>
</ref>
<ref id="B77-viruses-08-00116">
<label>77.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>E.C.</given-names>
</name>
<name>
<surname>Schrader</surname>
<given-names>H.S.</given-names>
</name>
<name>
<surname>Rieland</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>T.L.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.W.</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>K.W.</given-names>
</name>
<name>
<surname>Kokjohn</surname>
<given-names>T.A.</given-names>
</name>
</person-group>
<article-title>Prevalence of broad-host-range lytic bacteriophages of
<italic>Sphaerotilus natans</italic>
,
<italic>Escherichia coli</italic>
, and
<italic>Pseudomonas aeruginosa</italic>
</article-title>
<source>Appl. Environ. Microbiol.</source>
<year>1998</year>
<volume>64</volume>
<fpage>575</fpage>
<lpage>580</lpage>
<pub-id pub-id-type="pmid">9464396</pub-id>
</element-citation>
</ref>
<ref id="B78-viruses-08-00116">
<label>78.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olsen</surname>
<given-names>R.H.</given-names>
</name>
<name>
<surname>Siak</surname>
<given-names>J.S.</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>R.H.</given-names>
</name>
</person-group>
<article-title>Characteristics of PRD1, a plasmid-dependent broad host range DNA bacteriophage</article-title>
<source>J. Virol.</source>
<year>1974</year>
<volume>14</volume>
<fpage>689</fpage>
<lpage>699</lpage>
<pub-id pub-id-type="pmid">4211861</pub-id>
</element-citation>
</ref>
<ref id="B79-viruses-08-00116">
<label>79.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carson</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gorman</surname>
<given-names>S.P.</given-names>
</name>
<name>
<surname>Gilmore</surname>
<given-names>B.F.</given-names>
</name>
</person-group>
<article-title>The use of lytic bacteriophages in the prevention and eradication of biofilms of
<italic>Proteus mirabilis</italic>
and
<italic>Escherichia coli</italic>
</article-title>
<source>FEMS Immunol. Med. Microbiol.</source>
<year>2010</year>
<volume>59</volume>
<fpage>447</fpage>
<lpage>455</lpage>
<pub-id pub-id-type="doi">10.1111/j.1574-695X.2010.00696.x</pub-id>
<pub-id pub-id-type="pmid">20528927</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="viruses-08-00116-f001" position="float">
<label>Figure 1</label>
<caption>
<p>Hosts represented in the database. Species (
<bold>a</bold>
) and genera (
<bold>b</bold>
) are displayed using the same genus colour code.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g001a"></graphic>
<graphic xlink:href="viruses-08-00116-g001b"></graphic>
</fig>
<fig id="viruses-08-00116-f002" position="float">
<label>Figure 2</label>
<caption>
<p>Accuracy
<italic>vs.</italic>
<italic>f</italic>
values obtained from the four loops of the inner cross-validation. Each dot represents the accuracy of species (a) and genus (b) prediction, averaged over 100 bootstrap resamplings. The bars cover the range of
<italic>f</italic>
values for which the accuracy is at least 99% of the highest accuracy in the specific tripartite set.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g002"></graphic>
</fig>
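The bar ranges in Figures 2 and 3 amount to a simple filter over the parameter grid. Below is a minimal Python sketch. It assumes the bars span every parameter value whose accuracy is at least 99% of the best accuracy. The helper name `near_optimal_range` and the accuracy values are hypothetical illustrations, not data from the study.

```python
def near_optimal_range(param_values, accuracies, tolerance=0.99):
    """Return (lowest, highest) parameter value whose accuracy is at least
    `tolerance` times the best accuracy (hypothetical helper).

    If the qualifying values are not contiguous, the returned span also
    covers the gaps between them."""
    best = max(accuracies)
    kept = [p for p, acc in zip(param_values, accuracies) if acc >= tolerance * best]
    return min(kept), max(kept)

# Made-up accuracies for a grid of f values.
f_values = [0.5, 0.6, 0.7, 0.8, 0.9]
accuracies = [0.70, 0.76, 0.78, 0.785, 0.74]
print(near_optimal_range(f_values, accuracies))
```

The same function applies unchanged to the α grid of Figure 3.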
<fig id="viruses-08-00116-f003" position="float">
<label>Figure 3</label>
<caption>
<p>Accuracy
<italic>vs.</italic>
<italic>α</italic>
values for the prediction of species (a) and genus (b) in each tripartite set. Each dot represents the accuracy averaged over 100 bootstrap resamplings. The bars cover the range of
<italic>α</italic>
values for which the accuracy is at least 99% of the highest accuracy.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g003"></graphic>
</fig>
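Averaging accuracy over bootstrap resamplings can be sketched as follows. This is an illustration, not the authors' code. It assumes one correctness flag per phage, draws the phages with replacement, and reports the mean accuracy together with the standard error of the mean: the standard deviation of the resampled accuracies divided by the square root of the number of resamplings. The function name, seed and data are hypothetical.

```python
import random
import statistics

def bootstrap_accuracy(correct, n_resamplings=100, seed=0):
    """Mean accuracy (%) and its standard error over bootstrap resamplings.

    `correct` holds one boolean per phage: True if the predicted host
    matches the annotated host."""
    rng = random.Random(seed)
    n = len(correct)
    accuracies = []
    for _ in range(n_resamplings):
        sample = rng.choices(correct, k=n)  # draw n phages with replacement
        accuracies.append(100 * sum(sample) / n)
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / n_resamplings ** 0.5
    return mean, sem

correct = [True] * 780 + [False] * 220  # made-up data: 78% of phages correct
mean, sem = bootstrap_accuracy(correct)
print(f"{mean:.2f} ± {sem:.3f}")
```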
<fig id="viruses-08-00116-f004" position="float">
<label>Figure 4</label>
<caption>
<p>HostPhinder’s accuracy (bar) and prediction counts (line) on
<inline-formula>
<mml:math id="mm87">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
for different coverage ranges. The values displayed on the
<italic>x</italic>
axis are the lower limits of the respective ranges. With the exception of the last bin, which includes all entries with coverage >0.9, all ranges are right-closed with upper limit
<italic>x</italic>
+ 0.1. Poorly reliable results are shown in grey, and reliable and highly reliable results in green and dark green, respectively. Results on HostPhinder’s web server [
<xref rid="B1-viruses-08-00116" ref-type="bibr">1</xref>
] are displayed using the same colour code.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g004"></graphic>
</fig>
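The coverage binning described for Figures 4 and 8 (right-closed bins of width 0.1, with everything above 0.9 placed in the last bin) can be expressed as a small function. This Python sketch follows the caption. Two details are assumptions: the name `coverage_bin` is hypothetical, and a coverage of exactly 0 is placed in the first bin.

```python
import math

def coverage_bin(coverage):
    """Return the lower limit x of the right-closed bin (x, x + 0.1] that
    holds `coverage`; every value above 0.9 goes to the last bin (x = 0.9).
    A coverage of exactly 0 is assigned to the first bin (an assumption)."""
    # round() absorbs floating-point noise in coverage * 10 before ceil().
    tenths = math.ceil(round(coverage * 10, 9)) - 1
    index = min(max(tenths, 0), 9)  # clamp to the bins 0.0 ... 0.9
    return index / 10

for cov in [0.05, 0.1, 0.15, 0.9, 0.95, 1.0]:
    print(cov, coverage_bin(cov))
```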
<fig id="viruses-08-00116-f005" position="float">
<label>Figure 5</label>
<caption>
<p>HostPhinder’s accuracy (bar) and percentages of predictions (dots) on
<inline-formula>
<mml:math id="mm88">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
for partial genome sequences ranging from 10% to 100% of the total genome length.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g005"></graphic>
</fig>
<fig id="viruses-08-00116-f006" position="float">
<label>Figure 6</label>
<caption>
<p>Heatmap of annotated
<italic>vs.</italic>
predicted host species in the
<inline-formula>
<mml:math id="mm95">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set. The heatmap shows both correct and incorrect host species predictions. Annotated host species are listed along the
<italic>y</italic>
axis, while predicted ones are on the
<italic>x</italic>
axis. The numbers after each species on the
<italic>y</italic>
axis and the
<italic>x</italic>
axis indicate the number of occurrences of phages in the
<inline-formula>
<mml:math id="mm96">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and in the
<inline-formula>
<mml:math id="mm97">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, respectively. Host species are grouped by phylum, as indicated on the left side of the figure. The colour scale indicates the fraction of phages predicted to target a particular host, ranging from white (no phages) to red (100% of the phages). The colour alone therefore does not indicate whether a prediction is correct: only cells on the diagonal correspond to correct predictions.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g006"></graphic>
</fig>
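A row-normalised confusion table like the one behind the heatmaps in Figures 6 and 7 can be built as follows. The sketch assumes each row (annotated host) is normalised by the number of phages with that annotation. This is one reading of "fraction of phages predicted as targeting a particular host". The function name and example hosts are hypothetical.

```python
from collections import Counter, defaultdict

def row_normalised_confusion(annotated, predicted):
    """Map each annotated host (row) to the fraction of its phages that were
    predicted as each host (column); cells never observed are absent."""
    counts = defaultdict(Counter)
    for true_host, predicted_host in zip(annotated, predicted):
        counts[true_host][predicted_host] += 1
    table = {}
    for true_host, row in counts.items():
        total = sum(row.values())
        table[true_host] = {host: n / total for host, n in row.items()}
    return table

annotated = ["Escherichia coli", "Escherichia coli", "Escherichia coli", "Staphylococcus aureus"]
predicted = ["Escherichia coli", "Escherichia coli", "Shigella sonnei", "Staphylococcus aureus"]
table = row_normalised_confusion(annotated, predicted)
# Diagonal cells (annotated host == predicted host) are the correct predictions.
print(table)
```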
<fig id="viruses-08-00116-f007" position="float">
<label>Figure 7</label>
<caption>
<p>Heatmap of annotated
<italic>vs.</italic>
predicted host genera in the
<inline-formula>
<mml:math id="mm98">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set. The heatmap shows both correct and incorrect host genus predictions. Annotated host genera are listed along the
<italic>y</italic>
axis, while predicted ones are on the
<italic>x</italic>
axis. The numbers after each genus on the
<italic>y</italic>
axis and the
<italic>x</italic>
axis indicate the number of occurrences of phages in the
<inline-formula>
<mml:math id="mm99">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>eval</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="mm100">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test,genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, respectively. Host genera are grouped by phylum, as indicated on the left side of the figure. The colour scale indicates the fraction of phages predicted to target a particular host, ranging from white (no phages) to intense red (100% of the phages). The colour alone therefore does not indicate whether a prediction is correct: only cells on the diagonal correspond to correct predictions.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g007"></graphic>
</fig>
<fig id="viruses-08-00116-f008" position="float">
<label>Figure 8</label>
<caption>
<p>HostPhinder’s accuracy (bar) and prediction counts (line) on prophages predicted by PhiSpy (upper panels) and on manually verified prophages (lower panels), for different coverage ranges. The values displayed on the
<italic>x</italic>
axis are the lower limits of the respective ranges. With the exception of the last bin, which includes all entries with coverage >0.9, all ranges are right-closed with upper limit
<italic>x</italic>
+ 0.1. Poorly reliable results are shown in grey, and reliable and highly reliable results in green and dark green, respectively.</p>
</caption>
<graphic xlink:href="viruses-08-00116-g008"></graphic>
</fig>
<table-wrap id="viruses-08-00116-t001" position="float">
<object-id pub-id-type="pii">viruses-08-00116-t001_Table 1</object-id>
<label>Table 1</label>
<caption>
<p>Overall performance of different similarity measures on
<inline-formula>
<mml:math id="mm53">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mtext>train-test</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1"></th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Score</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">z</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">frac
<inline-formula>
<mml:math id="mm54">
<mml:msub>
<mml:mrow></mml:mrow>
<mml:mi>q</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">frac
<inline-formula>
<mml:math id="mm55">
<mml:msub>
<mml:mrow></mml:mrow>
<mml:mi>d</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Coverage</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">Accuracy, Species (%)</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm56">
<mml:mrow>
<mml:mn>77</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>03</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>112</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm57">
<mml:mrow>
<mml:mn>77</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>81</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>111</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm58">
<mml:mrow>
<mml:mn>77</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>24</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>111</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm59">
<mml:mrow>
<mml:mn>78</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>43</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>111</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm60">
<mml:mrow>
<mml:mn mathvariant="bold">78</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">76</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn mathvariant="bold">0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">108</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">Accuracy, Genus (%)</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm61">
<mml:mrow>
<mml:mn>81</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>43</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>096</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm62">
<mml:mrow>
<mml:mn>82</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>02</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>094</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm63">
<mml:mrow>
<mml:mn>81</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>78</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>094</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm64">
<mml:mrow>
<mml:mn mathvariant="bold">83</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">07</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn mathvariant="bold">0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">09</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm65">
<mml:mrow>
<mml:mn>82</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>84</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>092</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="viruses-08-00116-t002" position="float">
<object-id pub-id-type="pii">viruses-08-00116-t002_Table 2</object-id>
<label>Table 2</label>
<caption>
<p>Mean and standard error of the mean of the overall HostPhinder performance over 100
<inline-formula>
<mml:math id="mm72">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set resamplings with replacement.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Method</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Criterion 1
<break></break>
(First Host)</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Criterion 2 (Majority Host among Top-10)</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Criterion 3 (Coverage Threshold,
<inline-formula>
<mml:math id="mm73">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
)</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Criterion 4 (Summing up Normalized Coverage Values,
<inline-formula>
<mml:math id="mm74">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>6</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">Accuracy, Species (%)</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm75">
<mml:mrow>
<mml:mn>78</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>76</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>108</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm76">
<mml:mrow>
<mml:mn>74</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>79</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>102</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm77">
<mml:mrow>
<mml:mn>79</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>104</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm78">
<mml:mrow>
<mml:mn mathvariant="bold">79</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">13</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn mathvariant="bold">0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">105</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">Accuracy, Genus (%)</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm79">
<mml:mrow>
<mml:mn>82</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>84</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>092</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm80">
<mml:mrow>
<mml:mn>80</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>41</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>099</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm81">
<mml:mrow>
<mml:mn>83</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>61</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>092</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<inline-formula>
<mml:math id="mm82">
<mml:mrow>
<mml:mn mathvariant="bold">83</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">72</mml:mn>
<mml:mo>±</mml:mo>
<mml:mn mathvariant="bold">0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn mathvariant="bold">092</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="viruses-08-00116-t003" position="float">
<object-id pub-id-type="pii">viruses-08-00116-t003_Table 3</object-id>
<label>Table 3</label>
<caption>
<p>List of host species (left) and genera (right) that HostPhinder predicts correctly.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Species</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Representation in
<inline-formula>
<mml:math id="mm91">
<mml:msub>
<mml:mi mathvariant="bold">phages</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">train-test,species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Genus</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">Representation in
<inline-formula>
<mml:math id="mm92">
<mml:msub>
<mml:mi mathvariant="bold">phages</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">train-test,genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Enterococcus faecalis</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">15</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Acinetobacter</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">16</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Listeria monocytogenes</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">21</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Listeria</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">26</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Propionibacterium acnes</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">21</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Propionibacterium</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">24</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Vibrio cholerae</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">35</td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Streptococcus</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">39</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1"></td>
<td align="center" valign="middle" rowspan="1" colspan="1"></td>
<td align="center" valign="middle" rowspan="1" colspan="1">
<italic>Streptomyces</italic>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">11</td>
</tr>
<tr>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1"></td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1"></td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">
<italic>Thermus</italic>
</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">5</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="viruses-08-00116-t004" position="float">
<object-id pub-id-type="pii">viruses-08-00116-t004_Table 4</object-id>
<label>Table 4</label>
<caption>
<p>HostPhinder and BLAST performance comparison on the
<inline-formula>
<mml:math id="mm103">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mi>eval</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
set.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1"></th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">BLAST</th>
<th align="center" valign="middle" style="border-bottom:solid thin;border-top:solid thin" rowspan="1" colspan="1">HostPhinder</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle" rowspan="1" colspan="1">Predictions made, training on
<inline-formula>
<mml:math id="mm104">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>genus</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">90%</td>
<td align="center" valign="middle" rowspan="1" colspan="1">97%</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="1" colspan="1">Predictions made, training on
<inline-formula>
<mml:math id="mm105">
<mml:mrow>
<mml:msub>
<mml:mi>phages</mml:mi>
<mml:mrow>
<mml:mi>train-test</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>species</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center" valign="middle" rowspan="1" colspan="1">91%</td>
<td align="center" valign="middle" rowspan="1" colspan="1">96%</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="1" colspan="1">Accuracy on common predictions, genus (%)</td>
<td align="center" valign="middle" rowspan="1" colspan="1">84.66 ± 0.188</td>
<td align="center" valign="middle" rowspan="1" colspan="1">85.13 ± 0.176</td>
</tr>
<tr>
<td align="left" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">Accuracy on common predictions, species (%)</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">76.92 ± 0.252</td>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">78.69 ± 0.237</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="viruses-08-00116-t005" position="float">
<object-id pub-id-type="pii">viruses-08-00116-t005_Table 5</object-id>
<label>Table 5</label>
<caption>
<p>Overview of HostPhinder host predictions for 19 phage draft genomes (names starting with “D”, and
<italic>Proteus</italic>) and 4 phage genome fragments (names starting with “F”) from the INTESTI phage cocktail.</p>
</caption>
<table frame="hsides" rules="groups" style="border-top: hidden; border-bottom: hidden">
<tbody>
<tr>
<td rowspan="1" colspan="1">
<inline-graphic xlink:href="viruses-08-00116-i001.jpg"></inline-graphic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0001069 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0001069 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021