MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Phenetic Comparison of Prokaryotic Genomes Using k-mers

Internal identifier: 000F24 (Pmc/Corpus)

Authors: Maxime Déraspe; Frédéric Raymond; Sébastien Boisvert; Alexander Culley; Paul H. Roy; François Laviolette; Jacques Corbeil

RBID : PMC:5850840

Abstract

Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Specifically, clustering strains on a subset of k-mers allows determination of how similar that clustering is to the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former shows a stronger correlation between its population structure and antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxon-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.
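The central claim above is that genomes can be compared through their k-mer content alone and the resulting distances turned into a phenetic tree, with no alignment step. The following is a minimal sketch of that idea under stated assumptions, not Ray Surveyor's implementation: it assumes canonical k-mers (each k-mer merged with its reverse complement), Jaccard distance between k-mer sets, and UPGMA clustering. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers: the lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rev_comp))
    return kmers


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; 0.0 means identical k-mer content (assumed distance, hypothetical name)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma_newick(names: list[str], dist: dict[frozenset, float]) -> str:
    """Repeatedly merge the closest pair of clusters (UPGMA) and return the tree in Newick format."""
    # cluster key -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((x, y))] / 2
        newick_x, size_x, height_x = clusters.pop(x)
        newick_y, size_y, height_y = clusters.pop(y)
        merged = f"({newick_x}:{height - height_x:.4f},{newick_y}:{height - height_y:.4f})"
        # UPGMA update: size-weighted average of the distances from the two merged clusters
        for z in clusters:
            d[frozenset((merged, z))] = (
                size_x * d[frozenset((x, z))] + size_y * d[frozenset((y, z))]
            ) / (size_x + size_y)
        clusters[merged] = (merged, size_x + size_y, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


if __name__ == "__main__":
    # Toy sequences invented for illustration; "B" is an exact copy of "A".
    genomes = {
        "A": "ATGCGTACGTTAGCCGATAGGCTTAACG",
        "B": "ATGCGTACGTTAGCCGATAGGCTTAACG",
        "C": "TTTTGGGGCCCCAAAATTTTGGGGCCCC",
    }
    k = 5
    kmer_sets = {name: canonical_kmers(seq, k) for name, seq in genomes.items()}
    distances = {
        frozenset((x, y)): jaccard_distance(kmer_sets[x], kmer_sets[y])
        for x, y in combinations(kmer_sets, 2)
    }
    print(upgma_newick(list(kmer_sets), distances))
```

UPGMA is used here only because it fits in a few lines; neighbor joining or another distance-based method could replace `upgma_newick` without touching the k-mer step.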

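The abstract also describes comparing clusters built from all k-mers with clusters built from one functional category of genes. This record does not state which agreement measure was used. One standard choice is the Fowlkes–Mallows index, and Fowlkes and Mallows appear in the reference list, but treating it as the authors' measure is an assumption. The sketch below computes the index for two flat clusterings of the same strains (for example, two trees each cut into groups); the function name and labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_a: list, labels_b: list) -> float:
    """Fowlkes–Mallows index between two flat clusterings of the same strains.

    labels_a[i] and labels_b[i] are the cluster labels of strain i in each clustering.
    Returns TP / sqrt((TP + FP) * (TP + FN)), where TP counts strain pairs grouped
    together in both clusterings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same strains")
    together_both = together_a = together_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        together_a += same_a
        together_b += same_b
        together_both += same_a and same_b
    if together_a == 0 or together_b == 0:
        return 0.0  # at least one clustering groups no pair of strains together
    return together_both / sqrt(together_a * together_b)


if __name__ == "__main__":
    # Hypothetical labels for six strains: clusters from all k-mers vs. from a gene-category subset.
    whole_genome = [0, 0, 1, 1, 2, 2]
    gene_subset = [0, 0, 1, 1, 1, 2]
    print(round(fowlkes_mallows(whole_genome, gene_subset), 3))  # 0.577
```

A value of 1 means the subset reproduces the whole-genome grouping exactly; values near 0 mean the two population structures share few co-clustered pairs.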

DOI: 10.1093/molbev/msx200
PubMed: 28957508
PubMed Central: 5850840

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Phenetic Comparison of Prokaryotic Genomes Using k-mers</title>
<author>
<name sortKey="Deraspe, Maxime" sort="Deraspe, Maxime" uniqKey="Deraspe M" first="Maxime" last="Déraspe">Maxime Déraspe</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff3">Département de Médecine Moléculaire, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Raymond, Frederic" sort="Raymond, Frederic" uniqKey="Raymond F" first="Frédéric" last="Raymond">Frédéric Raymond</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boisvert, Sebastien" sort="Boisvert, Sebastien" uniqKey="Boisvert S" first="Sébastien" last="Boisvert">Sébastien Boisvert</name>
<affiliation>
<nlm:aff id="msx200-aff4">Gydle Inc., Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Culley, Alexander" sort="Culley, Alexander" uniqKey="Culley A" first="Alexander" last="Culley">Alexander Culley</name>
<affiliation>
<nlm:aff id="msx200-aff5">Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Roy, Paul H" sort="Roy, Paul H" uniqKey="Roy P" first="Paul H." last="Roy">Paul H. Roy</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff5">Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Laviolette, Francois" sort="Laviolette, Francois" uniqKey="Laviolette F" first="François" last="Laviolette">François Laviolette</name>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff6">Département d’Informatique et de Génie Logiciel, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Corbeil, Jacques" sort="Corbeil, Jacques" uniqKey="Corbeil J" first="Jacques" last="Corbeil">Jacques Corbeil</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff3">Département de Médecine Moléculaire, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28957508</idno>
<idno type="pmc">5850840</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850840</idno>
<idno type="RBID">PMC:5850840</idno>
<idno type="doi">10.1093/molbev/msx200</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000F24</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F24</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Phenetic Comparison of Prokaryotic Genomes Using k-mers</title>
<author>
<name sortKey="Deraspe, Maxime" sort="Deraspe, Maxime" uniqKey="Deraspe M" first="Maxime" last="Déraspe">Maxime Déraspe</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff3">Département de Médecine Moléculaire, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Raymond, Frederic" sort="Raymond, Frederic" uniqKey="Raymond F" first="Frédéric" last="Raymond">Frédéric Raymond</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boisvert, Sebastien" sort="Boisvert, Sebastien" uniqKey="Boisvert S" first="Sébastien" last="Boisvert">Sébastien Boisvert</name>
<affiliation>
<nlm:aff id="msx200-aff4">Gydle Inc., Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Culley, Alexander" sort="Culley, Alexander" uniqKey="Culley A" first="Alexander" last="Culley">Alexander Culley</name>
<affiliation>
<nlm:aff id="msx200-aff5">Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Roy, Paul H" sort="Roy, Paul H" uniqKey="Roy P" first="Paul H." last="Roy">Paul H. Roy</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff5">Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Laviolette, Francois" sort="Laviolette, Francois" uniqKey="Laviolette F" first="François" last="Laviolette">François Laviolette</name>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff6">Département d’Informatique et de Génie Logiciel, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Corbeil, Jacques" sort="Corbeil, Jacques" uniqKey="Corbeil J" first="Jacques" last="Corbeil">Jacques Corbeil</name>
<affiliation>
<nlm:aff id="msx200-aff1">Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff2">Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msx200-aff3">Département de Médecine Moléculaire, Université Laval, Quebec City, QC, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Molecular Biology and Evolution</title>
<idno type="ISSN">0737-4038</idno>
<idno type="eISSN">1537-1719</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of <italic>Streptococcus pneumoniae</italic> and <italic>Pseudomonas aeruginosa</italic>. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Specifically, clustering strains on a subset of k-mers allows determination of how similar that clustering is to the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in <italic>P. aeruginosa</italic> than in <italic>S. pneumoniae</italic>, and the former shows a stronger correlation between its population structure and antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxon-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Allison, Ge" uniqKey="Allison G">GE Allison</name>
</author>
<author>
<name sortKey="Verma, Nk" uniqKey="Verma N">NK. Verma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andam, Cp" uniqKey="Andam C">CP Andam</name>
</author>
<author>
<name sortKey="Hanage, Wp" uniqKey="Hanage W">WP. Hanage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balvo It, M" uniqKey="Balvo It M">M Balvočiūtė</name>
</author>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH. Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Biek, R" uniqKey="Biek R">R Biek</name>
</author>
<author>
<name sortKey="Pybus, Og" uniqKey="Pybus O">OG Pybus</name>
</author>
<author>
<name sortKey="Lloyd Smith, Jo" uniqKey="Lloyd Smith J">JO Lloyd-Smith</name>
</author>
<author>
<name sortKey="Didelot, X" uniqKey="Didelot X">X. Didelot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boc, A" uniqKey="Boc A">A Boc</name>
</author>
<author>
<name sortKey="Diallo, Ab" uniqKey="Diallo A">AB Diallo</name>
</author>
<author>
<name sortKey="Makarenkov, V" uniqKey="Makarenkov V">V. Makarenkov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J. Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Raymond, F" uniqKey="Raymond F">F Raymond</name>
</author>
<author>
<name sortKey="Godzaridis, E" uniqKey="Godzaridis E">E Godzaridis</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J. Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Botzman, M" uniqKey="Botzman M">M Botzman</name>
</author>
<author>
<name sortKey="Margalit, H" uniqKey="Margalit H">H. Margalit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cimermancic, P" uniqKey="Cimermancic P">P Cimermancic</name>
</author>
<author>
<name sortKey="Medema, Mh" uniqKey="Medema M">MH Medema</name>
</author>
<author>
<name sortKey="Claesen, J" uniqKey="Claesen J">J Claesen</name>
</author>
<author>
<name sortKey="Kurita, K" uniqKey="Kurita K">K Kurita</name>
</author>
<author>
<name sortKey="Wieland Brown, Lc" uniqKey="Wieland Brown L">LC Wieland Brown</name>
</author>
<author>
<name sortKey="Mavrommatis, K" uniqKey="Mavrommatis K">K Mavrommatis</name>
</author>
<author>
<name sortKey="Pati, A" uniqKey="Pati A">A Pati</name>
</author>
<author>
<name sortKey="Godfrey, Pa" uniqKey="Godfrey P">PA Godfrey</name>
</author>
<author>
<name sortKey="Koehrsen, M" uniqKey="Koehrsen M">M Koehrsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cock, Pja" uniqKey="Cock P">PJA Cock</name>
</author>
<author>
<name sortKey="Antao, T" uniqKey="Antao T">T Antao</name>
</author>
<author>
<name sortKey="Chang, Jt" uniqKey="Chang J">JT Chang</name>
</author>
<author>
<name sortKey="Chapman, Ba" uniqKey="Chapman B">BA Chapman</name>
</author>
<author>
<name sortKey="Cox, Cj" uniqKey="Cox C">CJ Cox</name>
</author>
<author>
<name sortKey="Dalke, A" uniqKey="Dalke A">A Dalke</name>
</author>
<author>
<name sortKey="Friedberg, I" uniqKey="Friedberg I">I Friedberg</name>
</author>
<author>
<name sortKey="Hamelryck, T" uniqKey="Hamelryck T">T Hamelryck</name>
</author>
<author>
<name sortKey="Kauff, F" uniqKey="Kauff F">F Kauff</name>
</author>
<author>
<name sortKey="Wilczynski, B" uniqKey="Wilczynski B">B Wilczynski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Colombo, M L" uniqKey="Colombo M">M-L Colombo</name>
</author>
<author>
<name sortKey="Hanique, S" uniqKey="Hanique S">S Hanique</name>
</author>
<author>
<name sortKey="Baurin, Sl" uniqKey="Baurin S">SL Baurin</name>
</author>
<author>
<name sortKey="Bauvois, C" uniqKey="Bauvois C">C Bauvois</name>
</author>
<author>
<name sortKey="De Vriendt, K" uniqKey="De Vriendt K">K De Vriendt</name>
</author>
<author>
<name sortKey="Van Beeumen, Jj" uniqKey="Van Beeumen J">JJ Van Beeumen</name>
</author>
<author>
<name sortKey="Frere, J M" uniqKey="Frere J">J-M Frère</name>
</author>
<author>
<name sortKey="Joris, B" uniqKey="Joris B">B. Joris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Compeau, Pec" uniqKey="Compeau P">PEC Compeau</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G. Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Croucher, Nj" uniqKey="Croucher N">NJ Croucher</name>
</author>
<author>
<name sortKey="Finkelstein, Ja" uniqKey="Finkelstein J">JA Finkelstein</name>
</author>
<author>
<name sortKey="Pelton, Si" uniqKey="Pelton S">SI Pelton</name>
</author>
<author>
<name sortKey="Mitchell, Pk" uniqKey="Mitchell P">PK Mitchell</name>
</author>
<author>
<name sortKey="Lee, Gm" uniqKey="Lee G">GM Lee</name>
</author>
<author>
<name sortKey="Parkhill, J" uniqKey="Parkhill J">J Parkhill</name>
</author>
<author>
<name sortKey="Bentley, Sd" uniqKey="Bentley S">SD Bentley</name>
</author>
<author>
<name sortKey="Hanage, Wp" uniqKey="Hanage W">WP Hanage</name>
</author>
<author>
<name sortKey="Lipsitch, M" uniqKey="Lipsitch M">M. Lipsitch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Croucher, Nj" uniqKey="Croucher N">NJ Croucher</name>
</author>
<author>
<name sortKey="Finkelstein, Ja" uniqKey="Finkelstein J">JA Finkelstein</name>
</author>
<author>
<name sortKey="Pelton, Si" uniqKey="Pelton S">SI Pelton</name>
</author>
<author>
<name sortKey="Parkhill, J" uniqKey="Parkhill J">J Parkhill</name>
</author>
<author>
<name sortKey="Bentley, Sd" uniqKey="Bentley S">SD Bentley</name>
</author>
<author>
<name sortKey="Lipsitch, M" uniqKey="Lipsitch M">M Lipsitch</name>
</author>
<author>
<name sortKey="Hanage, Wp" uniqKey="Hanage W">WP. Hanage</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author>
<name sortKey="Kokot, M" uniqKey="Kokot M">M Kokot</name>
</author>
<author>
<name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
<author>
<name sortKey="Debudaj Grabysz, A" uniqKey="Debudaj Grabysz A">A. Debudaj-Grabysz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dobrindt, U" uniqKey="Dobrindt U">U Dobrindt</name>
</author>
<author>
<name sortKey="Hochhut, B" uniqKey="Hochhut B">B Hochhut</name>
</author>
<author>
<name sortKey="Hentschel, U" uniqKey="Hentschel U">U Hentschel</name>
</author>
<author>
<name sortKey="Hacker, J" uniqKey="Hacker J">J. Hacker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Donati, C" uniqKey="Donati C">C Donati</name>
</author>
<author>
<name sortKey="Hiller, Nl" uniqKey="Hiller N">NL Hiller</name>
</author>
<author>
<name sortKey="Tettelin, H" uniqKey="Tettelin H">H Tettelin</name>
</author>
<author>
<name sortKey="Muzzi, A" uniqKey="Muzzi A">A Muzzi</name>
</author>
<author>
<name sortKey="Croucher, Nj" uniqKey="Croucher N">NJ Croucher</name>
</author>
<author>
<name sortKey="Angiuoli, Sv" uniqKey="Angiuoli S">SV Angiuoli</name>
</author>
<author>
<name sortKey="Oggioni, M" uniqKey="Oggioni M">M Oggioni</name>
</author>
<author>
<name sortKey="Dunning Hotopp, Jc" uniqKey="Dunning Hotopp J">JC Dunning Hotopp</name>
</author>
<author>
<name sortKey="Hu, Fz" uniqKey="Hu F">FZ Hu</name>
</author>
<author>
<name sortKey="Riley, Dr" uniqKey="Riley D">DR Riley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drouin, A" uniqKey="Drouin A">A Drouin</name>
</author>
<author>
<name sortKey="Giguere, S" uniqKey="Giguere S">S Giguère</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Marchand, M" uniqKey="Marchand M">M Marchand</name>
</author>
<author>
<name sortKey="Tyers, M" uniqKey="Tyers M">M Tyers</name>
</author>
<author>
<name sortKey="Loo, Vg" uniqKey="Loo V">VG Loo</name>
</author>
<author>
<name sortKey="Bourgault, A M" uniqKey="Bourgault A">A-M Bourgault</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J. Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Federhen, S" uniqKey="Federhen S">S. Federhen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fenselau, C" uniqKey="Fenselau C">C Fenselau</name>
</author>
<author>
<name sortKey="Havey, C" uniqKey="Havey C">C Havey</name>
</author>
<author>
<name sortKey="Teerakulkittipong, N" uniqKey="Teerakulkittipong N">N Teerakulkittipong</name>
</author>
<author>
<name sortKey="Swatkoski, S" uniqKey="Swatkoski S">S Swatkoski</name>
</author>
<author>
<name sortKey="Laine, O" uniqKey="Laine O">O Laine</name>
</author>
<author>
<name sortKey="Edwards, N" uniqKey="Edwards N">N. Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Foerstner, Ku" uniqKey="Foerstner K">KU Foerstner</name>
</author>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Hooper, Sd" uniqKey="Hooper S">SD Hooper</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P. Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fowlkes, Eb" uniqKey="Fowlkes E">EB Fowlkes</name>
</author>
<author>
<name sortKey="Mallows, Cl" uniqKey="Mallows C">CL. Mallows</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Galili, T" uniqKey="Galili T">T. Galili</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gardner, Sn" uniqKey="Gardner S">SN Gardner</name>
</author>
<author>
<name sortKey="Slezak, T" uniqKey="Slezak T">T Slezak</name>
</author>
<author>
<name sortKey="Hall, Bg" uniqKey="Hall B">BG. Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gire, Sk" uniqKey="Gire S">SK Gire</name>
</author>
<author>
<name sortKey="Goba, A" uniqKey="Goba A">A Goba</name>
</author>
<author>
<name sortKey="Andersen, Kg" uniqKey="Andersen K">KG Andersen</name>
</author>
<author>
<name sortKey="Sealfon, Rsg" uniqKey="Sealfon R">RSG Sealfon</name>
</author>
<author>
<name sortKey="Park, Dj" uniqKey="Park D">DJ Park</name>
</author>
<author>
<name sortKey="Kanneh, L" uniqKey="Kanneh L">L Kanneh</name>
</author>
<author>
<name sortKey="Jalloh, S" uniqKey="Jalloh S">S Jalloh</name>
</author>
<author>
<name sortKey="Momoh, M" uniqKey="Momoh M">M Momoh</name>
</author>
<author>
<name sortKey="Fullah, M" uniqKey="Fullah M">M Fullah</name>
</author>
<author>
<name sortKey="Dudas, G" uniqKey="Dudas G">G Dudas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glaeser, Sp" uniqKey="Glaeser S">SP Glaeser</name>
</author>
<author>
<name sortKey="K Mpfer, P" uniqKey="K Mpfer P">P. Kämpfer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guindon, S" uniqKey="Guindon S">S Guindon</name>
</author>
<author>
<name sortKey="Gascuel, O" uniqKey="Gascuel O">O. Gascuel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B. Haubold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hazen, Th" uniqKey="Hazen T">TH Hazen</name>
</author>
<author>
<name sortKey="Pan, L" uniqKey="Pan L">L Pan</name>
</author>
<author>
<name sortKey="Gu, J D" uniqKey="Gu J">J-D Gu</name>
</author>
<author>
<name sortKey="Sobecky, Pa" uniqKey="Sobecky P">PA. Sobecky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hewitt, Ce" uniqKey="Hewitt C">CE. Hewitt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hilty, M" uniqKey="Hilty M">M Hilty</name>
</author>
<author>
<name sortKey="Wuthrich, D" uniqKey="Wuthrich D">D Wüthrich</name>
</author>
<author>
<name sortKey="Salter, Sj" uniqKey="Salter S">SJ Salter</name>
</author>
<author>
<name sortKey="Engel, H" uniqKey="Engel H">H Engel</name>
</author>
<author>
<name sortKey="Campbell, S" uniqKey="Campbell S">S Campbell</name>
</author>
<author>
<name sortKey="Sa Leao, R" uniqKey="Sa Leao R">R Sá-Leão</name>
</author>
<author>
<name sortKey="De Lencastre, H" uniqKey="De Lencastre H">H De Lencastre</name>
</author>
<author>
<name sortKey="Hermans, P" uniqKey="Hermans P">P Hermans</name>
</author>
<author>
<name sortKey="Sadowy, E" uniqKey="Sadowy E">E Sadowy</name>
</author>
<author>
<name sortKey="Turner, P" uniqKey="Turner P">P Turner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huerta Cepas, J" uniqKey="Huerta Cepas J">J Huerta-Cepas</name>
</author>
<author>
<name sortKey="Serra, F" uniqKey="Serra F">F Serra</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P. Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, E" uniqKey="Jones E">E Jones</name>
</author>
<author>
<name sortKey="Oliphant, T" uniqKey="Oliphant T">T Oliphant</name>
</author>
<author>
<name sortKey="Peterson, P" uniqKey="Peterson P">P Peterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Katoh, K" uniqKey="Katoh K">K Katoh</name>
</author>
<author>
<name sortKey="Standley, Dm" uniqKey="Standley D">DM. Standley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Konstantinidis, Kt" uniqKey="Konstantinidis K">KT Konstantinidis</name>
</author>
<author>
<name sortKey="Ramette, A" uniqKey="Ramette A">A Ramette</name>
</author>
<author>
<name sortKey="Tiedje, Jm" uniqKey="Tiedje J">JM. Tiedje</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kos, Vn" uniqKey="Kos V">VN Kos</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Mclaughlin, Re" uniqKey="Mclaughlin R">RE McLaughlin</name>
</author>
<author>
<name sortKey="Whiteaker, Jd" uniqKey="Whiteaker J">JD Whiteaker</name>
</author>
<author>
<name sortKey="Roy, Ph" uniqKey="Roy P">PH Roy</name>
</author>
<author>
<name sortKey="Alm, Ra" uniqKey="Alm R">RA Alm</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
<author>
<name sortKey="Gardner, H" uniqKey="Gardner H">H. Gardner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuhner, Mk" uniqKey="Kuhner M">MK Kuhner</name>
</author>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J. Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Land, M" uniqKey="Land M">M Land</name>
</author>
<author>
<name sortKey="Hauser, L" uniqKey="Hauser L">L Hauser</name>
</author>
<author>
<name sortKey="Jun, S R" uniqKey="Jun S">S-R Jun</name>
</author>
<author>
<name sortKey="Nookaew, I" uniqKey="Nookaew I">I Nookaew</name>
</author>
<author>
<name sortKey="Leuze, Mr" uniqKey="Leuze M">MR Leuze</name>
</author>
<author>
<name sortKey="Ahn, T H" uniqKey="Ahn T">T-H Ahn</name>
</author>
<author>
<name sortKey="Karpinets, T" uniqKey="Karpinets T">T Karpinets</name>
</author>
<author>
<name sortKey="Lund, O" uniqKey="Lund O">O Lund</name>
</author>
<author>
<name sortKey="Kora, G" uniqKey="Kora G">G Kora</name>
</author>
<author>
<name sortKey="Wassenaar, T" uniqKey="Wassenaar T">T Wassenaar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Larsson, P" uniqKey="Larsson P">P Larsson</name>
</author>
<author>
<name sortKey="Elfsmark, D" uniqKey="Elfsmark D">D Elfsmark</name>
</author>
<author>
<name sortKey="Svensson, K" uniqKey="Svensson K">K Svensson</name>
</author>
<author>
<name sortKey="Wikstrom, P" uniqKey="Wikstrom P">P Wikström</name>
</author>
<author>
<name sortKey="Forsman, M" uniqKey="Forsman M">M Forsman</name>
</author>
<author>
<name sortKey="Brettin, T" uniqKey="Brettin T">T Brettin</name>
</author>
<author>
<name sortKey="Keim, P" uniqKey="Keim P">P Keim</name>
</author>
<author>
<name sortKey="Johansson, A" uniqKey="Johansson A">A. Johansson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lassalle, F" uniqKey="Lassalle F">F Lassalle</name>
</author>
<author>
<name sortKey="Perian, S" uniqKey="Perian S">S Périan</name>
</author>
<author>
<name sortKey="Bataillon, T" uniqKey="Bataillon T">T Bataillon</name>
</author>
<author>
<name sortKey="Nesme, X" uniqKey="Nesme X">X Nesme</name>
</author>
<author>
<name sortKey="Duret, L" uniqKey="Duret L">L Duret</name>
</author>
<author>
<name sortKey="Daubin, V" uniqKey="Daubin V">V. Daubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Yan, X" uniqKey="Yan X">X. Yan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loureiro, A" uniqKey="Loureiro A">A Loureiro</name>
</author>
<author>
<name sortKey="Torgo, L" uniqKey="Torgo L">L Torgo</name>
</author>
<author>
<name sortKey="Soares, C" uniqKey="Soares C">C. Soares</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C. Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Materon, Ic" uniqKey="Materon I">IC Materon</name>
</author>
<author>
<name sortKey="Queenan, Am" uniqKey="Queenan A">AM Queenan</name>
</author>
<author>
<name sortKey="Koehler, Tm" uniqKey="Koehler T">TM Koehler</name>
</author>
<author>
<name sortKey="Bush, K" uniqKey="Bush K">K Bush</name>
</author>
<author>
<name sortKey="Palzkill, T" uniqKey="Palzkill T">T. Palzkill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medema, Mh" uniqKey="Medema M">MH Medema</name>
</author>
<author>
<name sortKey="Kottmann, R" uniqKey="Kottmann R">R Kottmann</name>
</author>
<author>
<name sortKey="Yilmaz, P" uniqKey="Yilmaz P">P Yilmaz</name>
</author>
<author>
<name sortKey="Cummings, M" uniqKey="Cummings M">M Cummings</name>
</author>
<author>
<name sortKey="Biggins, Jb" uniqKey="Biggins J">JB Biggins</name>
</author>
<author>
<name sortKey="Blin, K" uniqKey="Blin K">K Blin</name>
</author>
<author>
<name sortKey="De Bruijn, I" uniqKey="De Bruijn I">I de Bruijn</name>
</author>
<author>
<name sortKey="Chooi, Yh" uniqKey="Chooi Y">YH Chooi</name>
</author>
<author>
<name sortKey="Claesen, J" uniqKey="Claesen J">J Claesen</name>
</author>
<author>
<name sortKey="Coates, Rc" uniqKey="Coates R">RC Coates</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medini, D" uniqKey="Medini D">D Medini</name>
</author>
<author>
<name sortKey="Donati, C" uniqKey="Donati C">C Donati</name>
</author>
<author>
<name sortKey="Tettelin, H" uniqKey="Tettelin H">H Tettelin</name>
</author>
<author>
<name sortKey="Masignani, V" uniqKey="Masignani V">V Masignani</name>
</author>
<author>
<name sortKey="Rappuoli, R" uniqKey="Rappuoli R">R. Rappuoli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Pritchard, Jk" uniqKey="Pritchard J">JK. Pritchard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Metcalf, Ja" uniqKey="Metcalf J">JA Metcalf</name>
</author>
<author>
<name sortKey="Funkhouser Jones, Lj" uniqKey="Funkhouser Jones L">LJ Funkhouser-Jones</name>
</author>
<author>
<name sortKey="Brileya, K" uniqKey="Brileya K">K Brileya</name>
</author>
<author>
<name sortKey="Reysenbach, A L" uniqKey="Reysenbach A">A-L Reysenbach</name>
</author>
<author>
<name sortKey="Bordenstein, Sr" uniqKey="Bordenstein S">SR. Bordenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mooers, H" uniqKey="Mooers H">H. Mooers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nasser, W" uniqKey="Nasser W">W Nasser</name>
</author>
<author>
<name sortKey="Beres, Sb" uniqKey="Beres S">SB Beres</name>
</author>
<author>
<name sortKey="Olsen, Rj" uniqKey="Olsen R">RJ Olsen</name>
</author>
<author>
<name sortKey="Dean, Ma" uniqKey="Dean M">MA Dean</name>
</author>
<author>
<name sortKey="Rice, Ka" uniqKey="Rice K">KA Rice</name>
</author>
<author>
<name sortKey="Long, Sw" uniqKey="Long S">SW Long</name>
</author>
<author>
<name sortKey="Kristinsson, Kg" uniqKey="Kristinsson K">KG Kristinsson</name>
</author>
<author>
<name sortKey="Gottfredsson, M" uniqKey="Gottfredsson M">M Gottfredsson</name>
</author>
<author>
<name sortKey="Vuopio, J" uniqKey="Vuopio J">J Vuopio</name>
</author>
<author>
<name sortKey="Raisanen, K" uniqKey="Raisanen K">K Raisanen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ondov, Bd" uniqKey="Ondov B">BD Ondov</name>
</author>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Mallonee, Ab" uniqKey="Mallonee A">AB Mallonee</name>
</author>
<author>
<name sortKey="Bergman, Nh" uniqKey="Bergman N">NH Bergman</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author>
<name sortKey="Phillippy, Am" uniqKey="Phillippy A">AM. Phillippy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paradis, E" uniqKey="Paradis E">E Paradis</name>
</author>
<author>
<name sortKey="Claude, J" uniqKey="Claude J">J Claude</name>
</author>
<author>
<name sortKey="Strimmer, K" uniqKey="Strimmer K">K. Strimmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="P Rn Nen, K" uniqKey="P Rn Nen K">K Pärnänen</name>
</author>
<author>
<name sortKey="Karkman, A" uniqKey="Karkman A">A Karkman</name>
</author>
<author>
<name sortKey="Tamminen, M" uniqKey="Tamminen M">M Tamminen</name>
</author>
<author>
<name sortKey="Lyra, C" uniqKey="Lyra C">C Lyra</name>
</author>
<author>
<name sortKey="Hultman, J" uniqKey="Hultman J">J Hultman</name>
</author>
<author>
<name sortKey="Paulin, L" uniqKey="Paulin L">L Paulin</name>
</author>
<author>
<name sortKey="Virta, M" uniqKey="Virta M">M. Virta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patwardhan, A" uniqKey="Patwardhan A">A Patwardhan</name>
</author>
<author>
<name sortKey="Ray, S" uniqKey="Ray S">S Ray</name>
</author>
<author>
<name sortKey="Roy, A" uniqKey="Roy A">A. Roy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pennisi, E" uniqKey="Pennisi E">E. Pennisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
<author>
<name sortKey="Douady, Cj" uniqKey="Douady C">CJ. Douady</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Price, Mn" uniqKey="Price M">MN Price</name>
</author>
<author>
<name sortKey="Dehal, Ps" uniqKey="Dehal P">PS Dehal</name>
</author>
<author>
<name sortKey="Arkin, Ap" uniqKey="Arkin A">AP. Arkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J Qi</name>
</author>
<author>
<name sortKey="Luo, H" uniqKey="Luo H">H Luo</name>
</author>
<author>
<name sortKey="Hao, B" uniqKey="Hao B">B. Hao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raymond, F" uniqKey="Raymond F">F Raymond</name>
</author>
<author>
<name sortKey="Ouameur, Aa" uniqKey="Ouameur A">AA Ouameur</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Iqbal, N" uniqKey="Iqbal N">N Iqbal</name>
</author>
<author>
<name sortKey="Gingras, H" uniqKey="Gingras H">H Gingras</name>
</author>
<author>
<name sortKey="Dridi, B" uniqKey="Dridi B">B Dridi</name>
</author>
<author>
<name sortKey="Leprohon, P" uniqKey="Leprohon P">P Leprohon</name>
</author>
<author>
<name sortKey="Plante, P L" uniqKey="Plante P">P-L Plante</name>
</author>
<author>
<name sortKey="Giroux, R" uniqKey="Giroux R">R Giroux</name>
</author>
<author>
<name sortKey="Berube, E" uniqKey="Berube E">È Bérubé</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raymond, F" uniqKey="Raymond F">F Raymond</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Boissinot, M" uniqKey="Boissinot M">M Boissinot</name>
</author>
<author>
<name sortKey="Bergeron, Mg" uniqKey="Bergeron M">MG Bergeron</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J. Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reinert, G" uniqKey="Reinert G">G Reinert</name>
</author>
<author>
<name sortKey="Chew, D" uniqKey="Chew D">D Chew</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F Sun</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS. Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author>
<name sortKey="Lavenier, D" uniqKey="Lavenier D">D Lavenier</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R. Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robinson, Df" uniqKey="Robinson D">DF Robinson</name>
</author>
<author>
<name sortKey="Foulds, Lr" uniqKey="Foulds L">LR. Foulds</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodionov, Da" uniqKey="Rodionov D">DA Rodionov</name>
</author>
<author>
<name sortKey="Gelfand, Ms" uniqKey="Gelfand M">MS Gelfand</name>
</author>
<author>
<name sortKey="Mironov, Aa" uniqKey="Mironov A">AA Mironov</name>
</author>
<author>
<name sortKey="Rakhmaninova, Ab" uniqKey="Rakhmaninova A">AB. Rakhmaninova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Romero, P" uniqKey="Romero P">P Romero</name>
</author>
<author>
<name sortKey="Llull, D" uniqKey="Llull D">D Llull</name>
</author>
<author>
<name sortKey="Garcia, E" uniqKey="Garcia E">E García</name>
</author>
<author>
<name sortKey="Mitchell, Tj" uniqKey="Mitchell T">TJ Mitchell</name>
</author>
<author>
<name sortKey="L Pez, R" uniqKey="L Pez R">R López</name>
</author>
<author>
<name sortKey="Moscoso, M" uniqKey="Moscoso M">M. Moscoso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rossello Mora, R" uniqKey="Rossello Mora R">R Rossello-Mora</name>
</author>
<author>
<name sortKey="Amann, R" uniqKey="Amann R">R. Amann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sansinenea, E" uniqKey="Sansinenea E">E Sansinenea</name>
</author>
<author>
<name sortKey="Ortiz, A" uniqKey="Ortiz A">A. Ortiz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schuch, R" uniqKey="Schuch R">R Schuch</name>
</author>
<author>
<name sortKey="Fischetti, Va" uniqKey="Fischetti V">VA. Fischetti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shapiro, Bj" uniqKey="Shapiro B">BJ Shapiro</name>
</author>
<author>
<name sortKey="Friedman, J" uniqKey="Friedman J">J Friedman</name>
</author>
<author>
<name sortKey="Cordero, Ox" uniqKey="Cordero O">OX Cordero</name>
</author>
<author>
<name sortKey="Preheim, Sp" uniqKey="Preheim S">SP Preheim</name>
</author>
<author>
<name sortKey="Timberlake, Sc" uniqKey="Timberlake S">SC Timberlake</name>
</author>
<author>
<name sortKey="Szab, G" uniqKey="Szab G">G Szabó</name>
</author>
<author>
<name sortKey="Polz, Mf" uniqKey="Polz M">MF Polz</name>
</author>
<author>
<name sortKey="Alm, Ej" uniqKey="Alm E">EJ. Alm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Siva, N" uniqKey="Siva N">N. Siva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Snitkin, Es" uniqKey="Snitkin E">ES Snitkin</name>
</author>
<author>
<name sortKey="Zelazny, Am" uniqKey="Zelazny A">AM Zelazny</name>
</author>
<author>
<name sortKey="Thomas, Pj" uniqKey="Thomas P">PJ Thomas</name>
</author>
<author>
<name sortKey="Stock, F" uniqKey="Stock F">F Stock</name>
</author>
<author>
<name sortKey="Henderson, Dk" uniqKey="Henderson D">DK Henderson</name>
</author>
<author>
<name sortKey="Palmore, Tn" uniqKey="Palmore T">TN Palmore</name>
</author>
<author>
<name sortKey="Segre, Ja" uniqKey="Segre J">JA. Segre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sokal, R" uniqKey="Sokal R">R Sokal</name>
</author>
<author>
<name sortKey="Rohlf, F" uniqKey="Rohlf F">F. Rohlf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Song, K" uniqKey="Song K">K Song</name>
</author>
<author>
<name sortKey="Ren, J" uniqKey="Ren J">J Ren</name>
</author>
<author>
<name sortKey="Reinert, G" uniqKey="Reinert G">G Reinert</name>
</author>
<author>
<name sortKey="Deng, M" uniqKey="Deng M">M Deng</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F. Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sozhamannan, S" uniqKey="Sozhamannan S">S Sozhamannan</name>
</author>
<author>
<name sortKey="Chute, Md" uniqKey="Chute M">MD Chute</name>
</author>
<author>
<name sortKey="Mcafee, Fd" uniqKey="Mcafee F">FD McAfee</name>
</author>
<author>
<name sortKey="Fouts, De" uniqKey="Fouts D">DE Fouts</name>
</author>
<author>
<name sortKey="Akmal, A" uniqKey="Akmal A">A Akmal</name>
</author>
<author>
<name sortKey="Galloway, Dr" uniqKey="Galloway D">DR Galloway</name>
</author>
<author>
<name sortKey="Mateczun, A" uniqKey="Mateczun A">A Mateczun</name>
</author>
<author>
<name sortKey="Baillie, Lw" uniqKey="Baillie L">LW Baillie</name>
</author>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD. Read</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spielman, Sj" uniqKey="Spielman S">SJ Spielman</name>
</author>
<author>
<name sortKey="Wilke, Co" uniqKey="Wilke C">CO. Wilke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A. Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, Q" uniqKey="Sun Q">Q Sun</name>
</author>
<author>
<name sortKey="Lan, R" uniqKey="Lan R">R Lan</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
<author>
<name sortKey="Du, P" uniqKey="Du P">P Du</name>
</author>
<author>
<name sortKey="Xu, J" uniqKey="Xu J">J. Xu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, F" uniqKey="Tang F">F Tang</name>
</author>
<author>
<name sortKey="Bossers, A" uniqKey="Bossers A">A Bossers</name>
</author>
<author>
<name sortKey="Harders, F" uniqKey="Harders F">F Harders</name>
</author>
<author>
<name sortKey="Lu, C" uniqKey="Lu C">C Lu</name>
</author>
<author>
<name sortKey="Smith, H" uniqKey="Smith H">H. Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Ciufo, S" uniqKey="Ciufo S">S Ciufo</name>
</author>
<author>
<name sortKey="Fedorov, B" uniqKey="Fedorov B">B Fedorov</name>
</author>
<author>
<name sortKey="O Eill, K" uniqKey="O Eill K">K O’Neill</name>
</author>
<author>
<name sortKey="Tolstoy, I" uniqKey="Tolstoy I">I. Tolstoy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tu, Q" uniqKey="Tu Q">Q Tu</name>
</author>
<author>
<name sortKey="Lin, L" uniqKey="Lin L">L. Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Den Nieuwboer, M" uniqKey="Van Den Nieuwboer M">M van den Nieuwboer</name>
</author>
<author>
<name sortKey="Van Hemert, S" uniqKey="Van Hemert S">S van Hemert</name>
</author>
<author>
<name sortKey="Claassen, E" uniqKey="Claassen E">E Claassen</name>
</author>
<author>
<name sortKey="De Vos, Wm" uniqKey="De Vos W">WM. de Vos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vinga, S" uniqKey="Vinga S">S Vinga</name>
</author>
<author>
<name sortKey="Almeida, J" uniqKey="Almeida J">J. Almeida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walsh, R" uniqKey="Walsh R">R Walsh</name>
</author>
<author>
<name sortKey="Thomson, Kl" uniqKey="Thomson K">KL Thomson</name>
</author>
<author>
<name sortKey="Ware, Js" uniqKey="Ware J">JS Ware</name>
</author>
<author>
<name sortKey="Funke, Bh" uniqKey="Funke B">BH Funke</name>
</author>
<author>
<name sortKey="Woodley, J" uniqKey="Woodley J">J Woodley</name>
</author>
<author>
<name sortKey="Mcguire, Kj" uniqKey="Mcguire K">KJ McGuire</name>
</author>
<author>
<name sortKey="Mazzarotto, F" uniqKey="Mazzarotto F">F Mazzarotto</name>
</author>
<author>
<name sortKey="Blair, E" uniqKey="Blair E">E Blair</name>
</author>
<author>
<name sortKey="Seller, A" uniqKey="Seller A">A Seller</name>
</author>
<author>
<name sortKey="Taylor, Jc" uniqKey="Taylor J">JC Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wan, L" uniqKey="Wan L">L Wan</name>
</author>
<author>
<name sortKey="Reinert, G" uniqKey="Reinert G">G Reinert</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F Sun</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS. Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wattam, Ar" uniqKey="Wattam A">AR Wattam</name>
</author>
<author>
<name sortKey="Abraham, D" uniqKey="Abraham D">D Abraham</name>
</author>
<author>
<name sortKey="Dalay, O" uniqKey="Dalay O">O Dalay</name>
</author>
<author>
<name sortKey="Disz, Tl" uniqKey="Disz T">TL Disz</name>
</author>
<author>
<name sortKey="Driscoll, T" uniqKey="Driscoll T">T Driscoll</name>
</author>
<author>
<name sortKey="Gabbard, Jl" uniqKey="Gabbard J">JL Gabbard</name>
</author>
<author>
<name sortKey="Gillespie, Jj" uniqKey="Gillespie J">JJ Gillespie</name>
</author>
<author>
<name sortKey="Gough, R" uniqKey="Gough R">R Gough</name>
</author>
<author>
<name sortKey="Hix, D" uniqKey="Hix D">D Hix</name>
</author>
<author>
<name sortKey="Kenyon, R" uniqKey="Kenyon R">R Kenyon</name>
</author>
<author>
<name sortKey="Machi, D" uniqKey="Machi D">D. Machi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wen, J" uniqKey="Wen J">J Wen</name>
</author>
<author>
<name sortKey="Chan, Rhf" uniqKey="Chan R">RHF Chan</name>
</author>
<author>
<name sortKey="Yau, Sc" uniqKey="Yau S">SC Yau</name>
</author>
<author>
<name sortKey="He, Rl" uniqKey="He R">RL He</name>
</author>
<author>
<name sortKey="Yau, Sst" uniqKey="Yau S">SST. Yau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xiong, J" uniqKey="Xiong J">J Xiong</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Iqbal, N" uniqKey="Iqbal N">N Iqbal</name>
</author>
<author>
<name sortKey="Krajden, S" uniqKey="Krajden S">S Krajden</name>
</author>
<author>
<name sortKey="Chapman, W" uniqKey="Chapman W">W Chapman</name>
</author>
<author>
<name sortKey="Dewar, K" uniqKey="Dewar K">K Dewar</name>
</author>
<author>
<name sortKey="Roy, Ph" uniqKey="Roy P">PH. Roy</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Mol Biol Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Mol. Biol. Evol</journal-id>
<journal-id journal-id-type="publisher-id">molbev</journal-id>
<journal-title-group>
<journal-title>Molecular Biology and Evolution</journal-title>
</journal-title-group>
<issn pub-type="ppub">0737-4038</issn>
<issn pub-type="epub">1537-1719</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28957508</article-id>
<article-id pub-id-type="pmc">5850840</article-id>
<article-id pub-id-type="doi">10.1093/molbev/msx200</article-id>
<article-id pub-id-type="publisher-id">msx200</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Phenetic Comparison of Prokaryotic Genomes Using k-mers</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Déraspe</surname>
<given-names>Maxime</given-names>
</name>
<xref ref-type="author-notes" rid="msx200-FM1"></xref>
<xref ref-type="aff" rid="msx200-aff1">1</xref>
<xref ref-type="aff" rid="msx200-aff2">2</xref>
<xref ref-type="aff" rid="msx200-aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Raymond</surname>
<given-names>Frédéric</given-names>
</name>
<xref ref-type="author-notes" rid="msx200-FM1"></xref>
<xref ref-type="aff" rid="msx200-aff1">1</xref>
<xref ref-type="aff" rid="msx200-aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Boisvert</surname>
<given-names>Sébastien</given-names>
</name>
<xref ref-type="aff" rid="msx200-aff4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Culley</surname>
<given-names>Alexander</given-names>
</name>
<xref ref-type="aff" rid="msx200-aff5">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Roy</surname>
<given-names>Paul H.</given-names>
</name>
<xref ref-type="aff" rid="msx200-aff1">1</xref>
<xref ref-type="aff" rid="msx200-aff5">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Laviolette</surname>
<given-names>François</given-names>
</name>
<xref ref-type="author-notes" rid="msx200-FM2"></xref>
<xref ref-type="aff" rid="msx200-aff2">2</xref>
<xref ref-type="aff" rid="msx200-aff6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Corbeil</surname>
<given-names>Jacques</given-names>
</name>
<xref ref-type="corresp" rid="msx200-cor1"></xref>
<pmc-comment>jacques.corbeil@genome.ulaval.ca</pmc-comment>
<xref ref-type="author-notes" rid="msx200-FM2"></xref>
<xref ref-type="aff" rid="msx200-aff1">1</xref>
<xref ref-type="aff" rid="msx200-aff2">2</xref>
<xref ref-type="aff" rid="msx200-aff3">3</xref>
</contrib>
</contrib-group>
<aff id="msx200-aff1">
<label>1</label>
Centre de Recherche en Infectiologie, CHU de Québec-Université Laval, Quebec City, QC, Canada</aff>
<aff id="msx200-aff2">
<label>2</label>
Centre de Recherche en Données Massives de l’Université Laval, Quebec City, QC, Canada</aff>
<aff id="msx200-aff3">
<label>3</label>
Département de Médecine Moléculaire, Université Laval, Quebec City, QC, Canada</aff>
<aff id="msx200-aff4">
<label>4</label>
Gydle Inc., Quebec City, QC, Canada</aff>
<aff id="msx200-aff5">
<label>5</label>
Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, Quebec City, QC, Canada</aff>
<aff id="msx200-aff6">
<label>6</label>
Département d’Informatique et de Génie Logiciel, Université Laval, Quebec City, QC, Canada</aff>
<author-notes>
<fn id="msx200-FM1">
<label></label>
<p>These authors contributed equally to this work.</p>
</fn>
<fn id="msx200-FM2">
<label></label>
<p>Shared senior authorship.</p>
</fn>
<corresp id="msx200-cor1">
<label>*</label>
<bold>Corresponding author:</bold>
E-mail:
<email>jacques.corbeil@genome.ulaval.ca</email>
.</corresp>
<fn id="msx200-FM3">
<p>
<bold>Associate editor:</bold>
Miriam Barlow</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>10</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2017-07-16">
<day>16</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>16</day>
<month>7</month>
<year>2017</year>
</pub-date>
<volume>34</volume>
<issue>10</issue>
<fpage>2716</fpage>
<lpage>2729</lpage>
<permissions>
<copyright-statement>© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="cc-by-nc" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>
), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com</license-p>
</license>
</permissions>
<self-uri xlink:href="msx200.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>Bacterial genomics studies are becoming more extensive and complex, requiring new approaches to analysis. Using the Ray Surveyor software, we demonstrate that comparing genomes on the basis of their k-mer content allows phenetic trees to be reconstructed without prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of
<italic>Streptococcus pneumoniae</italic>
and
<italic>Pseudomonas aeruginosa</italic>
. We also investigated the relationship between specific genetic determinants and bacterial population structures. By comparing clusters derived from the complete genomic content of a genome population with clusters derived from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, clustering strains on a subset of k-mers makes it possible to measure how similar the resulting clusters are to the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in
<italic>P. aeruginosa</italic>
than in
<italic>S. pneumoniae</italic>
, and the population structure of the former correlates more strongly with antibiotic resistance genes. The global view of the bacterial pangenome also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>comparative genomics</kwd>
<kwd>microbial evolution</kwd>
<kwd>population structure</kwd>
<kwd>horizontal gene transfer</kwd>
<kwd>software</kwd>
</kwd-group>
<counts>
<page-count count="14"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Genomic data sets continue to grow in size, and a single study may now include hundreds to thousands of samples that must be rigorously compared and clustered (
<xref rid="msx200-B53" ref-type="bibr">Nasser et al. 2014</xref>
;
<xref rid="msx200-B86" ref-type="bibr">Walsh et al. 2016</xref>
). Large-scale genomic projects such as the 1000 Genomes Project (
<xref rid="msx200-B73" ref-type="bibr">Siva 2010</xref>
), the Human Microbiome Project (
<xref rid="msx200-B34" ref-type="bibr">Integrative HMP [iHMP] Research Network Consortium 2014</xref>
), and recent epidemiological studies of outbreaks (
<xref rid="msx200-B19" ref-type="bibr">Editor 2011</xref>
;
<xref rid="msx200-B74" ref-type="bibr">Snitkin et al. 2012</xref>
;
<xref rid="msx200-B26" ref-type="bibr">Gire et al. 2014</xref>
) rely on comparative genomics and large-scale phylogenies to uncover underlying biological patterns and trends. Sequenced genomes are currently compared on the basis of conserved genes, polymorphic positions, and/or annotations (16S rRNA,
<italic>rpoB</italic>
,
<italic>atpB</italic>
, etc.;
<xref rid="msx200-B57" ref-type="bibr">Patwardhan et al. 2014</xref>
). For example, multilocus sequence analysis (MLSA) uses the sequences of housekeeping genes to construct phylogenies (
<xref rid="msx200-B27" ref-type="bibr">Glaeser and Kämpfer 2015</xref>
). On a larger scale, phylogenomic studies often compare genomes using the genes conserved across the population under study (
<xref rid="msx200-B58" ref-type="bibr">Pennisi 2008</xref>
). Another common approach for whole-genome comparison is Average Nucleotide Identity (ANI), which relies on sequence alignments to determine the percentage of identity between genomes (
<xref rid="msx200-B37" ref-type="bibr">Konstantinidis et al. 2006</xref>
). Researchers therefore often interpret their results solely on the basis of the features their samples share, an approach that may omit important genomic determinants that could better characterize and discriminate subpopulations or phenotypes (
<xref rid="msx200-B83" ref-type="bibr">Tu and Lin 2016</xref>
). Indeed, the accessory or dispensable genome can be responsible for important phenotypes such as antibiotic resistance, adaptation to specific environments, or colonization of different hosts (
<xref rid="msx200-B49" ref-type="bibr">Medini et al. 2005</xref>
). Genes acquired by horizontal gene transfer (HGT) are not captured by traditional methods that use conserved genes to compute evolutionary distances between bacteria. Given the importance of the accessory genome in pathogen traits such as virulence and antibiotic resistance, analytical tools are needed that can compare thousands of genome sequences without reducing the analysis to conserved features.</p>
<p>K-mer-based methodologies are not new and have long attracted researchers’ interest (
<xref rid="msx200-B85" ref-type="bibr">Vinga and Almeida 2003</xref>
;
<xref rid="msx200-B76" ref-type="bibr">Song et al. 2014</xref>
;
<xref rid="msx200-B29" ref-type="bibr">Haubold 2014</xref>
). K-mers underpin de Bruijn graphs, the standard approach for short-read assembly (
<xref rid="msx200-B12" ref-type="bibr">Compeau et al. 2011</xref>
;
<xref rid="msx200-B7" ref-type="bibr">Boisvert et al. 2012</xref>
), and several highly efficient k-mer counters exist, such as MSPKmerCounter (
<xref rid="msx200-B43" ref-type="bibr">Li and Yan 2015</xref>
), DSK (
<xref rid="msx200-B65" ref-type="bibr">Rizk et al. 2013</xref>
), and KMC2 (
<xref rid="msx200-B15" ref-type="bibr">Deorowicz et al. 2014</xref>
). Alignment-free sequence comparison methods have been studied extensively; they are competitive with alignment-based methods in accuracy while generally being more computationally efficient (
<xref rid="msx200-B45" ref-type="bibr">Marçais and Kingsford 2011</xref>
;
<xref rid="msx200-B25" ref-type="bibr">Gardner et al. 2015</xref>
;
<xref rid="msx200-B54" ref-type="bibr">Ondov et al. 2016</xref>
). They have also been used to compare assembled microbiomes (
<xref rid="msx200-B63" ref-type="bibr">Raymond et al. 2016b</xref>
) and have proved to be an important tool in the phylogenetic analysis toolbox (
<xref rid="msx200-B61" ref-type="bibr">Qi et al. 2004</xref>
;
<xref rid="msx200-B88" ref-type="bibr">Wen et al. 2014</xref>
). Comparison of k-mer content can also be combined with machine learning algorithms to predict phenotypes such as antibiotic resistance (
<xref rid="msx200-B18" ref-type="bibr">Drouin et al. 2016</xref>
).</p>
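<p>To make the k-mer idea concrete, the following minimal Python sketch extracts the distinct canonical k-mers of a DNA sequence (each k-mer is merged with its reverse complement, so both strands count once) and compares two k-mer sets with the Jaccard similarity. The function names, the use of canonical k-mers, and the Jaccard index are illustrative assumptions; they are not taken from Ray Surveyor or from the k-mer counters cited above, which rely on much more memory-efficient data structures.</p>

```python
# Hypothetical helpers: a sketch of alignment-free comparison, not Ray Surveyor code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """Collect the distinct canonical k-mers of a sequence, skipping non-ACGT bases."""
    sequence = sequence.upper()
    kmers = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # ignore k-mers containing ambiguous bases such as N
        kmers.add(canonical(kmer))
    return kmers


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0
```

<p>For instance, kmer_set("ACGTAC", 3) returns {"ACG", "GTA"}, because CGT and TAC are the reverse complements of ACG and GTA and therefore collapse onto them.</p>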
<p>In this work, we evaluated whether k-mers can be used to rapidly and accurately compare large collections of genomes. With this approach, genomes are clustered on the basis of the similarity of their complete sequences, by counting the total number of shared k-mers, including those from the accessory genome. In addition, we tested the hypothesis that populations of genomes can be characterized on the basis of specific features, using the presence/absence of k-mers related to these features. To do so, we filtered genome sequences by keeping only the k-mers that were also present in a reference sequence data set, and then compared the clustering of the whole genomes with that of the filtered genomes. Each filtered data set is intended to represent a functional set of genes with common characteristics. We reasoned that if genome clustering based on a specific gene function recovers the population structure obtained from whole genomes, then this functional set of genes is linked to the structure of the population under study. This would suggest, for example, that the functional set of genes has a conserved function in the population and is presumably under selective pressure similar to that acting on the whole genome. On the basis of this logic, we explored this relationship by comparing a large number of bacterial genomes with several gene sequence data sets, each representing a different functional gene category. We used reference sequence data sets of antibiotic resistance genes (ARG), insertion sequences, plasmids, bacteriophages, and biosynthetic gene clusters (BGC) and examined their relationship with genome population structure in different bacterial species. This approach is implemented in the Ray Surveyor software, which is built on top of the scalable Ray framework (
<xref rid="msx200-B7" ref-type="bibr">Boisvert et al. 2012</xref>
,
<xref rid="msx200-B6" ref-type="bibr">2010</xref>
). The defining feature of Ray Surveyor is its ability to compare whole genomes on the basis of their complete set of k-mers alongside subsets of their k-mers filtered with other sequence data sets. Ray Surveyor allowed us to determine how the five genetic element categories tested are linked with the population structure of 42 species of bacteria.</p>
</sec>
<sec>
<title>Results and Discussion</title>
<sec>
<title>Validation with Simulated Genome Populations</title>
<p>To overcome possible uncertainties introduced by real genome data sets, we started by generating random phylogenetic trees (
<xref rid="msx200-B39" ref-type="bibr">Kuhner and Felsenstein 1994</xref>
;
<xref rid="msx200-B28" ref-type="bibr">Guindon and Gascuel 2002</xref>
;
<xref rid="msx200-B5" ref-type="bibr">Boc et al. 2012</xref>
) and simulating genome sequences from these trees (
<xref rid="msx200-B78" ref-type="bibr">Spielman and Wilke 2015</xref>
). Three average branch lengths were used to simulate tree structures, in order to measure the impact of this parameter on the clustering methods used in Ray Surveyor analyses. Branch lengths were drawn from an exponential distribution, which yielded an average tree depth of
<inline-formula id="IE1">
<mml:math id="IM1">
<mml:mrow>
<mml:msub>
<mml:mi>log</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
with
<italic>n</italic>
being the number of genomes in the tree (100 in our case). For each average branch length, ten random trees were generated to evaluate reproducibility. For each simulated genome, a sequence of one million nucleotides was produced as part of a multiple alignment using Pyvolve (
<xref rid="msx200-B78" ref-type="bibr">Spielman and Wilke 2015</xref>
).</p>
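<p>For readers who want to experiment with this kind of simulation, the protocol can be approximated in plain Python. The sketch below is not Pyvolve and does not reproduce the tree-generation procedures cited above: it assumes a hypothetical topology model in which random pairs of lineages are joined until one root remains, draws each branch length from an exponential distribution with the chosen mean, and evolves a random root sequence down the tree under the Jukes-Cantor (JC69) substitution model. All function names are illustrative.</p>
<preformat preformat-type="code">
```python
import math
import random

BASES = "ACGT"


def random_tree(n_leaves, mean_branch, rng):
    """Hypothetical topology model: join random pairs of lineages until one remains.

    Returns (children, branch, root): children maps each internal node id to its
    two child ids, branch maps each non-root node to an exponential branch length.
    """
    lineages = list(range(n_leaves))
    children, branch = {}, {}
    next_id = n_leaves
    while len(lineages) > 1:
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        for child in (a, b):
            branch[child] = rng.expovariate(1.0 / mean_branch)
        children[next_id] = (a, b)
        lineages.append(next_id)
        next_id += 1
    return children, branch, lineages[0]


def evolve(sequence, t, rng):
    """Mutate a sequence along a branch of length t under Jukes-Cantor (JC69)."""
    # Probability that a site ends up as a different base after time t.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in sequence:
        if rng.random() >= p_change:
            out.append(base)
        else:
            # Given a change, each of the three other bases is equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)


def simulate(n_leaves, length, mean_branch, seed=0):
    """Return one aligned sequence per leaf, indexed by leaf id 0..n_leaves-1."""
    rng = random.Random(seed)
    children, branch, root = random_tree(n_leaves, mean_branch, rng)
    seqs = {root: "".join(rng.choice(BASES) for _ in range(length))}
    stack = [root]
    while stack:  # walk from the root towards the leaves
        node = stack.pop()
        for child in children.get(node, ()):
            seqs[child] = evolve(seqs[node], branch[child], rng)
            stack.append(child)
    return [seqs[i] for i in range(n_leaves)]


alignment = simulate(n_leaves=100, length=5_000, mean_branch=0.001)
```
</preformat>
<p>Setting mean_branch to 0.001, 0.005, or 0.01 mirrors the three settings described above; the sequence length is kept far below one million nucleotides here so that the pure-Python loop stays fast.</p>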
<p>The three average branch lengths we examined were chosen to model bacterial populations of within-species genomes (0.001), within-genus genomes (0.005), and interspecies genomes (0.01). This choice was based on the ANI computed for all simulated trees. The ANI cutoff distinguishing bacterial species is estimated to lie between 93% and 96% (
<xref rid="msx200-B69" ref-type="bibr">Rossello-Mora and Amann 2015</xref>
). Consequently, trees with an average branch length of 0.001 (average ANI = 98.3%) are akin to intraspecies data sets, and those with an average branch length of 0.01 (average ANI = 85.4%) to interspecies data sets. An average branch length of 0.005 corresponds to an average ANI of 92.1% between all pairs of genomes, with 56.5% of pairs falling below 93%. Therefore, in trees with an average branch length of 0.005, roughly half of the genome pairs belong to the same bacterial species, whereas the other half belong to different species of the same genus. Although these cutoffs do not apply to all bacterial species, they generally reflect the current state of the NCBI taxonomy and allow evaluation of the influence of strain diversity on comparative genomics methods.</p>
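<p>Because the simulated genomes are produced as an alignment, a simple stand-in for ANI is the fraction of identical aligned sites. The sketch below uses that proxy (an assumption for illustration; ANI is usually computed from local alignments of genome fragments) to obtain the mean pairwise identity and the share of genome pairs below a species cutoff such as 93%. Function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from itertools import combinations
from statistics import mean


def identity(a, b):
    """Fraction of identical sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def ani_summary(aligned, cutoff=0.93):
    """Mean pairwise identity and the fraction of pairs falling below the cutoff."""
    values = [identity(a, b) for a, b in combinations(aligned, 2)]
    below = sum(1 for v in values if cutoff > v)  # pairs under the species threshold
    return mean(values), below / len(values)


aligned = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAATTT"]
print(ani_summary(aligned, cutoff=0.85))
```
</preformat>
<p>Applied to the output of a simulation, the second value corresponds to statistics such as the 56.5% of pairs below 93% reported above, under the stated proxy.</p>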
<p>To allow comparison of Ray Surveyor clusters with phylogenies, we took the distance matrices derived from the simulated trees and generated a dendrogram by hierarchical clustering with the UPGMA linkage method. Similarly, the k-mer Gram matrices generated with Ray Surveyor were transformed into distance matrices upon which hierarchical clustering dendrograms were computed. Those dendrograms are referred to as phenetic trees throughout the manuscript. The cophenetic correlation coefficient (CCC;
<xref rid="msx200-B75" ref-type="bibr">Sokal and Rohlf 1962</xref>
) was then used to assess how well Ray Surveyor phenetic trees correlated with the simulated phenetic trees. Here, the CCC is the correlation between the cophenetic distances of two phenetic trees over all pairs of genomes, and therefore measures how consistently the two trees preserve the pairwise distances between genomes. We tested four distance metrics for transforming the Ray Surveyor Gram matrix into a distance matrix: Euclidean, cosine, correlation, and Canberra distances. Ray Surveyor analyses were also performed with k-mer lengths ranging from 11 to 101 nucleotides to evaluate their impact on accuracy.</p>
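<p>The comparison pipeline (Gram matrix, distance matrix, UPGMA dendrogram, cophenetic correlation) can be sketched without external libraries. This is an illustrative reimplementation under our own assumptions, not Ray Surveyor's code: each row of the Gram matrix is treated as a genome's profile of shared k-mer counts, rows are compared with one of the four metrics, and the CCC is taken as the Pearson correlation between the cophenetic distances of two UPGMA trees. In practice, scipy.spatial.distance.pdist, scipy.cluster.hierarchy.linkage (method="average"), and scipy.cluster.hierarchy.cophenet provide equivalent building blocks.</p>
<preformat preformat-type="code">
```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Four ways of turning two rows of the Gram matrix into a distance.
def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - sum(a * b for a, b in zip(u, v)) / norm


def correlation(u, v):
    return 1.0 - pearson(u, v)


def canberra(u, v):
    # Coordinates where both values are zero are skipped (scipy does the same).
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


def distance_matrix(gram, metric):
    """Compare every pair of Gram-matrix rows (one row per genome)."""
    return [[metric(u, v) for v in gram] for u in gram]


def upgma_cophenetic(dist):
    """UPGMA (average linkage) on a full distance matrix.

    Returns {(i, j): height}, the merge level at which leaves i and j first
    join, i.e. their cophenetic distance.
    """
    def pair(x, y):
        return (min(x, y), max(x, y))

    n = len(dist)
    members = {i: [i] for i in range(n)}
    d = {pair(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph, next_id = {}, n
    while len(members) > 1:
        (a, b), height = min(d.items(), key=lambda item: item[1])
        for i in members[a]:
            for j in members[b]:
                coph[pair(i, j)] = height
        na, nb = len(members[a]), len(members[b])
        for c in members:
            if c not in (a, b):  # size-weighted average of the two merged clusters
                d[pair(next_id, c)] = (na * d[pair(a, c)] + nb * d[pair(b, c)]) / (na + nb)
        members[next_id] = members.pop(a) + members.pop(b)
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        next_id += 1
    return coph


def cophenetic_correlation(dist_a, dist_b):
    """CCC: Pearson correlation of the cophenetic distances of two UPGMA trees."""
    ca, cb = upgma_cophenetic(dist_a), upgma_cophenetic(dist_b)
    keys = sorted(ca)
    return pearson([ca[k] for k in keys], [cb[k] for k in keys])


gram = [[50, 40, 5, 4], [40, 52, 6, 5], [5, 6, 48, 38], [4, 5, 38, 47]]
reference = [[0, 2, 8, 8], [2, 0, 8, 8], [8, 8, 0, 3], [8, 8, 3, 0]]
for metric in (euclidean, cosine, correlation, canberra):
    ccc = cophenetic_correlation(reference, distance_matrix(gram, metric))
    print(metric.__name__, round(ccc, 3))
```
</preformat>
<p>With this toy Gram matrix, every metric recovers the two groups of the reference tree, so each CCC is close to 1; real data sets are where the metrics start to diverge.</p>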
<p>The cophenetic correlation results from Ray Surveyor analyses were affected by the average pairwise phylogenetic distance of genomes and the k-mer lengths used in the analysis (
<xref ref-type="fig" rid="msx200-F1">fig. 1
<italic>A</italic>
</xref>
). Indeed, CCCs were higher for intraspecies genome populations (lower average pairwise distance) and were only slightly affected by k-mer length or distance metric. When genome populations grew more distant, crossing the species boundary, CCCs decreased with increasing k-mer length. Comparing the distance metrics used to construct phenetic trees from Ray Surveyor results, we observed that Euclidean, cosine, and correlation distances behaved similarly on simulated genome populations (see Materials and Methods). The Canberra distance provided lower CCCs for more closely related genomes, but it was less affected by increasing k-mer length in more heterogeneous populations. This result is likely due to the Canberra distance being more tolerant of low absolute values (the number of shared k-mers), as observed by
<xref rid="msx200-B44" ref-type="bibr">Loureiro et al. (2004)</xref>
. As a control, we produced alignment-based phylogenies of the simulated sequences, which had average CCCs of 0.98, 0.97, and 0.99 for branch lengths of 0.001, 0.005, and 0.01, respectively.</p>
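<p>The following toy comparison illustrates one plausible reading of this behavior (our illustration, not an analysis from the study). The Canberra distance divides each coordinate's difference by that coordinate's magnitude, so a change in a small shared-k-mer count weighs as much as a proportional change in a large one, whereas the Euclidean distance is dominated by the largest counts. The profile values are invented for the example.</p>
<preformat preformat-type="code">
```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|); coordinates where both are zero are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


# Hypothetical shared-k-mer profiles: one large count (a close relative) and one
# small count (a distant genome sharing only a handful of k-mers).
u = [900000, 20]
v_big = [899000, 20]    # change of 1,000 k-mers in the large count
v_small = [900000, 40]  # change of 20 k-mers in the small count

print(euclidean(u, v_big), euclidean(u, v_small))  # Euclidean favors the big change
print(canberra(u, v_big), canberra(u, v_small))    # Canberra favors the small one
```
</preformat>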
<fig id="msx200-F1" orientation="portrait" position="float">
<label>
<sc>Fig</sc>
. 1.</label>
<caption>
<p>Evaluation of simulated genome populations with Ray Surveyor. Colors and symbols represent the distance metrics used to transform the Ray Surveyor Gram matrix into a distance matrix. Each column represents a different evolutionary distance between genomes, based on the average branch length and the bacterial species definition. Ten replicates were performed for each point. First row (
<italic>A</italic>
) is the cophenetic correlation between the reference phylogeny and the phenetic tree. Second row (
<italic>B</italic>
) is the Robinson–Foulds metric between the reference phylogeny and the Ray Surveyor derived tree.</p>
</caption>
<graphic xlink:href="msx200f1"></graphic>
</fig>
<p>To test the ability of Ray Surveyor to recover correct tree topologies, we also computed a Neighbor-Joining tree from each distance matrix and compared the original simulated phylogenetic tree with those derived from Ray Surveyor. The Robinson–Foulds (RF) metric compares unrooted phylogenetic trees by counting the bipartitions (splits) present in one tree but not in the other, which corresponds to the number of edge contractions and expansions required to transform one tree into the other (
<xref rid="msx200-B66" ref-type="bibr">Robinson and Foulds 1981</xref>
). Similar to the cophenetic correlation, the RF results varied with sequence diversity and k-mer length (
<xref ref-type="fig" rid="msx200-F1">fig. 1
<italic>B</italic>
</xref>
). For the intraspecies genome populations (branch length = 0.001, average ANI = 98.3%), longer k-mers performed better, with the best results obtained with 101-mers and the cosine metric. At the species boundary, the cosine distance yielded the best tree topologies with 31-mers. When comparing genomes of different species (branch length = 0.01, average ANI = 85.4%), k-mer lengths shorter than 31 yielded better tree topologies for the cosine, correlation, and Euclidean metrics.</p>
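<p>The RF distance can be computed directly from the splits of each tree. The sketch below assumes trees are given as nested Python tuples of leaf labels (a representation chosen for illustration; a real pipeline would parse Newick files with a phylogenetics library) and returns the unnormalized count of splits found in only one of the two trees. Because each split is stored as the side that excludes a fixed anchor leaf, the result does not depend on where a tree happens to be rooted.</p>
<preformat preformat-type="code">
```python
def leaves(tree):
    """Leaf labels under a node of a nested-tuple tree."""
    if isinstance(tree, tuple):
        out = frozenset()
        for child in tree:
            out = out.union(leaves(child))
        return out
    return frozenset([tree])


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as a nested tuple."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # store each split as the side without this leaf
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return
        clade = leaves(node)
        side = clade if anchor not in clade else all_leaves - clade
        # Trivial splits (one leaf on either side) carry no topological information.
        if len(side) >= 2 and len(all_leaves) - len(side) >= 2:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a).symmetric_difference(splits(tree_b)))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))
```
</preformat>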
<p>Based on these results and on the literature, the choice of k-mer length can be seen as a trade-off between sensitivity and specificity (
<xref rid="msx200-B54" ref-type="bibr">Ondov et al. 2016</xref>
). Evolutionarily distant genomes require shorter k-mers to obtain a good signal (sensitivity), whereas more similar genomes benefit from longer k-mers (specificity). Moreover, previous studies have shown the efficiency of 31-mers for genome clustering (
<xref rid="msx200-B50" ref-type="bibr">Melsted and Pritchard 2011</xref>
) and their robustness in bacterial metagenome profiling (
<xref rid="msx200-B7" ref-type="bibr">Boisvert et al. 2012</xref>
). For the following analyses on real genome data sets, we selected a k-mer length of 31, which offers a compromise between sensitivity and specificity for both intraspecies and interspecies comparisons. We also focused our analyses on the cophenetic correlation of phenetic trees, since our aim was to characterize genomes based on specific genetic elements rather than to reconstruct their ancestral history.</p>
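<p>A back-of-the-envelope calculation illustrates this trade-off. Assuming, for simplicity, that substitutions hit sites independently and that there are no indels, a k-mer survives unchanged between two genomes with per-site identity p with probability p raised to the power k. The snippet evaluates this for the average ANI values of the three simulated settings; the function name is illustrative.</p>
<preformat preformat-type="code">
```python
def conserved_kmer_fraction(identity, k):
    """Expected share of k-mers left intact, assuming independent site changes."""
    return identity ** k


# Average ANI of the three simulated settings (branch lengths 0.001, 0.005, 0.01).
for ani in (0.983, 0.921, 0.854):
    row = {k: round(conserved_kmer_fraction(ani, k), 3) for k in (11, 21, 31, 51, 101)}
    print(ani, row)
```
</preformat>
<p>Under these simplifying assumptions, about 59% of 31-mers survive at 98.3% identity but fewer than 1% at 85.4%, whereas about 18% of 11-mers still survive at 85.4%, which is consistent with shorter k-mers retaining more signal between distant genomes.</p>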
</sec>
<sec>
<title>Population Scale Genomics with k-mers</title>
<p>This section benchmarks Ray Surveyor genome comparison in comparative genomics projects and assesses how it performs on microbial populations of different scales. As a first step, we validated that k-mer-based phenetic trees accurately reflect phylogenies previously determined in publicly available comparative genomic studies of
<italic>Streptococcus pneumoniae</italic>
and
<italic>Pseudomonas aeruginosa</italic>
(
<xref ref-type="fig" rid="msx200-F2">fig. 2</xref>
). For
<italic>P. aeruginosa</italic>
, 387 genomes were taken from a study by
<xref rid="msx200-B38" ref-type="bibr">Kos et al. (2015</xref>
;
<xref ref-type="fig" rid="msx200-F2">fig. 2
<italic>A</italic>
</xref>
). For
<italic>S. pneumoniae</italic>
, a first data set of 616 genomes from Croucher and collaborators was used, along with a second data set of 173 genomes previously studied by Hilty and collaborators to investigate the difference between encapsulated and nonencapsulated pneumococci (
<xref rid="msx200-B13" ref-type="bibr">Croucher et al. 2013</xref>
;
<xref rid="msx200-B17" ref-type="bibr">Donati et al. 2010</xref>
;
<xref rid="msx200-B32" ref-type="bibr">Hilty et al. 2014</xref>
). Whole genome phylogenies were obtained from the authors for the Kos and Croucher data sets, whereas the phylogeny for the Hilty collection was built using 602 conserved genes. We calculated the cophenetic correlation between the phenetic trees (hierarchical clustering dendrograms) created using Ray Surveyor and the phenetic trees derived from the phylogenies of these three data sets (
<xref ref-type="fig" rid="msx200-F2">fig. 2
<italic>A</italic>
</xref>
). All four distance metrics (see Materials and Methods) yielded CCCs above 0.91 for
<italic>P. aeruginosa</italic>
, with the Canberra distance yielding the highest CCC (0.97). For
<italic>S. pneumoniae</italic>
, the correlation distance had the highest CCC (0.92), compared with values below 0.75 for the other distance metrics. For the Hilty data set of
<italic>S. pneumoniae</italic>
genomes, the CCC between the k-mer-based tree (correlation distance) and the core genome phenetic tree was 0.89. Heatmaps representing the clustering based on the distance between isolates of the Croucher and Kos data sets are shown in
<xref ref-type="supplementary-material" rid="sup1">supplementary figures 1 and 2</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online. </p>
<fig id="msx200-F2" orientation="portrait" position="float">
<label>
<sc>Fig</sc>
. 2.</label>
<caption>
<p>Comparison of phenetic trees created using Ray Surveyor with phylogenies calculated using core genomes or marker genes for
<italic>Pseudomonas aeruginosa</italic>
and
<italic>Streptococcus pneumoniae.</italic>
(
<italic>A</italic>
) Cophenetic correlation between alignment-based phylogeny and phenetic trees calculated using four different distance metrics. (
<italic>B</italic>
) Fowlkes–Mallows index comparing clusterings obtained with Ray Surveyor (correlation distance metric) and with phylogeny to classifications based on multilocus sequence typing or serotypes.</p>
</caption>
<graphic xlink:href="msx200f2"></graphic>
</fig>
<p>This approach can also be used to quickly add a new genome to an existing phylogeny. For example, we added the recently sequenced genome of
<italic>P. aeruginosa</italic>
strain E6130952 to the Kos
<italic>et al.</italic>
genome collection (CP020603.1 [
<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/nuccore/CP020603">https://www.ncbi.nlm.nih.gov/nuccore/CP020603</ext-link>
; last accessed July 19, 2017];
<xref ref-type="supplementary-material" rid="sup1">supplementary fig. 3</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online). This pathogenic strain was isolated from a patient with respiratory failure and was resistant to all tested antibiotics, including colistin (
<xref rid="msx200-B89" ref-type="bibr">Xiong etal. 2017</xref>
). The closest isolate in the phylogeny (AZPAE14730) was also resistant to levofloxacin, meropenem, and amikacin, but not to colistin (
<xref rid="msx200-B38" ref-type="bibr">Kos etal. 2015</xref>
). Both strains have a similar genome size and share 97% of their k-mers.</p>
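<p>Because the Gram matrix is built pairwise, adding a genome only requires comparing it once with each genome already in the collection; the existing pairwise counts do not change. The sketch below, with hypothetical function names and forward-strand k-mers only, ranks existing genomes by the fraction of the new genome's k-mers they share. Which genome the 97% figure above was normalized by is not stated here, so this normalization is an assumption.</p>
<preformat preformat-type="code">
```python
def kmer_set(sequence, k):
    """Distinct k-mers of a sequence (forward strand only, for brevity)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def closest_genome(new_sequence, collection, k):
    """Return the genome sharing the largest fraction of the new genome's k-mers.

    Only the new genome's row of the Gram matrix is computed here.
    """
    query = kmer_set(new_sequence, k)
    scores = {
        name: len(query.intersection(kmer_set(seq, k))) / len(query)
        for name, seq in collection.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


collection = {"isolate_1": "ACGTACGTAA", "isolate_2": "TTTTCCCCGG"}
print(closest_genome("ACGTACGTAC", collection, k=4))
```
</preformat>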
<p>In epidemiological studies, genomes are often classified based on experimentally derived categories such as multilocus sequence typing (MLST) or serotypes. The Fowlkes–Mallows index (FMI) measures the similarity between two clusterings (
<xref rid="msx200-B23" ref-type="bibr">Fowlkes and Mallows 1983</xref>
and can be used to compare clusterings based on k-mers or phylogeny with clinically relevant categorical information. We therefore used this metric to quantify the concordance of the clusters generated with Ray Surveyor or with phylogeny with the metadata associated with the genomes. Specifically, we calculated the FMI between the clusterings derived from the phylogenetic and phenetic trees of
<italic>P. aeruginosa</italic>
and
<italic>S. pneumoniae</italic>
when compared with MLST and serotype genome classification, for a range of 2 to
<italic>N</italic>
clusters (
<xref ref-type="fig" rid="msx200-F2">fig. 2
<italic>B</italic>
</xref>
). Phylogenetic and k-mer-based genome comparisons gave similar results when compared with MLST or serotype categorization, with a maximal difference in FMI between phylogeny and k-mers below 5%. Similarity with MLST was higher (≥ 85%) than similarity with serotype (≤ 67%), suggesting that MLST is more closely related to complete genome phylogeny than serotype is. Indeed, in
<italic>S. pneumoniae</italic>
, the capsular operon can be modified through capsular switching, a process that decouples serotypes from the core and accessory genomes (
<xref rid="msx200-B2" ref-type="bibr">Andam and Hanage 2015</xref>
). In the Hilty data set, genomes of different strain types were grouped within the single category of nonencapsulated
<italic>S. pneumoniae</italic>
, which explains the low FMI for serotypes compared with the near-perfect FMI obtained when benchmarking against MLST.</p>
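<p>The FMI compares two labelings through pairs of genomes: with TP the pairs grouped together in both, FP the pairs grouped together only in the second labeling, and FN those grouped together only in the first, FMI = TP / sqrt((TP + FP)(TP + FN)). A minimal pure-Python sketch follows; the function name and toy labels are illustrative.</p>
<preformat preformat-type="code">
```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over all pairs of genomes.

    TP: pairs grouped together in both labelings; FP: together only in labels_b;
    FN: together only in labels_a.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_b:
            fp += 1
        elif same_a:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


mlst = ["ST1", "ST1", "ST2", "ST2"]
kmer_clusters = [0, 0, 0, 1]  # e.g. labels from cutting a phenetic tree
print(round(fowlkes_mallows(mlst, kmer_clusters), 3))
```
</preformat>
<p>The same score is available as sklearn.metrics.fowlkes_mallows_score, and labels for a given number of clusters can be obtained from a linkage matrix with scipy.cluster.hierarchy.fcluster using criterion="maxclust", which is one way to scan the range of 2 to N clusters.</p>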
<p>In order to explore Ray Surveyor’s capacity to work with a large number of distantly related genomes, we created a data set of 2,429 complete genomes from 30 phyla in the domain
<italic>Bacteria</italic>
. These 2,429 genomes were selected to limit the bias caused by the overrepresentation of certain genomes in public databases, such as laboratory strains of
<italic>Escherichia coli</italic>
or clonal isolates from epidemiological studies. We compared the phenetic tree built from these genomes with Ray Surveyor to the 16S rRNA phylogenetic tree of the same strains. The Canberra distance was the best performing metric for the tree of 2,429 bacterial genomes (0.69 CCC, compared with below 0.10 for the other distance metrics), most likely because of the low number of k-mers shared between distant genomes (
<xref ref-type="fig" rid="msx200-F3">fig. 3
<italic>A</italic>
</xref>
). We also used the FMI to compare the 16S phylogenetic tree and the Ray Surveyor phenetic tree with the taxonomic classification of genomes at the family rank in the NCBI taxonomy. Although the NCBI taxonomy may not always agree with other taxonomies, it provides a convenient way to perform taxonomy-related analyses with genomic sequences obtained from NCBI (
<xref rid="msx200-B20" ref-type="bibr">Federhen 2012</xref>
;
<xref rid="msx200-B3" ref-type="bibr">Balvočiūtė and Huson 2017</xref>
). When comparing the classification of 2,429 genomes from 262 bacterial families to genome-based clustering, the peak FMI was 67% for k-mers (469 clusters) compared with 68% for 16S phylogeny (310 clusters;
<xref ref-type="fig" rid="msx200-F3">fig. 3
<italic>B</italic>
</xref>
). While these methods had similar correlations with current NCBI taxonomy at the family rank, we also observed that the accuracy of clusters was influenced by the number of genomes within each bacterial family (
<xref ref-type="fig" rid="msx200-F3">fig. 3
<italic>B</italic>
</xref>
). When considering only bacterial families represented by at least 20 genome sequences (39 families), k-mers reached a maximal FMI of 78% (at 167 clusters), compared with 77% (at 107 clusters) for phylogeny. In contrast, for families represented by fewer than 20 genomes (223 families), the FMI was 62% for k-mer analysis (378 clusters) compared with 71% for 16S rRNA phylogenetic trees (451 clusters). The discrepancies between 16S rRNA phylogeny and k-mer-based clustering were mainly associated with regions where only a small number of genomes were included in the analysis. Additionally, the low number of k-mers shared between these small groups of genomes and the rest of the taxa makes it hard to find common ancestors and thus to infer their correct placement in the final dendrogram. Hence, efficient clustering of phylogenetically distant bacteria that share few k-mers would require more intermediate genomes to drive the hierarchical clustering, and a shorter k-mer length to obtain more signal.</p>
<fig id="msx200-F3" orientation="portrait" position="float">
<label>
<sc>Fig</sc>
. 3.</label>
<caption>
<p>Comparison of phenetic trees created using Ray Surveyor with a phylogeny based on 16S rRNA gene sequences for 2,429 bacterial genomes. (
<italic>A</italic>
) Cophenetic correlation between alignment-based phylogeny and phenetic trees calculated using four different distance metrics. (
<italic>B</italic>
) Fowlkes–Mallows index comparing clusterings obtained with Ray Surveyor (correlation distance metric) and with phylogeny to taxonomic classification at the family rank.</p>
</caption>
<graphic xlink:href="msx200f3"></graphic>
</fig>
<p>To investigate the relationship between traits and genome clustering, quantitative and qualitative metadata can be plotted against a phenetic tree. For example,
<xref ref-type="supplementary-material" rid="sup1">supplementary figure 4</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online, plots a phenetic tree of 2,429 bacterial genomes against their GC content and their taxonomic class. In this representation, differences in GC content appear related to taxonomic classification. Because phenetic trees do not rely on sequence alignments, we cannot correct for GC content or codon bias using substitution models or the other methods suggested in the literature (
<xref rid="msx200-B52" ref-type="bibr">Mooers et al. 2000</xref>
). Therefore, we do not expect the branch lengths generated with our k-mer approach to represent evolutionary distance. The clustering of high taxonomic ranks could also be biased by GC content (Mooers and Holmes 2000). At the k-mer level, differences in GC content and codon usage should reduce k-mer similarity, since k-mer similarity is expected to decrease quickly as the number of mismatches increases. Previous studies have shown that the type of environment and the particular lifestyle of bacteria are related to genomic GC content and codon usage (
<xref rid="msx200-B22" ref-type="bibr">Foerstner et al. 2005</xref>
;
<xref rid="msx200-B8" ref-type="bibr">Botzman and Margalit 2011</xref>
;
<xref rid="msx200-B42" ref-type="bibr">Lassalle et al. 2015</xref>
). Differences in ecological niches are also reflected in the accessory genome, which can lead to large differences in k-mer content (
<xref rid="msx200-B49" ref-type="bibr">Medini et al. 2005</xref>
).</p>
</sec>
<sec>
<title>Comparing Genomes Based on Specific Traits</title>
<p>Not all genes within a genome share the same association with the evolutionary history of a species as inferred from phylogeny (
<xref rid="msx200-B40" ref-type="bibr">Land et al. 2015</xref>
). For example, genes acquired by HGT, such as those from mobile elements, bacteriophages, or plasmids, may not be linked to the phylogeny of a species and may have been acquired independently by different strains (
<xref rid="msx200-B59" ref-type="bibr">Philippe and Douady 2003</xref>
). Resistance genes as well as secondary metabolite operons (
<xref rid="msx200-B16" ref-type="bibr">Dobrindt et al. 2004</xref>
) can also be disseminated by HGT (
<xref rid="msx200-B56" ref-type="bibr">Pärnänen et al. 2016</xref>
).</p>
<p>To investigate HGT patterns in our data set, we developed an approach that quantifies how closely the phenetic tree generated from a subset of k-mers reflects the tree generated from the total k-mer content of the genomes. We hypothesized that if the two trees are correlated, the group of k-mers is linked to the phylogeny of the studied population. Conversely, an absence of correlation suggests independence between the whole genome population and the filtered genome population. The first steps of the analysis are similar to those described in the two previous sections. We first calculated a Gram matrix of shared k-mers for all pairs of genomes. For each population, two Gram matrices were produced: one with the total count of shared k-mers between genomes, and one containing only the count of shared k-mers included in the filtering data set. We then generated a distance matrix from each of the complete and filtered Gram matrices using the Canberra distance, chosen to reduce the bias caused by samples with a limited number of filtered k-mers. Phenetic trees were then built by UPGMA clustering of the distance matrices. We ordered the heatmaps according to the whole genome phenetic tree to visualize its similarity with the filtered phenetic tree. In addition, the correlation between phenetic trees based on complete k-mer content and on filtered k-mer sets was quantified using the CCC. A coefficient of 0 indicates no correlation, whereas a coefficient of 1 indicates perfect cophenetic correlation between selected k-mers and complete genomes, suggesting that these k-mers are associated with the phylogeny of the population.</p>
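<p>The filtering step can be sketched as a set intersection performed before the Gram matrix is computed. This is a minimal illustration with hypothetical names, using forward-strand k-mers only: the reference sequences define the k-mers to keep, and an optional exclusion set removes k-mers from the filter, as described below for plasmids, where resistance gene and mobile element k-mers were excluded. The resulting filtered Gram matrix would then go through the same Canberra distance, UPGMA, and CCC steps as the complete one.</p>
<preformat preformat-type="code">
```python
def kmer_set(sequence, k):
    """Distinct k-mers of a sequence (forward strand only, for brevity)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def filtered_gram(genomes, reference, k, exclude=()):
    """Shared-k-mer counts restricted to k-mers present in a filtering data set.

    reference: sequences of the filtering data set (e.g. resistance genes).
    exclude: optional sequences whose k-mers are removed from the filter.
    """
    keep = set()
    for seq in reference:
        keep.update(kmer_set(seq, k))
    for seq in exclude:
        keep.difference_update(kmer_set(seq, k))
    # Each genome is reduced to the k-mers it has in common with the filter.
    sets = [kmer_set(g, k).intersection(keep) for g in genomes]
    return [[len(a.intersection(b)) for b in sets] for a in sets]


genomes = ["AAAACCCC", "AAAAGGGG", "TTTTTTTT"]
print(filtered_gram(genomes, reference=["AAAACC"], k=4))
print(filtered_gram(genomes, reference=["AAAACC"], k=4, exclude=["AAAAC"]))
```
</preformat>
<p>A genome with an all-zero row, like the third one here, shares no k-mers with the filter; figure 5 reports the percentage of such genomes for each species and filtering data set.</p>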
<p>In our initial analysis, we further investigated genome populations of
<italic>S. pneumoniae</italic>
and
<italic>P. aeruginosa</italic>
and the 2,429 bacterial genomes using subsets of k-mers that could be acquired through HGT and may have an impact on the evolution of bacterial species. We used five filtering data sets: mobile elements (insertion sequences), resistance genes, bacteriophages, plasmids, and BGC. Filtering retained only the k-mers strictly included in the filtering data set. For plasmid filtering, however, we also excluded k-mers from the resistance gene and mobile element data sets, as these genetic elements often occur on both plasmids and chromosomes. As shown in
<xref ref-type="fig" rid="msx200-F4">figure 4</xref>
, the concordance between heatmaps based on filtered k-mers and those based on complete genomes, quantified by the cophenetic correlation, differs between filtering data sets and genome collections.
<italic>Streptococcus pneumoniae</italic>
showed a low correlation (0.28 CCC) between antibiotic resistance k-mers and complete genome clustering, whereas BGC (0.48 CCC) and plasmids (0.52 CCC) had moderate correlations with complete genome clustering. The genomes harbor on average 403 BGC k-mers and 5,636 plasmid k-mers, suggesting that sequences of these origins are not abundant in the species, although they are correlated with the structure of the population.
<italic>Streptococcus pneumoniae</italic>
does not frequently harbor plasmids, which is reflected in the number of k-mers related to these genetic elements (
<xref rid="msx200-B68" ref-type="bibr">Romero et al. 2007</xref>
). The lack of characterized BGC from this species in the filtering data set could also contribute to the moderate correlation. In contrast, the
<italic>P. aeruginosa</italic>
phenetic trees based on resistance genes (0.98 CCC) and BGC (0.92 CCC) were highly correlated with the phenetic tree based on whole genomes. On average, 41,240 and 63,820 shared k-mers were associated with these two filtering data sets, respectively. Similar results were obtained for 71 genomes of
<italic>P. aeruginosa</italic>
downloaded from the PATRIC database (
<xref rid="msx200-B90" ref-type="bibr">Wattam et al. 2014</xref>
), which included some environmental samples, and on 500
<italic>P. aeruginosa</italic>
genomes randomly selected from NCBI (
<xref ref-type="supplementary-material" rid="sup1">supplementary fig. 5</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online). For the data set of 2,429 bacterial genomes, the whole genome phenetic tree was highly correlated with the trees based on plasmid and BGC k-mers. The overall relationship between representative taxa in the domain
<italic>Bacteria</italic>
was not distinctively defined by resistance genes, which are broadly distributed across the microbial tree of life and can be associated with HGT (
<xref rid="msx200-B51" ref-type="bibr">Metcalf et al. 2014</xref>
).</p>
<fig id="msx200-F4" orientation="portrait" position="float">
<label>
<sc>Fig</sc>
. 4.</label>
<caption>
<p>Comparison of the relationship between strains when genome sequences are filtered using one of five filtering data sets for
<italic>Streptococcus pneumoniae</italic>
,
<italic>Pseudomonas aeruginosa</italic>
and the 2,429 representative bacterial genomes. Each heatmap represents the Canberra distance between genomes computed on a subset of k-mers. The
<italic>X</italic>
and
<italic>Y</italic>
axes of each heatmap are genomes ordered by hierarchical clustering of the complete genomes. The number in the top left corner of each heatmap is the cophenetic correlation, expressed as a percentage, between the filtered data set and the whole genome phenetic tree. The darker the shade of blue, the higher the similarity between samples.</p>
</caption>
<graphic xlink:href="msx200f4"></graphic>
</fig>
<p>In order to dissect the relationship between bacterial pathogens and the five filtering data sets, we applied the methodology described above to 42 bacterial species for which at least 100 genomes were available in the NCBI RefSeq database (
<xref ref-type="fig" rid="msx200-F5">fig. 5</xref>
). These taxa are associated with human infections, with the exception of
<italic>Lactobacillus plantarum</italic>
which is found in fermented food (
<xref rid="msx200-B84" ref-type="bibr">van den Nieuwboer et al. 2016</xref>
). Our hypothesis is that a high cophenetic correlation between clusterings based on complete and filtered k-mer content is a good indicator of how strongly the tested elements are related to the phylogeny of the species.</p>
<fig id="msx200-F5" orientation="portrait" position="float">
<label>
<sc>Fig</sc>
. 5.</label>
<caption>
<p>Cophenetic correlation between phenetic trees based on whole genomes and on filtered data sets for 42 bacterial species with at least 100 genomes in RefSeq. Heatmap intensity represents the cophenetic correlation, as shown in the legend. Numbers in the heatmap are the percentages of genomes with zero k-mers associated with the relevant filtering data set.</p>
</caption>
<graphic xlink:href="msx200f5"></graphic>
</fig>
<p>The majority of the gammaproteobacteria had strong correlations with the ARG data set, especially species from
<italic>Klebsiella</italic>
,
<italic>Escherichia</italic>
,
<italic>Enterobacter</italic>
,
<italic>Vibrio</italic>
,
<italic>Pseudomonas</italic>
, and
<italic>Acinetobacter</italic>
. This could be related to the large number of intrinsic resistance determinants characterized in these species, especially drug efflux systems (
<xref rid="msx200-B67" ref-type="bibr">Rodionov et al. 2001</xref>
). Other studies have highlighted the importance of bacteriophages and plasmids in the ongoing evolution of the
<italic>Vibrio</italic>
genus (
<xref rid="msx200-B30" ref-type="bibr">Hazen et al. 2010</xref>
), as reflected in
<xref ref-type="fig" rid="msx200-F5">figure 5</xref>
.
<italic>Shigella flexneri</italic>
is the proteobacterial species with the highest correlation with bacteriophages (0.83 CCC). Indeed, its O-antigens are often modified by serotype-converting bacteriophages (
<xref rid="msx200-B1" ref-type="bibr">Allison and Verma 2000</xref>
;
<xref rid="msx200-B80" ref-type="bibr">Sun et al. 2013</xref>
). To investigate this further, we used alignments to determine which of the bacteriophages used for filtering were present in the 147
<italic>Shigella</italic>
genomes. We found specific prophage sequences that could delineate the clusters obtained when clustering was based only on phage k-mers (
<xref ref-type="supplementary-material" rid="sup1">supplementary fig. 6</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online). Polysaccharide-related BGC, which encode capsular antigens and O-antigens, could thus explain the high BGC CCC for
<italic>S. flexneri</italic>
and
<italic>Vibrio cholerae</italic>
(
<xref rid="msx200-B9" ref-type="bibr">Cimermancic et al. 2014</xref>
). On the other hand,
<italic>E. coli</italic>
has several characterized BGC in the MIBiG database but shows only moderate correlation with the whole genome (0.49 CCC;
<xref rid="msx200-B48" ref-type="bibr">Medema et al. 2015b</xref>
). Comparison of clustering based on whole genomes and on BGC for
<italic>E. coli</italic>
indicates that a portion of the population can be delineated by BGC while the rest seems unrelated (
<xref ref-type="supplementary-material" rid="sup1">supplementary fig. 7</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online). The
<italic>Francisella tularensis</italic>
genome can contain over 100 insertion sequence genes (
<xref rid="msx200-B41" ref-type="bibr">Larsson et al. 2009</xref>
), which could explain its high correlation with mobile elements. Unlike most of the tested species,
<italic>F. tularensis</italic>
was also strongly correlated with plasmids. This high correlation could be related to a misannotated 100 kb plasmid that is in fact part of the
<italic>F. tularensis</italic>
genome (CP010448.1, which was replaced by CP010446.2). Because this large region is integrated into the genome, it could have inflated the contribution of plasmids to the observed correlation. Note that for most genomes in RefSeq, plasmid sequences are deposited under accession numbers different from that of the chromosome and are therefore not considered in the clustering. In whole genome shotgun sequencing, by contrast, plasmid sequences are generally included in the assemblies, so plasmid filtering could prove useful to exclude these sequences from whole genome comparisons.</p>
<p>Six species from the
<italic>Firmicutes</italic>
phylum had correlations of at least 0.70 CCC with ARG.
<italic>Bacillus anthracis</italic>
,
<italic>B. cereus</italic>
and
<italic>B. subtilis</italic>
were all above 0.85 CCC for ARG. This high correlation could originate from the chromosome-encoded
<italic>β</italic>
-lactamases harbored by these species (
<xref rid="msx200-B11" ref-type="bibr">Colombo et al. 2004</xref>
;
<xref rid="msx200-B21" ref-type="bibr">Fenselau et al. 2008</xref>
;
<xref rid="msx200-B46" ref-type="bibr">Materon et al. 2003</xref>
). The other members of the
<italic>Firmicutes</italic>
with good correlation with ARG were
<italic>Listeria monocytogenes</italic>
(0.93 CCC),
<italic>Enterococcus faecalis</italic>
(0.82 CCC), and
<italic>S. pneumoniae</italic>
(0.70 CCC). The three
<italic>Bacillus</italic>
species also had CCCs above 0.70 for BGC. Bacilli are known to produce several types of secondary metabolites (
<xref rid="msx200-B70" ref-type="bibr">Sansinenea and Ortiz 2011</xref>
). All the
<italic>Firmicutes</italic>
analyzed were below 0.55 CCC with the mobile elements data set. In
<italic>Firmicutes</italic>
, bacteriophages had the highest correlations with
<italic>Streptococcus suis</italic>
(0.87 CCC) and
<italic>B. anthracis</italic>
(0.98 CCC). The 500
<italic>S. suis</italic>
genomes carried a large number of phage-associated k-mers (18,642 on average), which, together with the high correlation, supports the idea that prophage sequences in this species are linked to the whole genome phylogeny. Previous work has also shown that remnants of phage sequences are distributed throughout
<italic>S. suis</italic>
genomes (
<xref rid="msx200-B81" ref-type="bibr">Tang et al. 2013</xref>
).
<italic>Bacillus anthracis</italic>
had a high correlation with bacteriophages compared with the other
<italic>Bacillus</italic>
species, although the number of shared k-mers from the filtering data set was modest (3,936 on average). This correlation could be related to four defective and conserved prophages harbored by the species, as reported by
<xref rid="msx200-B77" ref-type="bibr">Sozhamannan et al. (2006)</xref>
. In agreement with our results, they suggested that these prophages could be used as a chromosomal signature of the species. Bacteriophages could also be associated with ecological adaptation in
<italic>B. anthracis</italic>
(
<xref rid="msx200-B71" ref-type="bibr">Schuch and Fischetti 2009</xref>
).</p>
<p>Overall, the results shown in
<xref ref-type="fig" rid="msx200-F5">figure 5</xref>
support our hypothesis that a correlation between filtered and complete genomes indicates a relationship between the selected k-mers and a species. In many cases, we observed a high cophenetic correlation in species where potentially mobile genetic elements were integrated into the genome. Thus, this methodology could indicate the integration and conservation of these elements in the genome of a particular species, or at least their phylotype dependence.</p>
</sec>
</sec>
<sec sec-type="conclusion">
<title>Conclusion</title>
<p>By comparing the k-mer composition of genomes, we were able to reconstruct the phenetic tree of large bacterial epidemiological genomics data sets, as we demonstrated with the
<italic>S. pneumoniae</italic>
and
<italic>P. aeruginosa</italic>
data sets. We also evaluated the accuracy of the method on synthetic genome data sets by testing different parameters that influence this kind of analysis. The methodology is based on whole genome analysis rather than on a subset of core genes, an approach that has been shown to introduce bias (
<xref rid="msx200-B72" ref-type="bibr">Shapiro et al. 2012</xref>
;
<xref rid="msx200-B4" ref-type="bibr">Biek et al. 2015</xref>
). The use of k-mers allows genomes to be compared on characteristics that are either conserved or specific. We also applied the method to a data set of 2,429 bacterial genomes spanning the whole bacterial tree of life, without selecting features such as conserved genes or ribosomal RNA. This makes Ray Surveyor an effective tool for scalable analyses in comparative genomics research, among other applications. Phenetic trees built from k-mers could also be used to position newly sequenced genomes in the microbial tree of life and infer their classification, or to determine which branches of the tree of life are poorly represented in terms of genome sequences relative to their internal taxonomic diversity.</p>
<p>Analysis of population structures can be further partitioned by filtering subsets of k-mers associated with gene categories or functions. Our results demonstrate that comparing genomes based on specific subsets of k-mers can reveal their relationships at the population scale. Indeed, without requiring the genetic determinants involved to be specified, the method allows easy determination of strain clusters with similar potential regarding the functions of the filtered data set, such as antibiotic resistance or HGT, as shown in this study. A limitation of the filtering approach is that it requires sequence data that adequately represent the diversity of the genes or functional category under study. For example, using only reference resistance genes instead of a large collection of orthologs, paralogs, and variants would underestimate the abundance of resistance genes in genomes carrying variants of the reference gene. Likewise, some sequence types, such as bacteriophages or BGC, may be underrepresented in the databases used in this study; a more exhaustive and diverse filtering data set could have yielded more significant results for these categories. As seen in
<xref ref-type="fig" rid="msx200-F4">figure 4</xref>
for the 2,429 bacterial genomes, some clusters of genomes show high bacteriophage signals compared with other regions of the heatmap. Indeed, of the 262 bacterial families included in the 2,429-genome analysis, 147 had ≤100 k-mers associated with phage sequences, suggesting that some families could suffer from a lack of characterized phages in the database used for profiling (EBI). This issue should be alleviated as more sequences and better annotations become available in public databases, enabling better filtering data sets.</p>
<p>Ray Surveyor is a powerful tool for reconstructing and interpreting the phenetic relationships underlying populations of bacterial species. By combining its sequence filtering capabilities with clinical or environmental context, this method could provide an intuitive representation of population structures and of the genomic features related to their differentiation or phenotype. It is thus a hypothesis-generating tool that could be applied to investigate the importance of specific gene categories not only in pathogens but also in environmental microbial communities, and in transcriptomic and metagenomic research.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Theoretical Background and Software Implementation</title>
<p>Ray Surveyor is built on top of the highly scalable Ray framework, which includes the Ray assembler and RayPlatform (
<xref rid="msx200-B7" ref-type="bibr">Boisvert et al. 2012</xref>
,
<xref rid="msx200-B6" ref-type="bibr">2010</xref>
). It uses the Message Passing Interface (MPI) to scale analyses on supercomputers; however, depending on their size, data sets can also be analyzed on smaller servers or personal computers. Components of the software include, among others, a sparse distributed hash table that spreads the k-mers across the computers of a cluster, as well as a graph coloring scheme that associates each k-mer vertex of the de Bruijn graph with its profiling data sets. Ray Surveyor is also based on the actor model (
<xref rid="msx200-B31" ref-type="bibr">Hewitt 1977</xref>
); each actor handles its own task, such as reading and k-merizing input sequences, gathering k-mers into a store keeper, counting the k-mers, and building the Gram matrix.
<xref ref-type="supplementary-material" rid="sup1">supplementary figure 8</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online, provides further details on the actors’ roles and their ways of communicating.</p>
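<p>The idea of a sparse distributed hash table can be illustrated with a minimal Python sketch. It is not Ray's implementation: the SHA-1 routing rule and the helper names owner_rank and shard_kmers are hypothetical, and a list of dictionaries stands in for the MPI ranks. Each k-mer is deterministically assigned to one owner rank, so each rank holds a disjoint shard of the k-mer table together with the genomes that carry each k-mer.</p>

```python
import hashlib
from collections import defaultdict


def owner_rank(kmer: str, n_ranks: int) -> int:
    """Hypothetical routing rule: hash the k-mer and reduce it modulo n_ranks."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_ranks


def shard_kmers(genomes: dict, k: int, n_ranks: int) -> list:
    """Return one local table per rank, mapping each k-mer to the genomes carrying it."""
    shards = [defaultdict(set) for _ in range(n_ranks)]
    for name, seq in genomes.items():
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # In an MPI setting this would be a message sent to the owner rank;
            # here every "rank" is simply a dictionary in the same process.
            shards[owner_rank(kmer, n_ranks)][kmer].add(name)
    return shards
```

<p>Because the owner of a k-mer depends only on its sequence, every occurrence of the same k-mer, whichever genome it comes from, lands on the same shard, which is what lets each shard attach a complete set of genomes to the k-mers it owns.</p>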
<p>The first step of Ray Surveyor is to split the genome sequences into k-mers and build a graph of the pangenome. The k-mer length is set by the user; we recommend a length between 21 and 61 nucleotides, typically 31 for the comparison of bacterial genomes. The workflow then proceeds with graph coloring, which assigns a virtual color to each k-mer according to the combination of genomes or functional data sets that carry it. The next step iterates over the k-mers and, for each color, increments the count of shared k-mers between every pair of genomes carrying that color, storing the counts in the Gram matrix. Formally, each pairwise genome comparison can be seen as a simple D
<sub>2</sub>
statistic (
<xref rid="msx200-B64" ref-type="bibr">Reinert et al. 2009</xref>
;
<xref rid="msx200-B87" ref-type="bibr">Wan et al. 2010</xref>
) computed on binary (presence/absence) k-mer counts. Since our counts are binary, the Ray Surveyor mechanics can be formally defined using set theory.</p>
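<p>The coloring and counting steps can be sketched in Python under simplifying assumptions: genomes are plain uppercase strings, only the forward strand is k-merized, and a single in-memory dictionary replaces the distributed de Bruijn graph (the function names are illustrative). The color of a k-mer is the set of genomes that carry it; the number of k-mers of each color is added to every pair of genomes within that color, which yields the presence/absence shared k-mer counts of the Gram matrix.</p>

```python
from collections import Counter
from itertools import combinations_with_replacement

import numpy as np


def kmer_set(seq: str, k: int) -> set:
    """Presence/absence k-mer content of one sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gram_from_colors(genomes: list, k: int) -> np.ndarray:
    n = len(genomes)
    # Color of a k-mer = the set of genome indices that carry it.
    colors = {}
    for idx, seq in enumerate(genomes):
        for kmer in kmer_set(seq, k):
            colors.setdefault(kmer, set()).add(idx)
    # Number of k-mers sharing each distinct color.
    color_sizes = Counter(frozenset(c) for c in colors.values())
    K = np.zeros((n, n), dtype=np.int64)
    for color, count in color_sizes.items():
        # Every pair (including i == j) inside a color shares `count` k-mers.
        for i, j in combinations_with_replacement(sorted(color), 2):
            K[i, j] += count
            if i != j:
                K[j, i] += count
    return K
```

<p>Because all k-mers of a given color contribute identically, the pair loop in this sketch runs once per distinct color rather than once per k-mer.</p>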
<p>Let
<inline-formula id="IE2">
<mml:math id="IM2">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
be the set of all the k-mers of genome
<italic>i</italic>
, and similarly
<inline-formula id="IE3">
<mml:math id="IM3">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
the set of all the k-mers of genome
<italic>j</italic>
. Then, the Gram matrix (
<italic>K</italic>
) is defined such that
<inline-formula id="IE4">
<mml:math id="IM4">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
. Let
<inline-formula id="IE5">
<mml:math id="IM5">
<mml:mrow>
<mml:mi>Z</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
be
<italic>m</italic>
filtering data sets and
<inline-formula id="IE6">
<mml:math id="IM6">
<mml:mrow>
<mml:mi>Y</mml:mi>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>⋃</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
their union. To filter in (include only) the k-mer set
<italic>Y</italic>
,
<inline-formula id="IE7">
<mml:math id="IM7">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
and to filter out (exclude) the k-mer set
<italic>Y</italic>
, then
<inline-formula id="IE8">
<mml:math id="IM8">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>∖</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
. The resulting matrix K is then normalized to have values in the range [0, 1], with the diagonal entries equal to 1. Consequently, the entries of the normalized matrix
<inline-formula id="IE9">
<mml:math id="IM9">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
are given by
<inline-formula id="IE10">
<mml:math id="IM10">
<mml:mrow>
<mml:msubsup>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>′</mml:mo>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>*</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
. However, when filtering is used, we recommend dividing the entries
<italic>k</italic><sub><italic>i</italic>,<italic>j</italic></sub>
by the diagonal entries
<italic>k</italic><sub><italic>i</italic>,<italic>i</italic></sub>
and
<italic>k</italic><sub><italic>j</italic>,<italic>j</italic></sub>
of the full k-mer matrix rather than by those of the filtered matrix. The diagonal of the filtered matrix no longer represents the total number of k-mers per genome, but only the number of filtered k-mers, a subset of the genome. Normalizing with the full diagonal makes the filtered matrices more comparable, as they are all normalized with respect to the same total k-mer content.</p>
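<p>The set definitions and the normalization above can be checked on toy data with the following Python sketch. It assumes that each genome's k-mers are already available as a Python set and that the union Y of the filtering data sets is given as a set; gram and normalize are illustrative names, not part of Ray Surveyor. Passing the unfiltered matrix as K_full reproduces the recommended normalization of filtered matrices by the full-matrix diagonal.</p>

```python
import numpy as np


def gram(kmer_sets: list, keep: set = None, drop: set = None) -> np.ndarray:
    """Shared k-mer counts; `keep` filters the set Y in, `drop` filters it out."""
    n = len(kmer_sets)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            shared = kmer_sets[i].intersection(kmer_sets[j])  # A_i ∩ B_j
            if keep is not None:
                shared = shared.intersection(keep)            # A_i ∩ B_j ∩ Y
            if drop is not None:
                shared = shared.difference(drop)              # (A_i ∩ B_j) ∖ Y
            K[i, j] = len(shared)
    return K


def normalize(K: np.ndarray, K_full: np.ndarray = None) -> np.ndarray:
    """k'_ij = k_ij / sqrt(k_ii * k_jj), using the diagonal of K_full if given."""
    diag = np.sqrt(np.diag(K if K_full is None else K_full))
    return K / np.outer(diag, diag)
```

<p>With the full diagonal, the filtered entries stay on the same scale as the unfiltered ones, so a filtered self-similarity below 1 directly reflects the fraction of a genome's k-mers that belong to Y.</p>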
<p>After normalization, the matrix is transformed into a distance matrix with a chosen metric. We focused our experiments on four metrics: the cosine, correlation, Euclidean, and Canberra distances. Below, we formally define these distances using
<italic>u</italic>
and
<italic>v</italic>
as the normalized vectors of shared k-mers between a genome and all the other genomes in the population. For instance, the entry
<italic>d</italic>
<sub>1,2</sub>
in the distance matrix
<italic>D</italic>
would be defined as
<inline-formula id="IE11">
<mml:math id="IM11">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
<mml:mo>·</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
for the cosine distance metric. With the vectors
<inline-formula id="IE12">
<mml:math id="IM12">
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula id="IE13">
<mml:math id="IM13">
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">k</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>′</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
, the four distance metrics tested in our study are defined as follows:
<list list-type="bullet">
<list-item>
<p>cosine:
<disp-formula id="E1">
<label>(1)</label>
<mml:math id="M1">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>·</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
<list-item>
<p>correlation:
<disp-formula id="E2">
<label>(2)</label>
<mml:math id="M2">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>u</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>·</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>v</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>u</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>v</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
<list-item>
<p>Euclidean:
<disp-formula id="E3">
<label>(3)</label>
<mml:math id="M3">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
<list-item>
<p>Canberra:
<disp-formula id="E4">
<label>(4)</label>
<mml:math id="M4">
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mo>+</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
</p>
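<p>Because the matrix computation relies on SciPy, the four metrics can be sketched with scipy.spatial.distance.pdist, whose metric names "cosine", "correlation", "euclidean" and "canberra" correspond to equations (1) to (4). The sketch assumes that each row of the normalized matrix K′ is the vector of one genome; the function and constant names are illustrative, and this is not the Ray Surveyor script itself.</p>

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

METRICS = ("cosine", "correlation", "euclidean", "canberra")


def distance_matrices(K_norm: np.ndarray) -> dict:
    """Square distance matrix between the genome rows of K' for each tested metric."""
    # pdist returns the condensed upper triangle; squareform expands it
    # into a symmetric matrix with a zero diagonal.
    return {m: squareform(pdist(K_norm, metric=m)) for m in METRICS}
```

<p>Each resulting matrix can then be clustered hierarchically; comparing the four outputs on the same K′ is a simple way to see how sensitive a given population structure is to the choice of metric.</p>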
<p>An important limitation of the cosine and correlation distances is that they cannot be evaluated if one of the vectors contains only zeros. This means that if a genome does not share any k-mer with any of the other genomes, these two metrics are undefined because of a division by zero (by ‖<italic>u</italic>‖<sub>2</sub> or ‖<italic>v</italic>‖<sub>2</sub>). This can also happen when the comparison is filtered with a functional data set and one genome does not harbor any of its k-mers. The two other metrics (Euclidean and Canberra) remain defined for such outliers without shared k-mers, but their results are still influenced by them. Hence, species with a large proportion of genomes containing no k-mer from the filtering data set should not be interpreted with this methodology. Undefined distances with the cosine and correlation metrics were set to zero in our experiments. For this reason, the figures of this manuscript showing the cophenetic distance of filtered data sets used the Canberra distance.</p>
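<p>The handling of undefined distances can be sketched with NumPy alone. The functions below (illustrative names) compute cosine and correlation distances between the rows of a possibly filtered matrix and set the undefined 0/0 entries to zero, as done in our experiments. The correlation distance is obtained here as the cosine distance of mean-centered rows, so in this formulation a constant non-zero row is undefined as well.</p>

```python
import numpy as np


def safe_cosine_distance(X) -> np.ndarray:
    """Cosine distance between rows of X; undefined entries are set to zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        D = 1.0 - (X @ X.T) / np.outer(norms, norms)
    # An all-zero row yields 0/0 = NaN; follow the convention of setting it to 0.
    D[np.isnan(D)] = 0.0
    return D


def safe_correlation_distance(X) -> np.ndarray:
    """Correlation distance computed as the cosine distance of mean-centered rows."""
    X = np.asarray(X, dtype=float)
    return safe_cosine_distance(X - X.mean(axis=1, keepdims=True))
```

<p>Setting these entries to zero makes a genome without filtered k-mers appear identical to every other genome, which illustrates why species with many such genomes should not be interpreted with the cosine or correlation metrics.</p>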
<p>The matrix computation in Ray Surveyor uses the SciPy Python package (
<xref rid="msx200-B35" ref-type="bibr">Jones et al. 2001</xref>
). Distance metrics can also be computed with R. Moreover, the Ray Surveyor scripts can compute a Newick tree from the distance matrix with either the Neighbor-Joining or the UPGMA method (unweighted pair group method with arithmetic mean), based on the scikit-bio and BioPython packages (
<xref rid="msx200-B10" ref-type="bibr">Cock et al. 2009</xref>
).</p>
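<p>The tree-building step can be sketched as follows. The Ray Surveyor scripts rely on scikit-bio and BioPython; this sketch instead uses SciPy's average-linkage clustering, which is equivalent to UPGMA, and writes the result in Newick format. Node heights are set to half the merge distance so that the path length between two leaves equals their cophenetic distance. The function name is hypothetical, at least two genomes are assumed, genome names must not contain Newick special characters, and Neighbor-Joining is not covered.</p>

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def upgma_newick(dist_matrix, names) -> str:
    """Newick string of the average-linkage (UPGMA) tree of a square distance matrix."""
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=False)
    root = to_tree(linkage(condensed, method="average"))

    def subtree(node, parent_height: float) -> str:
        # Height = half the merge distance, so the leaf-to-leaf path length
        # equals the cophenetic distance between the two leaves.
        height = node.dist / 2.0
        branch = parent_height - height
        if node.is_leaf():
            return f"{names[node.id]}:{branch:.6f}"
        left = subtree(node.get_left(), height)
        right = subtree(node.get_right(), height)
        return f"({left},{right}):{branch:.6f}"

    top = root.dist / 2.0
    return f"({subtree(root.get_left(), top)},{subtree(root.get_right(), top)});"
```

<p>For three genomes where A and B are at distance 2 and both are at distance 6 from C, this sketch joins A and B with branches of length 1 and places C at depth 3 from the root.</p>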
</sec>
</sec>
<sec>
<title>Phenetic and Phylogenetic Analysis</title>
<sec>
<title>Simulated Data Sets</title>
<p>Random trees of 100 genomes were simulated with three different average branch lengths (0.001, 0.005, and 0.01) to represent different evolutionary distances (
<xref rid="msx200-B39" ref-type="bibr">Kuhner and Felsenstein 1994</xref>
;
<xref rid="msx200-B28" ref-type="bibr">Guindon and Gascuel 2002</xref>
). For each of the three average branch lengths, we generated 10 trees to evaluate reproducibility. Sequence alignments of 1,000,000 sites were then simulated along each 100-genome tree under a simple nucleotide model (equal equilibrium frequencies and equal mutation rates) with the Pyvolve Python package (
<xref rid="msx200-B78" ref-type="bibr">Spielman and Wilke 2015</xref>
). The sequences obtained from these gapless alignments were used for the subsequent Ray Surveyor analyses. The four distance metrics (Euclidean, cosine, correlation, and Canberra) were tested in our simulations to transform Ray Surveyor's similarity matrix into a distance matrix. We also evaluated the performance of ten k-mer lengths, from 11 to 101 in increments of 10. To check the validity of our tree and sequence models, an alignment-based phylogeny was also inferred for every tree with the FastTree NT-GTR model (
<xref rid="msx200-B60" ref-type="bibr">Price et al. 2010</xref>
). These alignment-based phylogenies were compared with the reference phylogenies using the same methods as for Ray Surveyor clusters (phenetic trees) or Neighbor-Joining trees. Two evaluations were made to test how well our method replicates the reference simulated trees. First, the distance matrices of the simulated trees were compared with Ray Surveyor's distance matrices using the CCC, computed with the ape (
<xref rid="msx200-B55" ref-type="bibr">Paradis et al. 2004</xref>
) and dendextend (
<xref rid="msx200-B24" ref-type="bibr">Galili 2015</xref>
) R packages. The CCC indicates how similar the pairwise cophenetic distances of two dendrograms are, each dendrogram being obtained by hierarchical clustering of a distance matrix. Second, the tree topologies were compared with the RF metric using the ETE3 Python package (
<xref rid="msx200-B33" ref-type="bibr">Huerta-Cepas et al. 2016</xref>
). The RF distance counts the minimal number of branch operations required to transform one tree into the other. The ANI was also computed for all the simulated alignments; the ANI statistics for all trees are reported in
<xref ref-type="supplementary-material" rid="sup1">supplementary table 1</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online.</p>
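<p>The CCC comparison can be sketched in Python as well, with SciPy standing in for the ape and dendextend R packages used in the study. Each distance matrix is clustered with average linkage (UPGMA), and the Pearson correlation between the two sets of cophenetic distances is returned. This is our reading of what a cophenetic correlation between two dendrograms measures, not the code that produced the reported values; the function name is hypothetical.</p>

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def cophenetic_correlation(D1, D2, method: str = "average") -> float:
    """Pearson correlation between the cophenetic distances of two dendrograms."""
    def coph(D):
        condensed = squareform(np.asarray(D, dtype=float), checks=False)
        # With only the linkage matrix, cophenet returns the condensed
        # cophenetic distances (merge heights) for every pair of leaves.
        return cophenet(linkage(condensed, method=method))

    return float(np.corrcoef(coph(D1), coph(D2))[0, 1])
```

<p>A value of 1 means that both dendrograms group every pair of genomes at proportional heights, while a shuffled topology drives the value towards zero or below.</p>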
</sec>
<sec>
<title>Real Prokaryotic Genome Data Sets</title>
<p>The phylogenies and metadata for the Croucher et al.
<italic>S. pneumoniae</italic>
data set and the Kos et al.
<italic>P. aeruginosa</italic>
data set were obtained from the authors (
<xref rid="msx200-B13" ref-type="bibr">Croucher et al. 2013</xref>
,
<xref rid="msx200-B14" ref-type="bibr">2015</xref>
;
<xref rid="msx200-B38" ref-type="bibr">Kos et al. 2015</xref>
;
<xref rid="msx200-B17" ref-type="bibr">Donati et al. 2010</xref>
). The phylogeny of the Hilty et al.
<italic>S. pneumoniae</italic>
data set was obtained from 602 conserved genes aligned with MAFFT v7.221 (
<xref rid="msx200-B36" ref-type="bibr">Katoh and Standley 2013</xref>
). A maximum likelihood phylogeny was then inferred from the 602 concatenated genes with RAxML version 8.1.20 (
<xref rid="msx200-B79" ref-type="bibr">Stamatakis 2014</xref>
). To compare the phylogenetic trees with the Ray Surveyor clusters, the trees were converted from Newick format into cophenetic distance matrices with the R package ape (
<xref rid="msx200-B55" ref-type="bibr">Paradis et al. 2004</xref>
). Hierarchical clustering was performed using the UPGMA (average) method. The phylogenetic tree of the 2,429 bacterial genomes was based on the 16S rRNA gene, and taxonomic annotation was based on the established NCBI taxonomy. Initially, 2,429 bacterial genomes were obtained from NCBI (see
<xref ref-type="supplementary-material" rid="sup1">supplementary table 2</xref>
,
<xref ref-type="supplementary-material" rid="sup1">Supplementary Material</xref>
online for a list). To build the phylogeny of the bacterial tree of life, the 16S rRNA gene sequences were extracted from each genome. Then, the 2,429 16S rRNA genes were aligned using MAFFT v7.221 (
<xref rid="msx200-B36" ref-type="bibr">Katoh and Standley 2013</xref>
) and a maximum likelihood phylogeny was produced with RAxML version 8.1.20 (
<xref rid="msx200-B79" ref-type="bibr">Stamatakis 2014</xref>
). CCC and FMI were calculated with the dendextend R package (
<xref rid="msx200-B24" ref-type="bibr">Galili 2015</xref>
). Ray Surveyor was run with a k-mer length of 31 to maintain high stringency in the coloring of the graph (
<xref rid="msx200-B7" ref-type="bibr">Boisvert et al. 2012</xref>
). The similarity matrix of the 2,429 bacterial genomes was produced with Ray Surveyor on a computer cluster using four nodes of 48 cores with 256 GB of RAM, for a total compute time of less than 6 h.</p>
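<p>The FMI (Fowlkes-Mallows index) comparison can likewise be sketched in Python. This version, with hypothetical function names, cuts two UPGMA dendrograms into the same number k of clusters (the choice of k is left to the user) and compares the memberships by pair counting: FMI = TP / sqrt((TP + FP)(TP + FN)), where TP is the number of genome pairs grouped together in both clusterings. It illustrates the index, not the dendextend implementation.</p>

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def _pairs(counts) -> int:
    """Number of unordered pairs within each count, summed over all counts."""
    counts = np.asarray(counts, dtype=np.int64)
    return int((counts * (counts - 1) // 2).sum())


def fowlkes_mallows(labels_a, labels_b) -> float:
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) obtained by pair counting."""
    a = np.unique(labels_a, return_inverse=True)[1]
    b = np.unique(labels_b, return_inverse=True)[1]
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)             # contingency table of memberships
    tp = _pairs(table)                      # pairs together in both clusterings
    together_a = _pairs(table.sum(axis=1))  # pairs together in clustering a
    together_b = _pairs(table.sum(axis=0))  # pairs together in clustering b
    return float(tp / np.sqrt(together_a * together_b))


def fmi_at_k(D1, D2, k: int) -> float:
    """Cut both UPGMA dendrograms into k clusters and compare the memberships."""
    def cut(D):
        Z = linkage(squareform(np.asarray(D, dtype=float), checks=False), "average")
        return fcluster(Z, k, criterion="maxclust")

    return fowlkes_mallows(cut(D1), cut(D2))
```

<p>Unlike the CCC, which uses every merge height, the FMI only looks at cluster memberships at the chosen cut, so the two indices can disagree when trees differ mostly near their tips.</p>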
</sec>
<sec>
<title>Source of Tools and Data Sets</title>
<p>Ray Surveyor is freely available under the GPLv3 license at
<ext-link ext-link-type="uri" xlink:href="https://github.com/zorino/ray">https://github.com/zorino/ray</ext-link>
(last accessed July 19, 2017). A tutorial on how to run an analysis is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/zorino/raysurveyor-tutorial">https://github.com/zorino/raysurveyor-tutorial</ext-link>
(last accessed July 19, 2017). The 2,429 bacterial genomes were downloaded from the NCBI GenBank (
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/">ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/</ext-link>
; last accessed July 19, 2017) in September 2015. Only the sequences marked as either a representative or a reference genome in the assembly reports were selected, the goal being to compute phylogenetic trees and clusterings from a limited number of genomes that provide a broad taxonomic overview of the domain Bacteria. Since the NCBI GenBank genome database has an inherent bias towards certain taxa (
<xref rid="msx200-B82" ref-type="bibr">Tatusova et al. 2015</xref>
), such as clinically relevant pathogens, this selection allowed us to discard a large number of similar genomes. The total number of nucleotides analyzed in this data set was 11.4 billion, with an average of 3.9 million per genome. The genomes for the targeted analyses of
<italic>S. pneumoniae</italic>
and
<italic>P. aeruginosa</italic>
were identified from the literature (
<xref rid="msx200-B13" ref-type="bibr">Croucher et al. 2013</xref>
;
<xref rid="msx200-B38" ref-type="bibr">Kos et al. 2015</xref>
) and downloaded from NCBI GenBank or ENA. The data sets of resistance genes and mobile elements were taken from the MERGEM database (
<ext-link ext-link-type="uri" xlink:href="http://mergem.genome.ulaval.ca">http://mergem.genome.ulaval.ca</ext-link>
; last accessed July 19, 2017;
<xref rid="msx200-B62" ref-type="bibr">Raymond et al. 2016a</xref>
), the plasmids from the NCBI Plasmids collection in June 2015, the bacteriophages from the EBI collection in June 2015 (
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/genomes/phage.html">http://www.ebi.ac.uk/genomes/phage.html</ext-link>
; last accessed July 19, 2017), and the BGC from the MIBiG v1.0 database (
<xref rid="msx200-B47" ref-type="bibr">Medema et al. 2015a</xref>
).</p>
</sec>
</sec>
<sec>
<title>Supplementary Material</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary data</xref>
are available at
<italic>Molecular Biology and Evolution</italic>
online.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>M.D. and F.R. performed bioinformatics analyses. M.D. and S.B. programmed the Ray Surveyor software. M.D., S.B., and F.L. designed algorithms. M.D., F.R., A.C., P.H.R., and J.C. interpreted biological results. M.D., F.R., A.C., and J.C. contributed to the preparation of the manuscript. All authors critically reviewed the manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="sup1">
<label>Supplementary Data</label>
<media xlink:href="msx200_Supp.zip">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>This study was financed by the Canada Research Chair in Medical Genomics (J.C.). F.R. was supported by a Mitacs post-doctoral fellowship. M.D. was supported by the Fonds de recherche du Québec – Santé. The authors thank Pier-Luc Plante, Alexandre Drouin, Pascal Belleau, and Maurice Boissinot for their comments. Computations were performed under the auspices of Calcul Québec and Compute Canada. The operations of Compute Canada are funded by the Canada Foundation for Innovation (CFI), the Natural Sciences and Engineering Research Council (NSERC), NanoQuébec, and the Fonds Québécois de Recherche sur la Nature et les Technologies (FQRNT).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="msx200-B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Allison</surname>
<given-names>GE</given-names>
</name>
,
<name name-style="western">
<surname>Verma</surname>
<given-names>NK.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Serotype-converting bacteriophages and O-antigen modification in
<italic>Shigella flexneri</italic>
</article-title>
.
<source>Trends Microbiol</source>
.
<volume>8</volume>
<issue>1</issue>
:
<fpage>17</fpage>
<lpage>23</lpage>
.
<pub-id pub-id-type="pmid">10637639</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Andam</surname>
<given-names>CP</given-names>
</name>
,
<name name-style="western">
<surname>Hanage</surname>
<given-names>WP.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Mechanisms of genome evolution of Streptococcus</article-title>
.
<source>Infect Genet Evol</source>
.
<volume>33</volume>
:
<fpage>334</fpage>
<lpage>342</lpage>
.
<pub-id pub-id-type="pmid">25461843</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Balvočiūtė</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Huson</surname>
<given-names>DH.</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>SILVA, RDP, Greengenes, NCBI and OTT – how do these taxonomies compare?</article-title>
<source>BMC Genomics</source>
.
<volume>18</volume>
(
<issue>Suppl 2</issue>
):
<fpage>114</fpage>
.
<pub-id pub-id-type="pmid">28361695</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Biek</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Pybus</surname>
<given-names>OG</given-names>
</name>
,
<name name-style="western">
<surname>Lloyd-Smith</surname>
<given-names>JO</given-names>
</name>
,
<name name-style="western">
<surname>Didelot</surname>
<given-names>X.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Measurably evolving pathogens in the genomic era</article-title>
.
<source>Trends Ecol Evol</source>
.
<volume>30</volume>
<issue>6</issue>
:
<fpage>306</fpage>
<lpage>313</lpage>
.
<pub-id pub-id-type="pmid">25887947</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Boc</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Diallo</surname>
<given-names>AB</given-names>
</name>
,
<name name-style="western">
<surname>Makarenkov</surname>
<given-names>V.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>40</volume>
(Web Server issue):
<fpage>W573</fpage>
<lpage>W579</lpage>
.
<pub-id pub-id-type="pmid">22675075</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Laviolette</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Corbeil</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies</article-title>
.
<source>J Comput Biol</source>
.
<volume>17</volume>
<issue>11</issue>
:
<fpage>1519</fpage>
<lpage>1533</lpage>
.
<pub-id pub-id-type="pmid">20958248</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Raymond</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Godzaridis</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Laviolette</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Corbeil</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Ray Meta: scalable de novo metagenome assembly and profiling</article-title>
.
<source>Genome Biol</source>
.
<volume>13</volume>
<issue>12</issue>
:
<fpage>R122.</fpage>
<pub-id pub-id-type="pmid">23259615</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Botzman</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Margalit</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles</article-title>
.
<source>Genome Biol</source>
.
<volume>12</volume>
<issue>10</issue>
:
<fpage>R109.</fpage>
<pub-id pub-id-type="pmid">22032172</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cimermancic</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Medema</surname>
<given-names>MH</given-names>
</name>
,
<name name-style="western">
<surname>Claesen</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Kurita</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Wieland Brown</surname>
<given-names>LC</given-names>
</name>
,
<name name-style="western">
<surname>Mavrommatis</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Pati</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Godfrey</surname>
<given-names>PA</given-names>
</name>
,
<name name-style="western">
<surname>Koehrsen</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2014</year>
<article-title>Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters</article-title>
.
<source>Cell</source>
.
<volume>158</volume>
<issue>2</issue>
:
<fpage>412</fpage>
<lpage>421</lpage>
.
<pub-id pub-id-type="pmid">25036635</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cock</surname>
<given-names>PJA</given-names>
</name>
,
<name name-style="western">
<surname>Antao</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Chang</surname>
<given-names>JT</given-names>
</name>
,
<name name-style="western">
<surname>Chapman</surname>
<given-names>BA</given-names>
</name>
,
<name name-style="western">
<surname>Cox</surname>
<given-names>CJ</given-names>
</name>
,
<name name-style="western">
<surname>Dalke</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Friedberg</surname>
<given-names>I</given-names>
</name>
,
<name name-style="western">
<surname>Hamelryck</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Kauff</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Wilczynski</surname>
<given-names>B</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2009</year>
<article-title>Biopython: freely available Python tools for computational molecular biology and bioinformatics</article-title>
.
<source>Bioinformatics</source>
.
<volume>25</volume>
<issue>11</issue>
:
<fpage>1422</fpage>
<lpage>1423</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Colombo</surname>
<given-names>M-L</given-names>
</name>
,
<name name-style="western">
<surname>Hanique</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Baurin</surname>
<given-names>SL</given-names>
</name>
,
<name name-style="western">
<surname>Bauvois</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>De Vriendt</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Van Beeumen</surname>
<given-names>JJ</given-names>
</name>
,
<name name-style="western">
<surname>Frère</surname>
<given-names>J-M</given-names>
</name>
,
<name name-style="western">
<surname>Joris</surname>
<given-names>B.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>The
<italic>ybxI</italic>
gene of
<italic>Bacillus subtilis</italic>
168 encodes a class D beta-lactamase of low activity</article-title>
.
<source>Antimicrob Agents Chemother</source>
.
<volume>48</volume>
<issue>2</issue>
:
<fpage>484</fpage>
<lpage>490</lpage>
.
<pub-id pub-id-type="pmid">14742199</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Compeau</surname>
<given-names>PEC</given-names>
</name>
,
<name name-style="western">
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
,
<name name-style="western">
<surname>Tesler</surname>
<given-names>G.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>How to apply de Bruijn graphs to genome assembly</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>29</volume>
<issue>11</issue>
:
<fpage>987</fpage>
<lpage>991</lpage>
.
<pub-id pub-id-type="pmid">22068540</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Croucher</surname>
<given-names>NJ</given-names>
</name>
,
<name name-style="western">
<surname>Finkelstein</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Pelton</surname>
<given-names>SI</given-names>
</name>
,
<name name-style="western">
<surname>Mitchell</surname>
<given-names>PK</given-names>
</name>
,
<name name-style="western">
<surname>Lee</surname>
<given-names>GM</given-names>
</name>
,
<name name-style="western">
<surname>Parkhill</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Bentley</surname>
<given-names>SD</given-names>
</name>
,
<name name-style="western">
<surname>Hanage</surname>
<given-names>WP</given-names>
</name>
,
<name name-style="western">
<surname>Lipsitch</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Population genomics of post-vaccine changes in pneumococcal epidemiology</article-title>
.
<source>Nat Genet</source>
.
<volume>45</volume>
<issue>6</issue>
:
<fpage>656</fpage>
<lpage>663</lpage>
.
<pub-id pub-id-type="pmid">23644493</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Croucher</surname>
<given-names>NJ</given-names>
</name>
,
<name name-style="western">
<surname>Finkelstein</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Pelton</surname>
<given-names>SI</given-names>
</name>
,
<name name-style="western">
<surname>Parkhill</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Bentley</surname>
<given-names>SD</given-names>
</name>
,
<name name-style="western">
<surname>Lipsitch</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Hanage</surname>
<given-names>WP.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Population genomic datasets describing the post-vaccine evolutionary epidemiology of
<italic>Streptococcus pneumoniae</italic>
</article-title>
.
<source>Sci Data</source>
.
<volume>2</volume>
:
<fpage>150058.</fpage>
<pub-id pub-id-type="pmid">26528397</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Deorowicz</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Kokot</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Grabowski</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Debudaj-Grabysz</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>KMC 2: fast and resource-frugal k-mer counting</article-title>
.
<source>Bioinformatics</source>
.
<volume>31</volume>
<issue>10</issue>
:
<fpage>1569</fpage>
<lpage>1576</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Dobrindt</surname>
<given-names>U</given-names>
</name>
,
<name name-style="western">
<surname>Hochhut</surname>
<given-names>B</given-names>
</name>
,
<name name-style="western">
<surname>Hentschel</surname>
<given-names>U</given-names>
</name>
,
<name name-style="western">
<surname>Hacker</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Genomic islands in pathogenic and environmental microorganisms</article-title>
.
<source>Nat Rev Microbiol</source>
.
<volume>2</volume>
<issue>5</issue>
:
<fpage>414</fpage>
<lpage>424</lpage>
.
<pub-id pub-id-type="pmid">15100694</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Donati</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Hiller</surname>
<given-names>NL</given-names>
</name>
,
<name name-style="western">
<surname>Tettelin</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Muzzi</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Croucher</surname>
<given-names>NJ</given-names>
</name>
,
<name name-style="western">
<surname>Angiuoli</surname>
<given-names>SV</given-names>
</name>
,
<name name-style="western">
<surname>Oggioni</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Dunning Hotopp</surname>
<given-names>JC</given-names>
</name>
,
<name name-style="western">
<surname>Hu</surname>
<given-names>FZ</given-names>
</name>
,
<name name-style="western">
<surname>Riley</surname>
<given-names>DR</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2010</year>
<article-title>Structure and dynamics of the pan-genome of
<italic>Streptococcus pneumoniae</italic>
and closely related species</article-title>
.
<source>Genome Biol</source>
.
<volume>11</volume>
<issue>10</issue>
:
<fpage>R107.</fpage>
<pub-id pub-id-type="pmid">21034474</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Drouin</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Giguère</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Marchand</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Tyers</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Loo</surname>
<given-names>VG</given-names>
</name>
,
<name name-style="western">
<surname>Bourgault</surname>
<given-names>A-M</given-names>
</name>
,
<name name-style="western">
<surname>Laviolette</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Corbeil</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons</article-title>
.
<source>BMC Genomics</source>
.
<volume>17</volume>
<issue>1</issue>
:
<fpage>754.</fpage>
<pub-id pub-id-type="pmid">27671088</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B19">
<mixed-citation publication-type="journal">Editor.
<year>2011</year>
<article-title>Outbreak genomics</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>29</volume>
<issue>9</issue>
:
<fpage>769.</fpage>
<pub-id pub-id-type="pmid">21904301</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Federhen</surname>
<given-names>S.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>The NCBI Taxonomy database</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>40</volume>
(Database issue):
<fpage>D136</fpage>
<lpage>D143</lpage>
.
<pub-id pub-id-type="pmid">22139910</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fenselau</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Havey</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Teerakulkittipong</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Swatkoski</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Laine</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Edwards</surname>
<given-names>N.</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Identification of beta-lactamase in antibiotic-resistant
<italic>Bacillus cereus</italic>
spores</article-title>
.
<source>Appl Environ Microbiol</source>
.
<volume>74</volume>
<issue>3</issue>
:
<fpage>904</fpage>
<lpage>906</lpage>
.
<pub-id pub-id-type="pmid">18065609</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Foerstner</surname>
<given-names>KU</given-names>
</name>
,
<name name-style="western">
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Hooper</surname>
<given-names>SD</given-names>
</name>
,
<name name-style="western">
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Environments shape the nucleotide composition of genomes</article-title>
.
<source>EMBO Rep</source>
.
<volume>6</volume>
<issue>12</issue>
:
<fpage>1208</fpage>
<lpage>1213</lpage>
.
<pub-id pub-id-type="pmid">16200051</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fowlkes</surname>
<given-names>EB</given-names>
</name>
,
<name name-style="western">
<surname>Mallows</surname>
<given-names>CL.</given-names>
</name>
</person-group>
<year>1983</year>
<article-title>A method for comparing two hierarchical clusterings</article-title>
.
<source>J Am Stat Assoc</source>
.
<volume>78</volume>
<issue>383</issue>
:
<fpage>553.</fpage>
</mixed-citation>
</ref>
<ref id="msx200-B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Galili</surname>
<given-names>T.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering</article-title>
.
<source>Bioinformatics</source>
.
<volume>31</volume>
<issue>22</issue>
:
<fpage>3718</fpage>
<lpage>3720</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Gardner</surname>
<given-names>SN</given-names>
</name>
,
<name name-style="western">
<surname>Slezak</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Hall</surname>
<given-names>BG.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome</article-title>
.
<source>Bioinformatics</source>
.
<volume>31</volume>
<issue>17</issue>
:
<fpage>2877</fpage>
<lpage>2878</lpage>
.
<pub-id pub-id-type="pmid">25913206</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Gire</surname>
<given-names>SK</given-names>
</name>
,
<name name-style="western">
<surname>Goba</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Andersen</surname>
<given-names>KG</given-names>
</name>
,
<name name-style="western">
<surname>Sealfon</surname>
<given-names>RSG</given-names>
</name>
,
<name name-style="western">
<surname>Park</surname>
<given-names>DJ</given-names>
</name>
,
<name name-style="western">
<surname>Kanneh</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Jalloh</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Momoh</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Fullah</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Dudas</surname>
<given-names>G</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2014</year>
<article-title>Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak</article-title>
.
<source>Science</source>
.
<volume>345</volume>
<issue>6202</issue>
:
<fpage>1369</fpage>
<lpage>1372</lpage>
.
<pub-id pub-id-type="pmid">25214632</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Glaeser</surname>
<given-names>SP</given-names>
</name>
,
<name name-style="western">
<surname>Kämpfer</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Multilocus sequence analysis (MLSA) in prokaryotic taxonomy</article-title>
.
<source>Syst Appl Microbiol</source>
.
<volume>38</volume>
<issue>4</issue>
:
<fpage>237</fpage>
<lpage>245</lpage>
.
<pub-id pub-id-type="pmid">25959541</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Guindon</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Gascuel</surname>
<given-names>O.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Efficient biased estimation of evolutionary distances when substitution rates vary across sites</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>19</volume>
<issue>4</issue>
:
<fpage>534</fpage>
<lpage>543</lpage>
.
<pub-id pub-id-type="pmid">11919295</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Haubold</surname>
<given-names>B.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Alignment-free phylogenetics and population genetics</article-title>
.
<source>Brief Bioinform</source>
.
<volume>15</volume>
<issue>3</issue>
:
<fpage>407</fpage>
<lpage>418</lpage>
.
<pub-id pub-id-type="pmid">24291823</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Hazen</surname>
<given-names>TH</given-names>
</name>
,
<name name-style="western">
<surname>Pan</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Gu</surname>
<given-names>J-D</given-names>
</name>
,
<name name-style="western">
<surname>Sobecky</surname>
<given-names>PA.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>The contribution of mobile genetic elements to the evolution and ecology of Vibrios</article-title>
.
<source>FEMS Microbiol Ecol</source>
.
<volume>74</volume>
<issue>3</issue>
:
<fpage>485</fpage>
<lpage>499</lpage>
.
<pub-id pub-id-type="pmid">20662928</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Hewitt</surname>
<given-names>CE.</given-names>
</name>
</person-group>
<year>1977</year>
<article-title>Viewing control structures as patterns of passing messages</article-title>
.
<source>Artif Intell</source>
.
<volume>8</volume>
<issue>3</issue>
:
<fpage>323</fpage>
<lpage>364</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Hilty</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Wüthrich</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Salter</surname>
<given-names>SJ</given-names>
</name>
,
<name name-style="western">
<surname>Engel</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Campbell</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Sá-Leão</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>De Lencastre</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Hermans</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Sadowy</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Turner</surname>
<given-names>P</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2014</year>
<article-title>Global phylogenomic analysis of nonencapsulated
<italic>Streptococcus pneumoniae</italic>
reveals a deep-branching classic lineage that is distinct from multiple sporadic lineages</article-title>
.
<source>Genome Biol Evol</source>
.
<volume>6</volume>
<issue>12</issue>
:
<fpage>3281</fpage>
<lpage>3294</lpage>
.
<pub-id pub-id-type="pmid">25480686</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Huerta-Cepas</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Serra</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>ETE 3: reconstruction, analysis, and visualization of phylogenomic data</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>33</volume>
<issue>6</issue>
:
<fpage>1635</fpage>
<lpage>1638</lpage>
.
<pub-id pub-id-type="pmid">26921390</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B34">
<mixed-citation publication-type="journal">
<collab>Integrative HMP (iHMP) Research Network Consortium</collab>
<year>2014</year>
<article-title>The integrative human microbiome project: dynamic analysis of microbiome–host omics profiles during periods of human health and disease</article-title>
.
<source>Cell Host Microbe</source>
.
<volume>16</volume>
<issue>3</issue>
:
<fpage>276</fpage>
<lpage>289</lpage>
.
<pub-id pub-id-type="pmid">25211071</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B35">
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name name-style="western">
<surname>Jones</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Oliphant</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Peterson</surname>
<given-names>P</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2001</year>
. SciPy: open source scientific tools for Python.</mixed-citation>
</ref>
<ref id="msx200-B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Katoh</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Standley</surname>
<given-names>DM.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>MAFFT multiple sequence alignment software version 7: improvements in performance and usability</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>30</volume>
<issue>4</issue>
:
<fpage>772</fpage>
<lpage>780</lpage>
.
<pub-id pub-id-type="pmid">23329690</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Konstantinidis</surname>
<given-names>KT</given-names>
</name>
,
<name name-style="western">
<surname>Ramette</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Tiedje</surname>
<given-names>JM.</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Toward a more robust assessment of intraspecies diversity, using fewer genetic markers</article-title>
.
<source>Appl Environ Microbiol</source>
.
<volume>72</volume>
<issue>11</issue>
:
<fpage>7286</fpage>
<lpage>7293</lpage>
.
<pub-id pub-id-type="pmid">16980418</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kos</surname>
<given-names>VN</given-names>
</name>
,
<name name-style="western">
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>McLaughlin</surname>
<given-names>RE</given-names>
</name>
,
<name name-style="western">
<surname>Whiteaker</surname>
<given-names>JD</given-names>
</name>
,
<name name-style="western">
<surname>Roy</surname>
<given-names>PH</given-names>
</name>
,
<name name-style="western">
<surname>Alm</surname>
<given-names>RA</given-names>
</name>
,
<name name-style="western">
<surname>Corbeil</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Gardner</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>The resistome of
<italic>Pseudomonas aeruginosa</italic>
in relationship to phenotypic susceptibility</article-title>
.
<source>Antimicrob Agents Chemother</source>
.
<volume>59</volume>
<issue>1</issue>
:
<fpage>427</fpage>
<lpage>436</lpage>
.
<pub-id pub-id-type="pmid">25367914</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kuhner</surname>
<given-names>MK</given-names>
</name>
,
<name name-style="western">
<surname>Felsenstein</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>11</volume>
<issue>3</issue>
:
<fpage>459</fpage>
<lpage>468</lpage>
.
<pub-id pub-id-type="pmid">8015439</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Land</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Hauser</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Jun</surname>
<given-names>S-R</given-names>
</name>
,
<name name-style="western">
<surname>Nookaew</surname>
<given-names>I</given-names>
</name>
,
<name name-style="western">
<surname>Leuze</surname>
<given-names>MR</given-names>
</name>
,
<name name-style="western">
<surname>Ahn</surname>
<given-names>T-H</given-names>
</name>
,
<name name-style="western">
<surname>Karpinets</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Lund</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Kora</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Wassenaar</surname>
<given-names>T</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
<article-title>Insights from 20 years of bacterial genome sequencing</article-title>
.
<source>Funct Integr Genomics</source>
.
<volume>15</volume>
<issue>2</issue>
:
<fpage>141</fpage>
<lpage>161</lpage>
.
<pub-id pub-id-type="pmid">25722247</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B41">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Larsson</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Elfsmark</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Svensson</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Wikström</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Forsman</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Brettin</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Keim</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Johansson</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Molecular evolutionary consequences of niche restriction in
<italic>Francisella tularensis</italic>
, a facultative intracellular pathogen</article-title>
.
<source>PLoS Pathog</source>
.
<volume>5</volume>
<issue>6</issue>
:
<fpage>e1000472</fpage>
.
<pub-id pub-id-type="pmid">19521508</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B42">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lassalle</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Périan</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Bataillon</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Nesme</surname>
<given-names>X</given-names>
</name>
,
<name name-style="western">
<surname>Duret</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Daubin</surname>
<given-names>V.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands</article-title>
.
<source>PLoS Genet</source>
.
<volume>11</volume>
<issue>2</issue>
:
<fpage>e1004941</fpage>
.
<pub-id pub-id-type="pmid">25659072</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B43">
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name name-style="western">
<surname>Li</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Yan</surname>
<given-names>X.</given-names>
</name>
</person-group>
<year>2015</year>
MSPKmerCounter: a fast and memory efficient approach for k-mer counting.
<italic>Cs.Ucsb.Edu</italic>
. p. 1–7.</mixed-citation>
</ref>
<ref id="msx200-B44">
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name name-style="western">
<surname>Loureiro</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Torgo</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Soares</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>2004</year>
Outlier detection using clustering methods: a data cleaning application.
<italic>Proceedings of KDNet Symposium on Knowledge-based Systems for the Public Sector.</italic>
</mixed-citation>
</ref>
<ref id="msx200-B45">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Marçais</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Kingsford</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</article-title>
.
<source>Bioinformatics</source>
<volume>27</volume>
<issue>6</issue>
:
<fpage>764</fpage>
<lpage>770</lpage>
.
<pub-id pub-id-type="pmid">21217122</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B46">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Materon</surname>
<given-names>IC</given-names>
</name>
,
<name name-style="western">
<surname>Queenan</surname>
<given-names>AM</given-names>
</name>
,
<name name-style="western">
<surname>Koehler</surname>
<given-names>TM</given-names>
</name>
,
<name name-style="western">
<surname>Bush</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Palzkill</surname>
<given-names>T.</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Biochemical characterization of beta-lactamases Bla1 and Bla2 from
<italic>Bacillus anthracis</italic>
</article-title>
.
<source>Antimicrob Agents Chemother</source>
.
<volume>47</volume>
<issue>6</issue>
:
<fpage>2040</fpage>
<lpage>2042</lpage>
.
<pub-id pub-id-type="pmid">12760895</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B47">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Medema</surname>
<given-names>MH</given-names>
</name>
,
<name name-style="western">
<surname>Kottmann</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Yilmaz</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Cummings</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Biggins</surname>
<given-names>JB</given-names>
</name>
,
<name name-style="western">
<surname>Blin</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>de Bruijn</surname>
<given-names>I</given-names>
</name>
,
<name name-style="western">
<surname>Chooi</surname>
<given-names>YH</given-names>
</name>
,
<name name-style="western">
<surname>Claesen</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Coates</surname>
<given-names>RC</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015a</year>
<article-title>Minimum information about a biosynthetic gene cluster</article-title>
.
<source>Nat Chem Biol</source>
.
<volume>11</volume>
<issue>9</issue>
:
<fpage>625</fpage>
<lpage>631</lpage>
.
<pub-id pub-id-type="pmid">26284661</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B48">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Medema</surname>
<given-names>MH</given-names>
</name>
,
<name name-style="western">
<surname>Kottmann</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Yilmaz</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Cummings</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Biggins</surname>
<given-names>JB</given-names>
</name>
,
<name name-style="western">
<surname>Blin</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>de Bruijn</surname>
<given-names>I</given-names>
</name>
,
<name name-style="western">
<surname>Chooi</surname>
<given-names>YH</given-names>
</name>
,
<name name-style="western">
<surname>Claesen</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Coates</surname>
<given-names>RC</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015b</year>
<article-title>The Minimum Information about a Biosynthetic Gene cluster (MIBiG) specification</article-title>
.
<source>Nat Chem Biol</source>
.
<volume>11</volume>
<issue>9</issue>
:
<fpage>625</fpage>
<lpage>631</lpage>
.
<pub-id pub-id-type="pmid">26284661</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B49">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Medini</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Donati</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Tettelin</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Masignani</surname>
<given-names>V</given-names>
</name>
,
<name name-style="western">
<surname>Rappuoli</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The microbial pan-genome</article-title>
.
<source>Curr Opin Genet Dev</source>
.
<volume>15</volume>
<issue>6</issue>
:
<fpage>589</fpage>
<lpage>594</lpage>
.
<pub-id pub-id-type="pmid">16185861</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B50">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Melsted</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Pritchard</surname>
<given-names>JK.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Efficient counting of k-mers in DNA sequences using a bloom filter</article-title>
.
<source>BMC Bioinformatics</source>
.
<volume>12</volume>
<issue>1</issue>
:
<fpage>333</fpage>
.
<pub-id pub-id-type="pmid">21831268</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B51">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Metcalf</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Funkhouser-Jones</surname>
<given-names>LJ</given-names>
</name>
,
<name name-style="western">
<surname>Brileya</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Reysenbach</surname>
<given-names>A-L</given-names>
</name>
,
<name name-style="western">
<surname>Bordenstein</surname>
<given-names>SR.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Antibacterial gene transfer across the tree of life</article-title>
.
<source>eLife</source>
<volume>3</volume>
:
<fpage>e04266</fpage>
.</mixed-citation>
</ref>
<ref id="msx200-B52">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Mooers</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>The evolution of base composition and phylogenetic inference</article-title>
.
<source>Trends Ecol Evol</source>
.
<volume>15</volume>
<issue>9</issue>
:
<fpage>365</fpage>
<lpage>369</lpage>
.
<pub-id pub-id-type="pmid">10931668</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B53">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Nasser</surname>
<given-names>W</given-names>
</name>
,
<name name-style="western">
<surname>Beres</surname>
<given-names>SB</given-names>
</name>
,
<name name-style="western">
<surname>Olsen</surname>
<given-names>RJ</given-names>
</name>
,
<name name-style="western">
<surname>Dean</surname>
<given-names>MA</given-names>
</name>
,
<name name-style="western">
<surname>Rice</surname>
<given-names>KA</given-names>
</name>
,
<name name-style="western">
<surname>Long</surname>
<given-names>SW</given-names>
</name>
,
<name name-style="western">
<surname>Kristinsson</surname>
<given-names>KG</given-names>
</name>
,
<name name-style="western">
<surname>Gottfredsson</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Vuopio</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Raisanen</surname>
<given-names>K</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2014</year>
<article-title>Evolutionary pathway to increased virulence and epidemic group A Streptococcus disease derived from 3,615 genome sequences</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<volume>111</volume>
<issue>17</issue>
:
<fpage>E1768</fpage>
<lpage>E1776</lpage>
.
<pub-id pub-id-type="pmid">24733896</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B54">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ondov</surname>
<given-names>BD</given-names>
</name>
,
<name name-style="western">
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
,
<name name-style="western">
<surname>Melsted</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Mallonee</surname>
<given-names>AB</given-names>
</name>
,
<name name-style="western">
<surname>Bergman</surname>
<given-names>NH</given-names>
</name>
,
<name name-style="western">
<surname>Koren</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Phillippy</surname>
<given-names>AM.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Mash: fast genome and metagenome distance estimation using MinHash</article-title>
.
<source>Genome Biol</source>
.
<volume>17</volume>
<issue>1</issue>
:
<fpage>132</fpage>
.
<pub-id pub-id-type="pmid">27323842</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B55">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Paradis</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Claude</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Strimmer</surname>
<given-names>K.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>APE: analyses of phylogenetics and evolution in R language</article-title>
.
<source>Bioinformatics</source>
.
<volume>20</volume>
<issue>2</issue>
:
<fpage>289</fpage>
<lpage>290</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B56">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Pärnänen</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Karkman</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Tamminen</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Lyra</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Hultman</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Paulin</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Virta</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Evaluating the mobility potential of antibiotic resistance genes in environmental resistomes without metagenomics</article-title>
.
<source>Sci Rep</source>
.
<volume>6</volume>
:
<fpage>35790</fpage>
.
<pub-id pub-id-type="pmid">27767072</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B57">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Patwardhan</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Ray</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Roy</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Molecular markers in phylogenetic studies – a review</article-title>
.
<source>J Phylogenet Evol Biol</source>
.
<volume>2</volume>
:
<fpage>131</fpage>
.</mixed-citation>
</ref>
<ref id="msx200-B58">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Pennisi</surname>
<given-names>E.</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Evolution. Building the tree of life, genome by genome</article-title>
.
<source>Science</source>
.
<volume>320</volume>
<issue>5884</issue>
:
<fpage>1716</fpage>
<lpage>1717</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B59">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Douady</surname>
<given-names>CJ.</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Horizontal gene transfer and phylogenetics</article-title>
.
<source>Curr Opin Microbiol</source>
.
<volume>6</volume>
<issue>5</issue>
:
<fpage>498</fpage>
<lpage>505</lpage>
.
<pub-id pub-id-type="pmid">14572543</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B60">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Price</surname>
<given-names>MN</given-names>
</name>
,
<name name-style="western">
<surname>Dehal</surname>
<given-names>PS</given-names>
</name>
,
<name name-style="western">
<surname>Arkin</surname>
<given-names>AP.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>FastTree 2 – approximately maximum-likelihood trees for large alignments</article-title>
.
<source>PLoS ONE</source>
.
<volume>5</volume>
<issue>3</issue>
:
<fpage>e9490</fpage>
.
<pub-id pub-id-type="pmid">20224823</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B61">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Qi</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Luo</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Hao</surname>
<given-names>B.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>CVTree: a phylogenetic tree reconstruction tool based on whole genomes</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>32</volume>
(Web Server issue):
<fpage>W45</fpage>
<lpage>W47</lpage>
.
<pub-id pub-id-type="pmid">15215347</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B62">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Raymond</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Ouameur</surname>
<given-names>AA</given-names>
</name>
,
<name name-style="western">
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Iqbal</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Gingras</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Dridi</surname>
<given-names>B</given-names>
</name>
,
<name name-style="western">
<surname>Leprohon</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Plante</surname>
<given-names>P-L</given-names>
</name>
,
<name name-style="western">
<surname>Giroux</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Bérubé</surname>
<given-names>È</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2016a</year>
<article-title>The initial state of the human gut microbiome determines its reshaping by antibiotics</article-title>
.
<source>ISME J</source>
.
<volume>10</volume>
<issue>3</issue>
:
<fpage>707</fpage>
<lpage>720</lpage>
.
<pub-id pub-id-type="pmid">26359913</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B63">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Raymond</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Boissinot</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Bergeron</surname>
<given-names>MG</given-names>
</name>
,
<name name-style="western">
<surname>Corbeil</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2016b</year>
<article-title>Partial recovery of microbiomes after antibiotic treatment</article-title>
.
<source>Gut Microbes</source>
.
<volume>7</volume>
<issue>5</issue>
:
<fpage>428</fpage>
<lpage>434</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B64">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Chew</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Alignment-free sequence comparison (I): statistics and power</article-title>
.
<source>J Comput Biol</source>
.
<volume>16</volume>
<issue>12</issue>
:
<fpage>1615</fpage>
<lpage>1634</lpage>
.
<pub-id pub-id-type="pmid">20001252</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B65">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Rizk</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Lavenier</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Chikhi</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>DSK: K-mer counting with very low memory usage</article-title>
.
<source>Bioinformatics</source>
<volume>29</volume>
<issue>5</issue>
:
<fpage>652</fpage>
<lpage>653</lpage>
.
<pub-id pub-id-type="pmid">23325618</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B66">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Robinson</surname>
<given-names>DF</given-names>
</name>
,
<name name-style="western">
<surname>Foulds</surname>
<given-names>LR.</given-names>
</name>
</person-group>
<year>1981</year>
<article-title>Comparison of phylogenetic trees</article-title>
.
<source>Math Biosci</source>
.
<volume>53</volume>
<issue>1–2</issue>
:
<fpage>131</fpage>
<lpage>147</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B67">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Rodionov</surname>
<given-names>DA</given-names>
</name>
,
<name name-style="western">
<surname>Gelfand</surname>
<given-names>MS</given-names>
</name>
,
<name name-style="western">
<surname>Mironov</surname>
<given-names>AA</given-names>
</name>
,
<name name-style="western">
<surname>Rakhmaninova</surname>
<given-names>AB.</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Comparative approach to analysis of regulation in complete genomes: multidrug resistance systems in gamma-proteobacteria</article-title>
.
<source>J Mol Microbiol Biotechnol</source>
.
<volume>3</volume>
<issue>2</issue>
:
<fpage>319</fpage>
<lpage>324</lpage>
.
<pub-id pub-id-type="pmid">11321589</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B68">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Romero</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Llull</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>García</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Mitchell</surname>
<given-names>TJ</given-names>
</name>
,
<name name-style="western">
<surname>López</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Moscoso</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Isolation and characterization of a new plasmid pSpnP1 from a multidrug-resistant clone of
<italic>Streptococcus pneumoniae</italic>
</article-title>
.
<source>Plasmid</source>
<volume>58</volume>
<issue>1</issue>
:
<fpage>51</fpage>
<lpage>60</lpage>
.
<pub-id pub-id-type="pmid">17275906</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B69">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Rossello-Mora</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Amann</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Past and future species definitions for Bacteria and Archaea</article-title>
.
<source>Syst Appl Microbiol</source>
.
<volume>38</volume>
<issue>4</issue>
:
<fpage>209</fpage>
<lpage>216</lpage>
.
<pub-id pub-id-type="pmid">25747618</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B70">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Sansinenea</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Ortiz</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Secondary metabolites of soil
<italic>Bacillus</italic>
spp</article-title>
.
<source>Biotechnol Lett</source>
.
<volume>33</volume>
<issue>8</issue>
:
<fpage>1523</fpage>
<lpage>1538</lpage>
.
<pub-id pub-id-type="pmid">21528405</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B71">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Schuch</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Fischetti</surname>
<given-names>VA.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>The secret life of the anthrax agent
<italic>Bacillus anthracis</italic>
: bacteriophage-mediated ecological adaptations</article-title>
.
<source>PLoS ONE</source>
.
<volume>4</volume>
<issue>8</issue>
:
<fpage>e6532</fpage>
.
<pub-id pub-id-type="pmid">19672290</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B72">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Shapiro</surname>
<given-names>BJ</given-names>
</name>
,
<name name-style="western">
<surname>Friedman</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Cordero</surname>
<given-names>OX</given-names>
</name>
,
<name name-style="western">
<surname>Preheim</surname>
<given-names>SP</given-names>
</name>
,
<name name-style="western">
<surname>Timberlake</surname>
<given-names>SC</given-names>
</name>
,
<name name-style="western">
<surname>Szabó</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Polz</surname>
<given-names>MF</given-names>
</name>
,
<name name-style="western">
<surname>Alm</surname>
<given-names>EJ.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Population genomics of early events in the ecological differentiation of bacteria</article-title>
.
<source>Science</source>
.
<volume>336</volume>
<issue>6077</issue>
:
<fpage>48</fpage>
<lpage>51</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B73">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Siva</surname>
<given-names>N.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>1000 genomes project</article-title>
.
<source>ATLA Altern Lab Anim</source>
.
<volume>38</volume>
<issue>6</issue>
:
<fpage>445</fpage>
.
</mixed-citation>
</ref>
<ref id="msx200-B74">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Snitkin</surname>
<given-names>ES</given-names>
</name>
,
<name name-style="western">
<surname>Zelazny</surname>
<given-names>AM</given-names>
</name>
,
<name name-style="western">
<surname>Thomas</surname>
<given-names>PJ</given-names>
</name>
,
<name name-style="western">
<surname>Stock</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Henderson</surname>
<given-names>DK</given-names>
</name>
,
<name name-style="western">
<surname>Palmore</surname>
<given-names>TN</given-names>
</name>
,
<name name-style="western">
<surname>Segre</surname>
<given-names>JA.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Tracking a hospital outbreak of carbapenem-resistant
<italic>Klebsiella pneumoniae</italic>
with whole-genome sequencing</article-title>
.
<source>Sci Transl Med</source>
.
<volume>4</volume>
<issue>148</issue>
:
<fpage>148ra116</fpage>
.</mixed-citation>
</ref>
<ref id="msx200-B75">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Sokal</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Rohlf</surname>
<given-names>F.</given-names>
</name>
</person-group>
<year>1962</year>
<article-title>The comparison of dendrograms by objective methods</article-title>
.
<source>Taxon</source>
<volume>11</volume>
:
<fpage>33</fpage>
<lpage>40</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B76">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Song</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Ren</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Deng</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing</article-title>
.
<source>Brief Bioinform</source>
.
<volume>15</volume>
<issue>3</issue>
:
<fpage>343</fpage>
<lpage>353</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B77">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Sozhamannan</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Chute</surname>
<given-names>MD</given-names>
</name>
,
<name name-style="western">
<surname>McAfee</surname>
<given-names>FD</given-names>
</name>
,
<name name-style="western">
<surname>Fouts</surname>
<given-names>DE</given-names>
</name>
,
<name name-style="western">
<surname>Akmal</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Galloway</surname>
<given-names>DR</given-names>
</name>
,
<name name-style="western">
<surname>Mateczun</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Baillie</surname>
<given-names>LW</given-names>
</name>
,
<name name-style="western">
<surname>Read</surname>
<given-names>TD.</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The
<italic>Bacillus anthracis</italic>
chromosome contains four conserved, excision-proficient, putative prophages</article-title>
.
<source>BMC Microbiol</source>
.
<volume>6</volume>
:
<fpage>34</fpage>
.
<pub-id pub-id-type="pmid">16600039</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B78">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Spielman</surname>
<given-names>SJ</given-names>
</name>
,
<name name-style="western">
<surname>Wilke</surname>
<given-names>CO.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Pyvolve: a flexible python module for simulating sequences along phylogenies</article-title>
.
<source>PLoS ONE</source>
.
<volume>10</volume>
<issue>9</issue>
:
<fpage>e0139047</fpage>
.
<pub-id pub-id-type="pmid">26397960</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B79">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Stamatakis</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies</article-title>
.
<source>Bioinformatics</source>
<volume>30</volume>
<issue>9</issue>
:
<fpage>1312</fpage>
<lpage>1313</lpage>
.
<pub-id pub-id-type="pmid">24451623</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B80">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Sun</surname>
<given-names>Q</given-names>
</name>
,
<name name-style="western">
<surname>Lan</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Wang</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Li</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Du</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Isolation and genomic characterization of SfI, a serotype-converting bacteriophage of
<italic>Shigella flexneri</italic>
</article-title>
.
<source>BMC Microbiol</source>
.
<volume>13</volume>
:
<fpage>39.</fpage>
<pub-id pub-id-type="pmid">23414301</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B81">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Tang</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Bossers</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Harders</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Lu</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Smith</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Comparative genomic analysis of twelve
<italic>Streptococcus suis</italic>
(pro)phages</article-title>
.
<source>Genomics</source>
.
<volume>101</volume>
<issue>6</issue>
:
<fpage>336</fpage>
<lpage>344</lpage>
.
<pub-id pub-id-type="pmid">23587535</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B82">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Tatusova</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Ciufo</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Fedorov</surname>
<given-names>B</given-names>
</name>
,
<name name-style="western">
<surname>O’Neill</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Tolstoy</surname>
<given-names>I.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>RefSeq microbial genomes database: new representation and annotation strategy</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>43</volume>
<issue>7</issue>
:
<fpage>3872.</fpage>
<pub-id pub-id-type="pmid">25824943</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B83">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Tu</surname>
<given-names>Q</given-names>
</name>
,
<name name-style="western">
<surname>Lin</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Gene content dissimilarity for subclassification of highly similar microbial strains</article-title>
.
<source>BMC Genomics</source>
.
<volume>17</volume>
:
<fpage>647.</fpage>
<pub-id pub-id-type="pmid">27530250</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B84">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>van den Nieuwboer</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>van Hemert</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Claassen</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>de Vos</surname>
<given-names>WM.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>
<italic>Lactobacillus plantarum</italic>
WCFS1 and its host interaction: a dozen years after the genome</article-title>
.
<source>Microb Biotechnol</source>
.
<volume>9</volume>
<issue>4</issue>
:
<fpage>452</fpage>
<lpage>465</lpage>
.
<pub-id pub-id-type="pmid">27231133</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B85">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Vinga</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Almeida</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Alignment-free sequence comparison – a review</article-title>
.
<source>Bioinformatics</source>
.
<volume>19</volume>
<issue>4</issue>
:
<fpage>513</fpage>
<lpage>523</lpage>
.</mixed-citation>
</ref>
<ref id="msx200-B86">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Walsh</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Thomson</surname>
<given-names>KL</given-names>
</name>
,
<name name-style="western">
<surname>Ware</surname>
<given-names>JS</given-names>
</name>
,
<name name-style="western">
<surname>Funke</surname>
<given-names>BH</given-names>
</name>
,
<name name-style="western">
<surname>Woodley</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>McGuire</surname>
<given-names>KJ</given-names>
</name>
,
<name name-style="western">
<surname>Mazzarotto</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Blair</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Seller</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Taylor</surname>
<given-names>JC</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2016</year>
<article-title>Reassessment of Mendelian gene pathogenicity using 7,855 cardiomyopathy cases and 60,706 reference samples</article-title>
.
<source>Genet Med</source>
.
<volume>19</volume>
<issue>2</issue>
:
<fpage>192</fpage>
<lpage>203</lpage>
.
<pub-id pub-id-type="pmid">27532257</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B87">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wan</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Alignment-free sequence comparison (II): theoretical power of comparison statistics</article-title>
.
<source>J Comput Biol</source>
.
<volume>17</volume>
<issue>11</issue>
:
<fpage>1467</fpage>
<lpage>1490</lpage>
.
<pub-id pub-id-type="pmid">20973742</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B90">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wattam</surname>
<given-names>AR</given-names>
</name>
,
<name name-style="western">
<surname>Abraham</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Dalay</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Disz</surname>
<given-names>TL</given-names>
</name>
,
<name name-style="western">
<surname>Driscoll</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Gabbard</surname>
<given-names>JL</given-names>
</name>
,
<name name-style="western">
<surname>Gillespie</surname>
<given-names>JJ</given-names>
</name>
,
<name name-style="western">
<surname>Gough</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Hix</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Kenyon</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Machi</surname>
<given-names>D.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>PATRIC, the bacterial bioinformatics database and analysis resource</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>42</volume>
(
<issue>D1</issue>
):
<fpage>D581</fpage>
<lpage>D591</lpage>
.
<pub-id pub-id-type="pmid">24225323</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B88">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wen</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>RHF</given-names>
</name>
,
<name name-style="western">
<surname>Yau</surname>
<given-names>SC</given-names>
</name>
,
<name name-style="western">
<surname>He</surname>
<given-names>RL</given-names>
</name>
,
<name name-style="western">
<surname>Yau</surname>
<given-names>SST.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>K-mer natural vector and its application to the phylogenetic analysis of genetic sequences</article-title>
.
<source>Gene</source>
.
<volume>546</volume>
<issue>1</issue>
:
<fpage>25</fpage>
<lpage>34</lpage>
.
<pub-id pub-id-type="pmid">24858075</pub-id>
</mixed-citation>
</ref>
<ref id="msx200-B89">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Xiong</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Iqbal</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Krajden</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Chapman</surname>
<given-names>W</given-names>
</name>
,
<name name-style="western">
<surname>Dewar</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Roy</surname>
<given-names>PH.</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>Complete genome of a pan-resistant
<italic>P. aeruginosa</italic>
isolated from a patient with respiratory failure in a Canadian Community Hospital</article-title>
.
<source>Genome Announc</source>
.
<volume>5</volume>
<issue>22</issue>
:
<fpage>e00458-17</fpage>
.
<pub-id pub-id-type="pmid">28572328</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F24 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F24 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:5850840
   |texte=   Phenetic Comparison of Prokaryotic Genomes Using k-mers
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:28957508" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021