MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank

Internal identifier: 001255 (Pmc/Corpus); previous: 001254; next: 001256

Authors: Guillaume Bernard; Paul Greenfield; Mark A. Ragan; Cheong Xin Chan

Source: mSystems 3(6):e00257-18 (2018)

RBID : PMC:6247013

Abstract

Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
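
As a rough illustration of what comparing sequences "based on short subsequences without using multiple-sequence alignment" can look like, the sketch below breaks two sequences into overlapping k-mers and scores their overlap. The Jaccard index used here is an assumption chosen for simplicity; the abstract does not say which similarity measure the study used. The function names `kmers` and `jaccard_kmer_similarity` and the default `k = 4` are likewise hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_kmer_similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard index of the two k-mer sets (illustrative measure only,
    not necessarily the statistic used in the study)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        # Both sequences are shorter than k: no k-mers to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    print(jaccard_kmer_similarity("ACGTACGTAC", "ACGTACGTTT", k=4))
```

No alignment is computed at any point: each sequence is reduced to a set of short words, so pairwise scores of this kind can be thresholded to draw edges in a genome network.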


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247013
DOI: 10.1128/mSystems.00257-18
PubMed: 30505941
PubMed Central: 6247013

Links to Exploration step

PMC:6247013

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">
<italic>k</italic>
-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Greenfield, Paul" sort="Greenfield, Paul" uniqKey="Greenfield P" first="Paul" last="Greenfield">Paul Greenfield</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30505941</idno>
<idno type="pmc">6247013</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247013</idno>
<idno type="RBID">PMC:6247013</idno>
<idno type="doi">10.1128/mSystems.00257-18</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">001255</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001255</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">
<italic>k</italic>
-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Greenfield, Paul" sort="Greenfield, Paul" uniqKey="Greenfield P" first="Paul" last="Greenfield">Paul Greenfield</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">mSystems</title>
<idno type="eISSN">2379-5077</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly
<italic>Proteobacteria</italic>
. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.</p>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">mSystems</journal-id>
<journal-id journal-id-type="iso-abbrev">mSystems</journal-id>
<journal-id journal-id-type="hwp">msys</journal-id>
<journal-id journal-id-type="pmc">msys</journal-id>
<journal-id journal-id-type="publisher-id">mSystems</journal-id>
<journal-title-group>
<journal-title>mSystems</journal-title>
</journal-title-group>
<issn pub-type="epub">2379-5077</issn>
<publisher>
<publisher-name>American Society for Microbiology</publisher-name>
<publisher-loc>1752 N St., N.W., Washington, DC</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30505941</article-id>
<article-id pub-id-type="pmc">6247013</article-id>
<article-id pub-id-type="publisher-id">mSystems00257-18</article-id>
<article-id pub-id-type="doi">10.1128/mSystems.00257-18</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="overline">
<subject>Ecological and Evolutionary Science</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>
<italic>k</italic>
-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank</article-title>
<alt-title alt-title-type="running-head">
<italic>k-</italic>
mer Phylogenomics of Microbes</alt-title>
<alt-title alt-title-type="short-authors">Bernard et al.</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0001-6251-9500</contrib-id>
<name>
<surname>Bernard</surname>
<given-names>Guillaume</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>a</sup>
</xref>
<xref ref-type="author-notes" rid="fn1">*</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-4028-9243</contrib-id>
<name>
<surname>Greenfield</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>b</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-1672-7020</contrib-id>
<name>
<surname>Ragan</surname>
<given-names>Mark A.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>a</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-3729-8176</contrib-id>
<name>
<surname>Chan</surname>
<given-names>Cheong Xin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>a</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>c</sup>
</xref>
</contrib>
<aff id="aff1">
<label>a</label>
<addr-line>Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</aff>
<aff id="aff2">
<label>b</label>
<addr-line>Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, Australia</addr-line>
</aff>
<aff id="aff3">
<label>c</label>
<addr-line>School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia</addr-line>
</aff>
</contrib-group>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Claesson</surname>
<given-names>Marcus J.</given-names>
</name>
<role>Editor</role>
<aff>University College Cork</aff>
</contrib>
</contrib-group>
<author-notes>
<corresp id="cor1">Address correspondence to Cheong Xin Chan,
<email>c.chan1@uq.edu.au</email>
.</corresp>
<fn id="fn1" fn-type="present-address">
<label>*</label>
<p>Present address: Guillaume Bernard, Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), Paris, France.</p>
</fn>
<fn fn-type="other">
<p>
<bold>Citation</bold>
Bernard G, Greenfield P, Ragan MA, Chan CX. 2018.
<italic>k</italic>
-mer similarity, networks of microbial genomes, and taxonomic rank. mSystems 3:e00257-18.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/mSystems.00257-18">https://doi.org/10.1128/mSystems.00257-18</ext-link>
.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>11</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<season>Nov-Dec</season>
<year>2018</year>
</pub-date>
<volume>3</volume>
<issue>6</issue>
<elocation-id>e00257-18</elocation-id>
<history>
<date date-type="received">
<day>12</day>
<month>10</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>2</day>
<month>11</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2018 Bernard et al.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Bernard et al.</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link>
.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="sys006182296001.pdf"></self-uri>
<abstract abstract-type="precis">
<p>Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly
<italic>Proteobacteria</italic>
. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.</p>
</abstract>
<abstract>
<title>ABSTRACT</title>
<p>Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on
<italic>k</italic>
-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly
<italic>Proteobacteria</italic>
. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean
<italic>k</italic>
-mer similarity can correlate with taxonomic rank. We also link the implicated
<italic>k</italic>
-mers to genome annotation (thus, functions) and define core
<italic>k</italic>
-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among
<italic>Spirochaetes</italic>
, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal
<italic>Tenericutes</italic>
. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that
<italic>k</italic>
-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale.</p>
<p>
<bold>IMPORTANCE</bold>
Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly
<italic>Proteobacteria</italic>
. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.</p>
</abstract>
<kwd-group>
<title>KEYWORDS</title>
<kwd>core functions</kwd>
<kwd>
<italic>k</italic>
-mers</kwd>
<kwd>networks</kwd>
<kwd>phylogenetic analysis</kwd>
<kwd>phylogenomics</kwd>
</kwd-group>
<funding-group>
<award-group id="award1">
<funding-source>
<institution-wrap>
<institution>James S. McDonnell Foundation (JSMF)</institution>
<institution-id>https://doi.org/10.13039/100000913</institution-id>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name>
<surname>Ragan</surname>
<given-names>Mark A.</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award2">
<funding-source>
<institution-wrap>
<institution>Department of Education and Training | Australian Research Council (ARC)</institution>
<institution-id>https://doi.org/10.13039/501100000923</institution-id>
</institution-wrap>
</funding-source>
<award-id>DP150101875</award-id>
<principal-award-recipient>
<name>
<surname>Chan</surname>
<given-names>Cheong Xin</given-names>
</name>
</principal-award-recipient>
<principal-award-recipient>
<name>
<surname>Ragan</surname>
<given-names>Mark A.</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<counts>
<count count="10" count-type="supplementary-material"></count>
<fig-count count="7"></fig-count>
<table-count count="3"></table-count>
<equation-count count="21"></equation-count>
<ref-count count="57"></ref-count>
<page-count count="16"></page-count>
<word-count count="9632"></word-count>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>cover-date</meta-name>
<meta-value>November/December 2018</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>INTRODUCTION</title>
<p>For nearly 100 years following the discovery of diverse bacteria by Pasteur, Koch, Cohn, and others in the latter decades of the 19th century (
<xref rid="B1" ref-type="bibr">1</xref>
), little was known of how these organisms might be related among themselves or to the rest of the living world. This began to change with the recognition that ribosomal RNAs are present in all living cells and contain structural domains that, by virtue of their differential entanglements with core molecular functions and their interactions with greater or lesser numbers of other components of the translational apparatus, can inform on evolutionary history across a range of temporal scales “much as the hands of a clock separately indicate hours, minutes, and seconds” (
<xref rid="B2" ref-type="bibr">2</xref>
). Given the central role of translation in the emergence of phenotype from genotype and the number and interrelatedness of these structural and functional constraints, it was assumed that statistical analysis of rRNA sequences would recover the tree of vertical descent not merely of the corresponding genes but also, much more interestingly, of the host organisms. As it happened, the PCR method was invented at about the same time (
<xref rid="B3" ref-type="bibr">3</xref>
), and the presence of conserved 5′ and 3′ regions made the rRNA gene an attractive target for amplification and sequencing. Thus, Darwin’s Great Tree of Life quickly became universal, and, as a bonus,
<italic>Archaebacteria</italic>
(
<italic>Archaea</italic>
) were recognized as a distinct domain of living organisms.</p>
<p>As molecular evolutionary studies were extended into families of protein-coding genes, congruent topologies were often (but not always) recovered (
<xref rid="B4" ref-type="bibr">4</xref>
). In contrast to expectation, instances of incongruence often failed to be resolved as data sets grew larger and statistical methodology for phylogenetic inference improved. It also became clear that many microbes can exchange genetic material through the mediation of plasmids or phage and/or take up DNA from their environment. Depending on the breadth and granularity of the data, phylogenetic trees inferred for regions of lateral origin may thus contain edges that directly connect lineages that are nonadjacent in the rRNA tree. That is, lateral genetic transfer creates phylogenomic networks. Plasmid and phage sequences in particular are expected to increase the connectivity of phylogenomic networks, although any genetic material that becomes established in a new host genome after transmission by such a vector can contribute.</p>
<p>The resulting pattern of phylogenomic relationships has been described (by the use of diverse metaphors) as fundamentally treelike (
<xref rid="B5" ref-type="bibr">5</xref>
,
<xref rid="B6" ref-type="bibr">6</xref>
), as a tree overgrown with tiny vines (
<xref rid="B7" ref-type="bibr">7</xref>
), as a ring (
<xref rid="B8" ref-type="bibr">8</xref>
,
<xref rid="B9" ref-type="bibr">9</xref>
), as a coral (
<xref rid="B10" ref-type="bibr">10</xref>
), as a web (
<xref rid="B7" ref-type="bibr">7</xref>
,
<xref rid="B11" ref-type="bibr">11</xref>
), as a network with some treelike regions (
<xref rid="B12" ref-type="bibr">12</xref>
), or simply as a network (
<xref rid="B13" ref-type="bibr">13</xref>
,
<xref rid="B14" ref-type="bibr">14</xref>
). Networks of lateral genetic transfer (
<xref rid="B11" ref-type="bibr">11</xref>
,
<xref rid="B13" ref-type="bibr">13</xref>
) highlighted the need to visualize contributions of different genomic regions on a broad scale.</p>
<p>Complete genome sequences are now available for thousands of bacterial and archaeal species, making it possible to assess microbial evolution globally and, often, at considerable phyletic depth. However, until recently these studies were necessarily biased in favor of alignable regions, i.e., genes, as classical phylogenomic workflows are based on multiple-sequence alignment (MSA) of putative orthogroups. Recently, so-called alignment-free (AF) approaches have been shown to perform well in phylogenetic inference from simulated and empirical (microbial genome) data sets (
<xref rid="B15" ref-type="bibr">15</xref>
; see references
<xref rid="B16" ref-type="bibr">16</xref>
,
<xref rid="B17" ref-type="bibr">17</xref>
, and
<xref rid="B18" ref-type="bibr">18</xref>
for recent reviews).</p>
<p>An important class of AF methods consists of approaches based on subsequences of fixed length, known as
<italic>k</italic>
-mers. These methods typically compute a matrix of distances on the basis of, e.g., the number of shared
<italic>k</italic>
-mers, which can then be used to generate a tree by the use of, e.g., neighbor joining (
<xref rid="B19" ref-type="bibr">19</xref>
) or a similarity network (
<xref rid="B20" ref-type="bibr">20</xref>
). Alternatively,
<italic>k</italic>
-mers of lateral origin can be recognized (
<xref rid="B11" ref-type="bibr">11</xref>
,
<xref rid="B21" ref-type="bibr">21</xref>
,
<xref rid="B22" ref-type="bibr">22</xref>
) and used to generate a directional network in which the edges natively represent inferred lateral relationships. The use of
<italic>k</italic>
-mers in phylogenetics is biologically intuitive (
<xref rid="B23" ref-type="bibr">23</xref>
,
<xref rid="B24" ref-type="bibr">24</xref>
); the earlier works of Carl Woese and colleagues (
<xref rid="B25" ref-type="bibr">25</xref>
<xref ref-type="bibr" rid="B26"></xref>
–
<xref rid="B27" ref-type="bibr">27</xref>
) showed that short (enzymatically digested) oligonucleotides of 16S/18S ribosomal RNAs carry significant phylogenetic (and thus, homology) signal and reveal the three domains of life. AF approaches can recover homology signal among molecular sequences at the genome scale and have been successfully applied to genomes of bacteria and archaea (
<xref rid="B15" ref-type="bibr">15</xref>
,
<xref rid="B28" ref-type="bibr">28</xref>
<xref ref-type="bibr" rid="B29"></xref>
–
<xref rid="B30" ref-type="bibr">30</xref>
), organelles (
<xref rid="B31" ref-type="bibr">31</xref>
), plants (
<xref rid="B31" ref-type="bibr">31</xref>
), and primates (
<xref rid="B30" ref-type="bibr">30</xref>
) as well as to microbial metagenomes (
<xref rid="B30" ref-type="bibr">30</xref>
).</p>
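<p>As a minimal illustration of the shared
<italic>k</italic>
-mer idea described above, the following Python sketch derives a pairwise distance from the proportion of
<italic>k</italic>
-mers that two sequences have in common. It is not the
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
statistic used in this study; the Jaccard-style distance, the function names, and the toy sequences are assumptions chosen only to make the principle concrete.</p>

```python
def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_distance(seq_a, seq_b, k):
    """Jaccard-style distance: 0.0 for identical k-mer content, 1.0 if nothing is shared."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a.union(b)
    if not union:
        return 0.0  # neither sequence is long enough to contain a k-mer
    return 1.0 - len(a.intersection(b)) / len(union)


def distance_matrix(seqs, k):
    """Pairwise distances for a dict of name -> sequence, keyed by (name, name)."""
    names = sorted(seqs)
    return {(p, q): shared_kmer_distance(seqs[p], seqs[q], k)
            for p in names for q in names}


# Toy sequences (hypothetical); the analyses in the text use k = 25 on whole genomes.
toy = {"a": "ACGTACGT", "b": "ACGTTCGT"}
matrix = distance_matrix(toy, k=3)
```

<p>A matrix of this kind could then be passed to a tree-building method such as neighbor joining or converted into a similarity network, as outlined in the paragraph above.</p>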
<p>AF methods can be more robust than MSA-based approaches to among-site rate heterogeneity, compositional bias, rearrangement, and insertion-deletion events (
<xref rid="B15" ref-type="bibr">15</xref>
,
<xref rid="B32" ref-type="bibr">32</xref>
) and are scalable for very large data sets (
<xref rid="B32" ref-type="bibr">32</xref>
,
<xref rid="B33" ref-type="bibr">33</xref>
). We earlier generated an AF phylogenetic network for 143 bacterial and archaeal genomes (
<xref rid="B29" ref-type="bibr">29</xref>
) using pairwise
<italic>k</italic>
-mer distances computed using the
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
statistic (
<xref rid="B34" ref-type="bibr">34</xref>
,
<xref rid="B35" ref-type="bibr">35</xref>
). By varying similarity thresholds, we could easily display changes of network structure, e.g., the progressive separation of genomic lineages (
<xref rid="B29" ref-type="bibr">29</xref>
) or the disappearance of cliques (putative “genetic exchange communities” [
<xref rid="B11" ref-type="bibr">11</xref>
,
<xref rid="B36" ref-type="bibr">36</xref>
]).</p>
<p>Here we used
<italic>k</italic>
-mer methods and the
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
statistic to infer phylogenomic networks for 2,783 complete prokaryote genomes and investigated the contribution of different components of the data to the phylogenetic signals captured by AF methods. Specifically, we compared AF networks inferred using (i) complete genomic data sets, including plasmids, if any; (ii) chromosomal sequences without rRNA genes; (iii) only rRNA genes; and (iv) only plasmid sequences. Using an advanced database approach, we investigated the core functions that are specific to particular phyletic groups or genera on the basis of the shared
<italic>k</italic>
-mers.</p>
</sec>
<sec sec-type="results" id="s2">
<title>RESULTS</title>
<p>For each subset of the data (see above), we first calculated a distance
<italic>d</italic>
between the genomes in a given pair (
<italic>a</italic>
and
<italic>b</italic>
) using the
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distance measure and
<italic>k </italic>
=
<italic></italic>
25 (see Materials and Methods). The value of
<italic>k </italic>
=
<italic></italic>
25 was found to capture an adequate level of uniqueness among 1,121 complete bacterial genome sequences and is thus suitable for deriving a metric of relatedness among bacterial genomes (
<xref rid="B37" ref-type="bibr">37</xref>
). We transformed the distance between genome
<italic>a</italic>
and genome
<italic>b</italic>
(
<italic>d
<sub>ab</sub>
</italic>
) into a similarity value (
<italic>S
<sub>ab</sub>
</italic>
) and generated a similarity network using a method that we described previously (
<xref rid="B29" ref-type="bibr">29</xref>
). These networks capture the relatedness among these genomes, i.e., are phylogenomic, although the relative contributions of the vertical and lateral components (which may be admixed) depend on the subset of data used as input. Here we define a threshold
<italic>t</italic>
for which only edges with
<italic>S</italic>
values that are ≥
<italic>t</italic>
are considered in the network. To compare our results at the genome and phylum levels, we generated
<italic>I</italic>
-networks in which nodes represent distinct genome isolates and edges indicate evidence of shared
<italic>k</italic>
-mers and also generated
<italic>P</italic>
-networks in which nodes represent distinct phyla and edges represent the number of isolates (summed over both nodes) that share
<italic>k</italic>
-mers with isolates of the other phylum (see Materials and Methods). Given the taxon richness of
<italic>Proteobacteria</italic>
, we evaluated its subgroups (e.g.,
<italic>Alphaproteobacteria</italic>
and
<italic>Betaproteobacteria</italic>
) as individual phyla. We then compared the
<italic>k</italic>
-mer networks based on the topological differences between them at different
<italic>t</italic>
values. All
<italic>I</italic>
- and
<italic>P</italic>
-networks of these 2,705 genome isolates are available at
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.14264/uql.2017.436">https://doi.org/10.14264/uql.2017.436</ext-link>
.</p>
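<p>The role of the threshold
<italic>t</italic>
can be sketched in a few lines of Python. The sketch assumes that pairwise similarity values
<italic>S</italic>
have already been computed (the transformation from distance to similarity is described in Materials and Methods and is not reproduced here); the function name, the data layout, and the numerical values are hypothetical.</p>

```python
def i_network(similarity, t):
    """Return the set of genome pairs whose similarity S is at least t."""
    return {pair for pair, s in similarity.items() if s >= t}


# Hypothetical similarity values for three genome isolates.
similarity = {("g1", "g2"): 9.2, ("g1", "g3"): 3.4, ("g2", "g3"): 0.7}

edges_t3 = i_network(similarity, t=3)  # two edges survive
edges_t9 = i_network(similarity, t=9)  # only the closest pair survives
```

<p>Raising
<italic>t</italic>
can only remove edges, which is why the networks become progressively sparser at higher thresholds.</p>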
<sec id="s2.1">
<title>AF networks of microbial evolution.</title>
<p>We first inferred phylogenomic networks based on a data set of 2,783 completely sequenced microbial genomes (2,618 bacterial genomes and 165 archaeal genomes [total of 9,582,718,896 bases]) downloaded from NCBI on 31 January 2016 (see
<xref ref-type="supplementary-material" rid="dataS1">Data Set S1</xref>
in the supplemental material), including plasmid sequences if present. Where two or more genomes had identical contents of 25-mers (
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distance = 0), only one was retained. We also removed edges for which the
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distance was >10; these genomes share ≤0.01% of 25-mers with any other genome. Following this filtering step, we took 2,705 genomes forward into subsequent analyses. For each network, we systematically assessed the number of nonsingleton nodes (
<italic>c</italic>
) (i.e., the number of nodes with one or more edges), the size of the maximal clique (i.e., the clique with the largest number of genomes) (
<italic>z</italic>
), and the number of cliques (
<italic>n</italic>
) across various levels of the similarity score threshold (
<italic>t</italic>
). We required a clique to contain three or more edges and defined
<italic>D</italic>
as the density of a network, i.e., the proportion of edges among all possible edges in a network (
<xref ref-type="fig" rid="fig1">Fig. 1</xref>
; see also Materials and Methods).</p>
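<p>The four summary quantities (
<italic>c</italic>
,
<italic>D</italic>
,
<italic>z</italic>
, and
<italic>n</italic>
) can be computed for a small network as in the following sketch. It is an illustration under stated assumptions rather than the implementation used in this study: cliques are enumerated as maximal cliques with the basic Bron-Kerbosch recursion, the requirement of three or more edges is read here as a minimum of three nodes (a triangle), and all names and the example network are hypothetical. Exhaustive enumeration of this kind is practical only for small or sparse networks.</p>

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r.union({v}), p.intersection(adj[v]), x.intersection(adj[v]))
            p = p - {v}
            x = x.union({v})

    expand(set(), set(adj), set())
    return cliques


def network_summary(nodes, edges, min_clique_size=3):
    """Summarize an undirected network given as unique (a, b) node pairs."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    c = sum(1 for v in nodes if adj[v])            # nonsingleton nodes
    possible = len(nodes) * (len(nodes) - 1) // 2  # all possible edges
    density = len(edges) / possible if possible else 0.0
    cliques = [q for q in maximal_cliques(adj) if len(q) >= min_clique_size]
    z = max((len(q) for q in cliques), default=0)  # size of the largest clique
    return {"c": c, "D": density, "z": z, "n": len(cliques)}


# Hypothetical five-node network: one triangle, one pendant edge, one singleton.
nodes = ["a", "b", "c", "d", "e"]
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
summary = network_summary(nodes, edges)
```

<p>For this example, four of the five nodes have at least one edge, 4 of the 10 possible edges are present, and the only clique of three or more nodes is the triangle formed by a, b, and c.</p>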
<fig id="fig1" orientation="portrait" position="float">
<label>FIG 1</label>
<caption>
<p>Definition of key terms of network characteristics used in this study. The example 11-node network is shown at the top, and the definition of each key term associated with this example network is shown at the bottom.</p>
</caption>
<graphic xlink:href="sys0061822960001"></graphic>
</fig>
<supplementary-material content-type="local-data" id="dataS1">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.10</object-id>
<label>DATA SET S1</label>
<p>The 2,783 completely sequenced microbial genomes used in this study. For each genome data set, the data source, the total number of bases, and the total number of 25-mers are shown. Download
<inline-supplementary-material id="dsS1" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="sys006182296sd1.xlsx" content-type="local-data">Data Set S1, XLSX file, 0.2 MB</inline-supplementary-material>
.</p>
<permissions>
<copyright-statement>Copyright © 2018 Bernard et al.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Bernard et al.</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This content is distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link>
.</license-p>
</license>
</permissions>
</supplementary-material>
<p>The network topology changes substantially with similarity threshold: at
<italic>t </italic>
=
<italic></italic>
0,
<italic>c </italic>
=
<italic></italic>
2,705 and
<italic>z </italic>
=
<italic></italic>
2,704, compared to
<italic>c </italic>
=
<italic></italic>
1,358 and
<italic>z </italic>
=
<italic></italic>
48 at
<italic>t </italic>
=
<italic></italic>
9 (
<xref rid="tab1" ref-type="table">Table 1</xref>
). As we increase the stringency of the threshold of shared similarity, the network becomes less connected, and distinct subsets corresponding to diverse taxa (i.e., phyla, classes, and genera) start to emerge. In this network, many bacterial phyla are represented in a single subgraph at
<italic>t </italic>
=
<italic></italic>
4, most phyla can be identified as distinct sets at
<italic>t </italic>
=
<italic></italic>
5, and all proteobacterial classes are separate from each other at
<italic>t </italic>
>
<italic></italic>
5.</p>
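<p>The emergence of distinct subsets with increasing
<italic>t</italic>
can be illustrated by counting connected components at several thresholds. The sketch below uses a simple union-find structure; the function name, the node labels, and the similarity values are hypothetical, and the approach is offered as one plausible way to trace this behavior rather than as the procedure used to produce Table 1.</p>

```python
def components(nodes, similarity, t):
    """Group nodes into connected components using only edges with S at least t."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the root, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, b), s in similarity.items():
        if s >= t:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical chain of four genomes with unequal similarities.
nodes = ["g1", "g2", "g3", "g4"]
similarity = {("g1", "g2"): 8, ("g2", "g3"): 4, ("g3", "g4"): 6}

counts = [len(components(nodes, similarity, t)) for t in (3, 5, 7)]
```

<p>In this toy case the single connected network at the lowest threshold splits into two groups and then three as
<italic>t</italic>
increases, mirroring in miniature the progressive separation of taxa described above.</p>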
<table-wrap id="tab1" orientation="portrait" position="float">
<label>TABLE 1</label>
<caption>
<p>Characteristics of the phylogenomic network of 2,705 prokaryote genomes based on complete genomic data sets</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1">Threshold</th>
<th rowspan="1" colspan="1">No. of nonsingleton nodes,
<italic>c</italic>
</th>
<th rowspan="1" colspan="1">Density,
<italic>D</italic>
</th>
<th rowspan="1" colspan="1">Size of the maximal clique,
<italic>z</italic>
</th>
<th rowspan="1" colspan="1">No. of cliques,
<italic>n</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">2,705</td>
<td rowspan="1" colspan="1">0.998</td>
<td rowspan="1" colspan="1">2,704</td>
<td rowspan="1" colspan="1">10</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">2,705</td>
<td rowspan="1" colspan="1">0.989</td>
<td rowspan="1" colspan="1">2,701</td>
<td rowspan="1" colspan="1">Not available</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">2,705</td>
<td rowspan="1" colspan="1">0.513</td>
<td rowspan="1" colspan="1">860</td>
<td rowspan="1" colspan="1">Not available</td>
</tr>
<tr>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">2,680</td>
<td rowspan="1" colspan="1">0.079</td>
<td rowspan="1" colspan="1">339</td>
<td rowspan="1" colspan="1">1,662,785</td>
</tr>
<tr>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">2,378</td>
<td rowspan="1" colspan="1">0.019</td>
<td rowspan="1" colspan="1">211</td>
<td rowspan="1" colspan="1">6,181</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">2,091</td>
<td rowspan="1" colspan="1">0.008</td>
<td rowspan="1" colspan="1">124</td>
<td rowspan="1" colspan="1">3,344</td>
</tr>
<tr>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">1,860</td>
<td rowspan="1" colspan="1">0.005</td>
<td rowspan="1" colspan="1">82</td>
<td rowspan="1" colspan="1">525</td>
</tr>
<tr>
<td rowspan="1" colspan="1">7</td>
<td rowspan="1" colspan="1">1,676</td>
<td rowspan="1" colspan="1">0.003</td>
<td rowspan="1" colspan="1">64</td>
<td rowspan="1" colspan="1">229</td>
</tr>
<tr>
<td rowspan="1" colspan="1">8</td>
<td rowspan="1" colspan="1">1,538</td>
<td rowspan="1" colspan="1">0.003</td>
<td rowspan="1" colspan="1">61</td>
<td rowspan="1" colspan="1">224</td>
</tr>
<tr>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">1,358</td>
<td rowspan="1" colspan="1">0.002</td>
<td rowspan="1" colspan="1">48</td>
<td rowspan="1" colspan="1">232</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The
<italic>I</italic>
-network is very densely connected at
<italic>t </italic>
=
<italic></italic>
0, with the maximum number of cliques
<italic>n</italic>
= 10. The value
<italic>n</italic>
is too great to be computed at
<italic>t</italic>
= 1 or
<italic>t </italic>
=
<italic></italic>
2, but
<italic>n</italic>
= 1,662,785 at
<italic>t </italic>
=
<italic></italic>
3 and decreases to 232 at
<italic>t </italic>
=
<italic></italic>
9 (
<xref rid="tab1" ref-type="table">Table 1</xref>
). Most isolates are members of a single large clique at
<italic>t </italic>
=
<italic></italic>
0 and
<italic>t </italic>
=
<italic></italic>
1 (
<italic>D </italic>
>
<italic></italic>
0.98 in both cases); at
<italic>t </italic>
=
<italic></italic>
2,
<italic>D </italic>
=
<italic></italic>
0.513. The network becomes less dense at
<italic>t </italic>
=
<italic></italic>
3 (
<italic>D </italic>
=
<italic></italic>
0.079;
<xref rid="tab1" ref-type="table">Table 1</xref>
). As this network of 2,705 nodes remains too densely connected to be visualized and analyzed directly, we generated the
<italic>P</italic>
-network using the same data, with each node representing a phylum.
<xref ref-type="fig" rid="fig2">Figure 2</xref>
shows the
<italic>P</italic>
-network of the 2,705 genomes at
<italic>t </italic>
=
<italic></italic>
3 (dynamic view available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
). The width (thickness) of each edge represents the number of instances in which any two genomes (one from each phylum connected by the edge) have similarity
<italic>S</italic>
≥
<italic>t</italic>
; the width is relative to the number of connected genome pairs between two phyla. Major phyla (e.g.,
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
,
<italic>Firmicutes</italic>
,
<italic>Actinobacteria</italic>
, and
<italic>Tenericutes</italic>
) are clearly separated at
<italic>t </italic>
=
<italic></italic>
3. The thickest edge (in red) is between the
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
(7,568 connected genome pairs; see
<xref ref-type="supplementary-material" rid="figS1">Fig. S1A</xref>
in the supplemental material), suggesting a high similarity among genomes between these groups. In addition, we also observed a large proportion of shared 25-mers between
<italic>Firmicutes</italic>
and each of the proteobacterial classes.</p>
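<p>The collapse of an
<italic>I</italic>
-network into a
<italic>P</italic>
-network can be sketched as follows. The sketch assumes that edge width is the number of connected genome pairs between two phyla, as stated for Fig. 2; the function name, the genome labels, and the phylum assignments are hypothetical.</p>

```python
from collections import Counter


def p_network(edges, phylum_of):
    """Count connected genome pairs between each pair of distinct phyla."""
    widths = Counter()
    for a, b in edges:
        pa, pb = phylum_of[a], phylum_of[b]
        if pa != pb:  # edges within a phylum do not link two P-network nodes
            widths[tuple(sorted((pa, pb)))] += 1
    return widths


# Hypothetical I-network edges (already filtered at some threshold t).
phylum_of = {
    "g1": "Betaproteobacteria",
    "g2": "Gammaproteobacteria",
    "g3": "Gammaproteobacteria",
    "g4": "Firmicutes",
}
edges = {("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")}

widths = p_network(edges, phylum_of)
```

<p>Here the edge between the two proteobacterial groups has width 2 and the edge between Firmicutes and Gammaproteobacteria has width 1; the pair g2 and g3 is ignored because both genomes belong to the same group.</p>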
<fig id="fig2" orientation="portrait" position="float">
<label>FIG 2</label>
<caption>
<p>
<italic>P</italic>
-network of 2,705 prokaryote genomes based on whole-genome data. The network was generated using
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
with
<italic>k </italic>
=
<italic></italic>
25 at
<italic>t </italic>
=
<italic></italic>
3. Each node represents a distinct phylum (or proteobacterial group), with major representative nodes labeled. Each edge between two nodes represents the number of genome pair connections between the two nodes. The thickness of each edge is proportional to the number of genome pairs with shared
<italic>k</italic>
-mers. The size of each node is proportional to the number of isolates within the phylum. The five representative
<italic>Proteobacteria</italic>
groups are labeled with the corresponding Greek characters. The highway of
<italic>k</italic>
-mer sharing between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
is indicated in red. A dynamic view of this figure is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
.</p>
</caption>
<graphic xlink:href="sys0061822960002"></graphic>
</fig>
<supplementary-material content-type="local-data" id="figS1">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.2</object-id>
<label>FIG S1</label>
<p>Number of pairwise genome connections (relative edge widths) between the phyla in each pair for the networks shown in (A) Fig. 2 (only the five most abundant pairs are labeled) and (B) Fig. 3. Download
<inline-supplementary-material id="fS1" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296sf1.pdf" content-type="local-data">FIG S1, PDF file, 0.4 MB</inline-supplementary-material>
.</p>
<permissions>
<copyright-statement>Copyright © 2018 Bernard et al.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Bernard et al.</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This content is distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link>
.</license-p>
</license>
</permissions>
</supplementary-material>
</sec>
<sec id="s2.2">
<title>Phylogenomic signal contributed by rRNA genes.</title>
<p>To determine the contribution of the rRNA genes to our AF networks, we first excluded from our set of 2,705 unique genomes (see above) the 89 genomes that did not have gene annotation, and we excluded from the remaining 2,616 all rRNA gene sequences based on annotated start and stop coordinates (see Materials and Methods). The density of the
<italic>I</italic>
-network of genomes from which rRNA genes have been removed was lower than in the
<italic>I</italic>
-network inferred using the whole data set. Similarly to what we observed for the
<italic>I</italic>
-networks described in the previous section, here, at
<italic>t </italic>
=
<italic></italic>
0,
<italic>c </italic>
=
<italic></italic>
2,615 and
<italic>z </italic>
=
<italic></italic>
1,226, and these values decreased to
<italic>c </italic>
=
<italic></italic>
1,290 and
<italic>z </italic>
=
<italic></italic>
47 at
<italic>t </italic>
=
<italic></italic>
9 (
<xref rid="tab2" ref-type="table">Table 2</xref>
). At
<italic>t </italic>
=
<italic></italic>
3, the
<italic>I</italic>
-network of the rRNA gene-free network had a network density of
<italic>D </italic>
=
<italic></italic>
0.026, 3-fold lower than the
<italic>D </italic>
=
<italic></italic>
0.079 in the whole-genome network (
<xref rid="tab1" ref-type="table">Table 1</xref>
).
<xref ref-type="fig" rid="fig3">Figure 3</xref>
shows the
<italic>P</italic>
-network of these 2,616 genomes at
<italic>t </italic>
=
<italic></italic>
3 (dynamic view available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
). As in
<xref ref-type="fig" rid="fig2">Fig. 2</xref>
, the thickest edge (in red), between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
(
<xref ref-type="fig" rid="fig3">Fig. 3</xref>
), indicates the largest number of instances of shared
<italic>k</italic>
-mers between genomes from these two groups. This
<italic>P</italic>
-network is less dense than the equivalent network based on the whole data set (shown in
<xref ref-type="fig" rid="fig2">Fig. 2</xref>
). Although we observed fewer connections between phyla after removal of rRNA sequences from the genome data, many of the major connections observed in
<xref ref-type="fig" rid="fig2">Fig. 2</xref>
remained, e.g., the connections between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
(404 connected genome pairs) and between
<italic>Actinobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
(57 connected genome pairs) (see
<xref ref-type="supplementary-material" rid="figS1">Fig. S1B</xref>
). Thus, the sharing of 25-mers contributing to these major connections extends beyond the rRNA genes commonly used as phylogenetic markers.</p>
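<p>As an illustration of how annotated rRNA genes might be excluded by coordinates, the following Python sketch removes a set of intervals from a sequence. It assumes 1-based, inclusive start and stop coordinates and keeps the flanking fragments separate; the function name remove_intervals, the toy sequence, and the coordinates are hypothetical and do not reproduce the pipeline used in this study.</p>
<preformat><![CDATA[
```python
def remove_intervals(sequence, intervals):
    """Return the fragments of `sequence` left after excising `intervals`.

    Intervals are assumed to be 1-based, inclusive (start, stop) pairs.
    """
    # Merge overlapping or adjacent intervals so each base is excised once.
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])

    fragments = []
    cursor = 0  # 0-based index of the next base to keep
    for start, stop in merged:
        fragments.append(sequence[cursor:start - 1])
        cursor = stop
    fragments.append(sequence[cursor:])
    # Drop empty fragments (e.g., an interval at the very start or end).
    return [fragment for fragment in fragments if fragment]


genome = "AAAACCCCGGGGTTTT"           # toy sequence
rrna_annotations = [(5, 8), (7, 10)]  # hypothetical annotated coordinates
print(remove_intervals(genome, rrna_annotations))
```
]]></preformat>
<p>Returning separate fragments, rather than joining the flanks, avoids creating artificial subsequences that span an excision site; whether this matters depends on how the subsequences are counted downstream.</p>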
<table-wrap id="tab2" orientation="portrait" position="float">
<label>TABLE 2</label>
<caption>
<p>Characteristics of the phylogenomic network of 2,616 prokaryote genomes based on complete genomes without rRNA genes</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1">Threshold</th>
<th rowspan="1" colspan="1">No. of nonsingleton nodes,
<italic>c</italic>
</th>
<th rowspan="1" colspan="1">Density,
<italic>D</italic>
</th>
<th rowspan="1" colspan="1">Size of the maximal clique,
<italic>z</italic>
</th>
<th rowspan="1" colspan="1">No. of cliques,
<italic>n</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">2,615</td>
<td rowspan="1" colspan="1">0.490</td>
<td rowspan="1" colspan="1">1,226</td>
<td rowspan="1" colspan="1">Not available</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">2,597</td>
<td rowspan="1" colspan="1">0.219</td>
<td rowspan="1" colspan="1">548</td>
<td rowspan="1" colspan="1">Not available</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">2,555</td>
<td rowspan="1" colspan="1">0.072</td>
<td rowspan="1" colspan="1">367</td>
<td rowspan="1" colspan="1">164,221</td>
</tr>
<tr>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">2,394</td>
<td rowspan="1" colspan="1">0.026</td>
<td rowspan="1" colspan="1">220</td>
<td rowspan="1" colspan="1">5,379</td>
</tr>
<tr>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">2,182</td>
<td rowspan="1" colspan="1">0.012</td>
<td rowspan="1" colspan="1">159</td>
<td rowspan="1" colspan="1">5,139</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">1,959</td>
<td rowspan="1" colspan="1">0.006</td>
<td rowspan="1" colspan="1">117</td>
<td rowspan="1" colspan="1">631</td>
</tr>
<tr>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">1,761</td>
<td rowspan="1" colspan="1">0.004</td>
<td rowspan="1" colspan="1">74</td>
<td rowspan="1" colspan="1">299</td>
</tr>
<tr>
<td rowspan="1" colspan="1">7</td>
<td rowspan="1" colspan="1">1,591</td>
<td rowspan="1" colspan="1">0.003</td>
<td rowspan="1" colspan="1">62</td>
<td rowspan="1" colspan="1">120</td>
</tr>
<tr>
<td rowspan="1" colspan="1">8</td>
<td rowspan="1" colspan="1">1,460</td>
<td rowspan="1" colspan="1">0.003</td>
<td rowspan="1" colspan="1">59</td>
<td rowspan="1" colspan="1">117</td>
</tr>
<tr>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">1,290</td>
<td rowspan="1" colspan="1">0.002</td>
<td rowspan="1" colspan="1">47</td>
<td rowspan="1" colspan="1">131</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig3" orientation="portrait" position="float">
<label>FIG 3</label>
<caption>
<p>
<italic>P</italic>
-network of 2,616 prokaryote genomes based on chromosomal sequences with rRNA genes removed. The network was generated using
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
with
<italic>k </italic>
=
<italic></italic>
25 at
<italic>t </italic>
=
<italic></italic>
3; only nonsingleton nodes are shown. Each edge between two nodes represents the number of connections between isolates from the two phyla; the thickness of each edge is proportional to the number of genome pairs with shared
<italic>k</italic>
-mers. The size of each node is proportional to the number of isolates within the phylum. The five representative
<italic>Proteobacteria</italic>
groups are labeled with the corresponding Greek characters. The highway of
<italic>k</italic>
-mer sharing between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
is indicated in red. A dynamic view of this figure is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
.</p>
</caption>
<graphic xlink:href="sys0061822960003"></graphic>
</fig>
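<p>The relationship between a genome-level network and the phylum-level network in Fig. 3 can be sketched in Python as follows, assuming that the weight of an edge between two phyla is the number of connected genome pairs, as stated in the figure legend. The function name, genome identifiers, and phylum assignments are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter


def collapse_to_phylum_network(genome_edges, phylum_of):
    """Count connected genome pairs for each pair of distinct phyla."""
    # Treat edges as undirected and ignore duplicates.
    unique_pairs = {tuple(sorted(edge)) for edge in genome_edges}
    weights = Counter()
    for genome_a, genome_b in unique_pairs:
        phylum_a, phylum_b = phylum_of[genome_a], phylum_of[genome_b]
        if phylum_a != phylum_b:  # within-phylum edges do not link two nodes
            weights[tuple(sorted((phylum_a, phylum_b)))] += 1
    return weights


# Hypothetical toy data: four genomes in three groups.
phylum_of = {
    "g1": "Betaproteobacteria",
    "g2": "Gammaproteobacteria",
    "g3": "Gammaproteobacteria",
    "g4": "Actinobacteria",
}
genome_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
for pair, weight in sorted(collapse_to_phylum_network(genome_edges, phylum_of).items()):
    print(pair, weight)
```
]]></preformat>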
<p>A network computed using only the rRNA sequences was denser than the two corresponding
<italic>I</italic>
-networks described above. At
<italic>t </italic>
=
<italic></italic>
6,
<italic>D</italic>
was high at 0.635 (
<italic>z </italic>
=
<italic></italic>
1,321; see
<xref ref-type="supplementary-material" rid="tabS1">Table S1</xref>
in the supplemental material) compared to 0.005 (
<italic>z </italic>
=
<italic></italic>
82) and 0.004 (
<italic>z </italic>
=
<italic></italic>
74) in the
<italic>I</italic>
-networks based on whole-genome and rRNA gene-removed data, respectively.
<xref ref-type="supplementary-material" rid="figS2">Figure S2</xref>
shows the
<italic>P</italic>
-network of 2,616 genome isolates based solely on rRNA genes at
<italic>t </italic>
=
<italic></italic>
6. Although almost all phyla were connected to each other (
<italic>c </italic>
=
<italic></italic>
2,613 and
<italic>z </italic>
=
<italic></italic>
1,321 at
<italic>t </italic>
=
<italic></italic>
6), we observed a clear separation between the
<italic>Archaea</italic>
and
<italic>Bacteria</italic>
. These results imply that rRNA gene sequences contain sufficient information to distinguish
<italic>Archaea</italic>
from
<italic>Bacteria</italic>
by the use of a
<italic>k</italic>
-mer approach, but separation of bacterial phyla would require further tuning of
<italic>k</italic>
and
<italic>t</italic>
.</p>
<supplementary-material content-type="local-data" id="figS2">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.3</object-id>
<label>FIG S2</label>
<p>
<italic>P</italic>
-network of 2,616 prokaryote genomes using
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
with
<italic>k </italic>
=
<italic></italic>
25 based on rRNA genes only, at
<italic>t </italic>
=
<italic></italic>
6. Each edge between two nodes represents one or more connections between isolates from the two phyla. Archaeal phyla (labeled) are clearly separated from bacterial phyla. Download
<inline-supplementary-material id="fS2" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296sf2.pdf" content-type="local-data">FIG S2, PDF file, 2.1 MB</inline-supplementary-material>
.</p>
</supplementary-material>
<supplementary-material content-type="local-data" id="tabS1">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.6</object-id>
<label>TABLE S1</label>
<p>Characteristics of the phylogenomic network of 2,616 prokaryote genomes based on rRNA genes only. Download
<inline-supplementary-material id="tS1" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296st1.pdf" content-type="local-data">Table S1, PDF file, 0.03 MB</inline-supplementary-material>
.</p>
</supplementary-material>
</sec>
<sec id="s2.3">
<title>Phylogenomic signal contributed by plasmid genomes.</title>
<p>Among the genome data records available to this study, 921 (representing 26 phyla) include sequence annotated as arising from one or more extrachromosomal plasmids. To examine the phylogenomic signal contributed by these plasmids, we computed
<italic>I</italic>
- and
<italic>P</italic>
-networks using only the plasmid sequences for these 921 isolates (see Materials and Methods).
<xref ref-type="fig" rid="fig4">Figure 4</xref>
shows the
<italic>I</italic>
-network of the 921 plasmid genomes at
<italic>t </italic>
=
<italic></italic>
0, in which
<italic>D = </italic>
0.025 (
<italic>c </italic>
=
<italic></italic>
745 and
<italic>z </italic>
=
<italic></italic>
48;
<xref rid="tab3" ref-type="table">Table 3</xref>
); a dynamic view is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
. Most phyla appear as distinct cliques, but, notably, there are edges between
<italic>Proteobacteria</italic>
and
<italic>Actinobacteria</italic>
and between
<italic>Proteobacteria</italic>
and
<italic>Firmicutes</italic>
. At
<italic>t </italic>
=
<italic></italic>
4, most phyla are separated as distinct cliques, with the exception of
<italic>Epsilonproteobacteria</italic>
and
<italic>Firmicutes</italic>
; the other
<italic>Proteobacteria</italic>
(
<italic>Alphaproteobacteria</italic>
,
<italic>Betaproteobacteria</italic>
,
<italic>Deltaproteobacteria</italic>
, and
<italic>Gammaproteobacteria</italic>
) are in a distinct paraclique. The
<italic>Euryarchaeota</italic>
, connected only to the bacterial phylum
<italic>Planctomycetes</italic>
at
<italic>t </italic>
=
<italic></italic>
0, is separated from
<italic>Bacteria</italic>
at
<italic>t</italic>
≥ 1. All phyla are disjoint at
<italic>t </italic>
=
<italic></italic>
7. These results are not surprising, as the plasmid genomes can have a narrow host range (
<xref rid="B38" ref-type="bibr">38</xref>
,
<xref rid="B39" ref-type="bibr">39</xref>
) and are known to evolve faster than the core genomes (
<xref rid="B40" ref-type="bibr">40</xref>
); in combination with their smaller genome size, fewer shared
<italic>k</italic>
-mers are observed at a given similarity threshold (
<xref rid="B41" ref-type="bibr">41</xref>
).</p>
<fig id="fig4" orientation="portrait" position="float">
<label>FIG 4</label>
<caption>
<p>
<italic>I</italic>
-network of 921 plasmid genomes. The network was generated using
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
with
<italic>k </italic>
=
<italic></italic>
25 at
<italic>t </italic>
=
<italic></italic>
0. Each edge between two nodes represents evidence of shared
<italic>k</italic>
-mers. A dynamic view of this figure is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFmicrobes/">http://bioinformatics.org.au/tools/AFmicrobes/</ext-link>
.</p>
</caption>
<graphic xlink:href="sys0061822960004"></graphic>
</fig>
<table-wrap id="tab3" orientation="portrait" position="float">
<label>TABLE 3</label>
<caption>
<p>Characteristics of the phylogenomic network of 921 prokaryote genomes based on plasmid sequences only</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
<col width="" span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1">Threshold</th>
<th rowspan="1" colspan="1">No. of nonsingleton nodes,
<italic>c</italic>
</th>
<th rowspan="1" colspan="1">Density,
<italic>D</italic>
</th>
<th rowspan="1" colspan="1">Size of the maximal clique,
<italic>z</italic>
</th>
<th rowspan="1" colspan="1">No. of cliques,
<italic>n</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">745</td>
<td rowspan="1" colspan="1">0.025</td>
<td rowspan="1" colspan="1">48</td>
<td rowspan="1" colspan="1">20,557</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">718</td>
<td rowspan="1" colspan="1">0.021</td>
<td rowspan="1" colspan="1">46</td>
<td rowspan="1" colspan="1">13,272</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">680</td>
<td rowspan="1" colspan="1">0.017</td>
<td rowspan="1" colspan="1">45</td>
<td rowspan="1" colspan="1">3,925</td>
</tr>
<tr>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">648</td>
<td rowspan="1" colspan="1">0.014</td>
<td rowspan="1" colspan="1">39</td>
<td rowspan="1" colspan="1">1,406</td>
</tr>
<tr>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">601</td>
<td rowspan="1" colspan="1">0.011</td>
<td rowspan="1" colspan="1">34</td>
<td rowspan="1" colspan="1">800</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">556</td>
<td rowspan="1" colspan="1">0.009</td>
<td rowspan="1" colspan="1">30</td>
<td rowspan="1" colspan="1">589</td>
</tr>
<tr>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">499</td>
<td rowspan="1" colspan="1">0.006</td>
<td rowspan="1" colspan="1">25</td>
<td rowspan="1" colspan="1">368</td>
</tr>
<tr>
<td rowspan="1" colspan="1">7</td>
<td rowspan="1" colspan="1">439</td>
<td rowspan="1" colspan="1">0.004</td>
<td rowspan="1" colspan="1">13</td>
<td rowspan="1" colspan="1">122</td>
</tr>
<tr>
<td rowspan="1" colspan="1">8</td>
<td rowspan="1" colspan="1">353</td>
<td rowspan="1" colspan="1">0.002</td>
<td rowspan="1" colspan="1">11</td>
<td rowspan="1" colspan="1">26</td>
</tr>
<tr>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">245</td>
<td rowspan="1" colspan="1">0.001</td>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">14</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For each genome pair, we further compared its
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distance derived from the whole-genome data set to those derived from distinct genome components (
<xref ref-type="supplementary-material" rid="figS3">Fig. S3</xref>
). Distances derived from rRNA sequences are almost always smaller than the distances derived from the overall data set. The reverse trend is observed for distances derived from chromosomal sequences with rRNAs removed (although a one-to-one relationship is observed) and to a greater extent for those derived from plasmid sequences.</p>
<supplementary-material content-type="local-data" id="figS3">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.4</object-id>
<label>FIG S3</label>
<p>Relationship of pairwise
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distances derived from whole-genome data sets with those derived from distinct genome components. The composite plot is shown (A); individual plots are shown for (B) chromosomal sequences with rRNAs removed, (C) rRNA sequences, and (D) plasmid sequences. Download
<inline-supplementary-material id="fS3" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296sf3.pdf" content-type="local-data">FIG S3, PDF file, 2.4 MB</inline-supplementary-material>
.</p>
</supplementary-material>
</sec>
<sec id="s2.4">
<title>Network comparison.</title>
<p>
<xref ref-type="fig" rid="fig5">Figure 5</xref>
shows the density
<italic>D</italic>
for all four
<italic>I</italic>
-networks as a function of threshold
<italic>t</italic>
. For all networks, the network density decreases as
<italic>t</italic>
increases. At
<italic>t </italic>
>
<italic></italic>
2, the rRNA gene-only network is denser than the others, with
<italic>D</italic>
remaining >0.63 through
<italic>t </italic>
=
<italic></italic>
6, compared to
<italic>D</italic>
&lt; 0.02 for the others at
<italic>t > </italic>
3. As expected, the complete-genome network is densest at
<italic>t</italic>
&lt; 2 (
<italic>D</italic>
&gt; 0.98), and its density decreases rapidly at 2 &lt;
<italic>t</italic>
&lt; 5. The network without rRNA genes exhibits a lower density,
<italic>D</italic>
&lt; 0.5, at
<italic>t </italic>
=
<italic></italic>
0, and by
<italic>t </italic>
=
<italic></italic>
5,
<italic>D</italic>
has decreased to a level similar to that calculated for the complete-genome network (
<italic>D</italic>
&lt; 0.01). Together with our observed pairwise genome distances based on distinct genome components (
<xref ref-type="supplementary-material" rid="figS3">Fig. S3</xref>
), these results confirm that rRNA sequences (as captured by 25-mers) are more highly conserved than are the genome sequences overall. The data corresponding to the whole-genome and rRNA-free networks differ through a similar range of network densities, whereas data corresponding to the rRNA gene network differ at a higher threshold (i.e.,
<italic>t </italic>
>
<italic></italic>
5). The plasmid network shows the lowest density, with
<italic>D</italic>
&lt; 0.03 at
<italic>t</italic>
≥ 0 (
<xref ref-type="fig" rid="fig5">Fig. 5</xref>
), indicating that these plasmid genomes are more diverse in 25-mer composition than are the corresponding main genomes. The results presented in
<xref ref-type="fig" rid="fig5">Fig. 5</xref>
provide a guide for visualization and comparison of these networks at the appropriate
<italic>t</italic>
values. In this study, we chose
<italic>t</italic>
values that would yield a clear separation of
<italic>Bacteria</italic>
and
<italic>Archaea</italic>
; thus, we used
<italic>t </italic>
=
<italic></italic>
3 for visualizing the two networks shown in
<xref ref-type="fig" rid="fig2">Fig. 2</xref>
and
<xref ref-type="fig" rid="fig3">3</xref>
(note that the use of the same
<italic>t</italic>
value for both networks is purely coincidental) and
<italic>t </italic>
=
<italic></italic>
0 for the plasmid network shown in
<xref ref-type="fig" rid="fig4">Fig. 4</xref>
.</p>
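<p>The density values discussed above can be illustrated with a short Python sketch. It assumes the standard definition of density for an undirected simple graph, D = 2E/[N(N - 1)] for E edges among N nodes, which matches the four-node illustrations in Fig. 5, and it assumes that an edge is retained when a pairwise similarity score exceeds the threshold. The scores, the retention rule, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
def threshold_network(similarity, t):
    """Keep the pairs whose similarity score exceeds t (assumed rule)."""
    return [pair for pair, score in similarity.items() if score > t]


def density(n_nodes, edges):
    """Density of an undirected simple graph: edges present / edges possible."""
    possible = n_nodes * (n_nodes - 1) // 2
    if possible == 0:
        return 0.0
    return len({frozenset(edge) for edge in edges}) / possible


# Hypothetical similarity scores among four genomes a, b, c, and d.
similarity = {
    ("a", "b"): 9.0, ("a", "c"): 6.5, ("b", "c"): 6.0,
    ("a", "d"): 2.5, ("b", "d"): 1.0, ("c", "d"): 0.5,
}
for t in (0, 3, 9):
    print(t, density(4, threshold_network(similarity, t)))
```
]]></preformat>
<p>With these toy scores, raising the threshold removes edges and the density can only stay the same or fall, which is the monotonic behavior shown in Fig. 5.</p>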
<fig id="fig5" orientation="portrait" position="float">
<label>FIG 5</label>
<caption>
<p>Density of alignment-free phylogenomic networks. Network density (
<italic>D</italic>
) across distinct threshold levels of
<italic>t</italic>
is shown for each
<italic>I</italic>
-network based on complete genomic data sets (core genomes with rRNAs plus plasmids), rRNA genes only, chromosomal sequences without rRNA genes, and plasmid sequences only. The density of a four-node network is illustrated for
<italic>D </italic>
=
<italic></italic>
0.0, 0.5, and 1.0 on the left, and the stringency of the threshold
<italic>t</italic>
is shown at the bottom.</p>
</caption>
<graphic xlink:href="sys0061822960005"></graphic>
</fig>
<p>To assess the (individual) contributions of rRNA genes and plasmids to the relatedness among the distinct phyla, we calculated for each phylum pair a connectedness value
<italic>C</italic>
, representing the proportion of genome pairs that share one or more
<italic>k</italic>
-mers over all possible genome pairs in the two phyla (see Materials and Methods). As shown in the heat map summaries (
<xref ref-type="fig" rid="fig6">Fig. 6</xref>
), the hierarchical clustering of
<italic>C</italic>
values does not conform to known phyletic relationships; e.g., proteobacterial groups are not unified in a single cluster. In the all-inclusive genome network (
<xref ref-type="fig" rid="fig6">Fig. 6A</xref>
), the archaeal phyla (
<italic>Crenarchaeota</italic>
and
<italic>Euryarchaeota</italic>
) are not clearly separated from the phyla of
<italic>Bacteria</italic>
and show substantial connectedness with
<italic>Tenericutes</italic>
(
<italic>C </italic>
>
<italic></italic>
0.63) and
<italic>Chlamydiae</italic>
(
<italic>C </italic>
>
<italic></italic>
0.54). The highest mean
<italic>C</italic>
value (0.85) was observed in the network consisting only of rRNA genes (
<xref ref-type="fig" rid="fig6">Fig. 6B</xref>
), with
<italic>Archaea</italic>
and
<italic>Bacteria</italic>
clearly separated.
<italic>Crenarchaeota</italic>
shows substantial connectedness (
<italic>C </italic>
>
<italic></italic>
0.5) with 11 bacterial phyla, compared to
<italic>Euryarchaeota</italic>
with 6; both cases include
<italic>Deinococcus</italic>
-
<italic>Thermus</italic>
,
<italic>Aquificae</italic>
, and
<italic>Thermotogae</italic>
. The removal of rRNA genes from the genome sequences appears to have removed most of the connectedness among phyla (mean
<italic>C </italic>
=
<italic></italic>
0.05 in
<xref ref-type="fig" rid="fig6">Fig. 6C</xref>
), with the maximum
<italic>C </italic>
=
<italic></italic>
0.59 between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
. Even less phylum-level connectedness was observed in the plasmid-only network (
<xref ref-type="fig" rid="fig6">Fig. 6D</xref>
; mean
<italic>C </italic>
=
<italic></italic>
0.002), with maximum
<italic>C </italic>
=
<italic></italic>
0.029 between
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
. These results illustrate the difficulty of inferring a tree-like structure among these taxa using genome-wide
<italic>k</italic>
-mers and indicate that whole-genome and plasmid sequences capture phyletic relatedness distinct from that captured by the rRNA genes. Remarkably, chromosomal sequences other than rRNA genes, although they usually represent more than 99% of the genome sequence, contribute little to the overall phylogenetic signal.</p>
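<p>A minimal Python sketch of the connectedness calculation follows. It reads
<italic>C</italic>
as the number of connected genome pairs spanning two phyla divided by the product of the two phylum sizes; this reading of "all possible genome pairs," the function name, and the toy identifiers are assumptions.</p>
<preformat><![CDATA[
```python
def connectedness(phylum_a, phylum_b, connected_pairs):
    """Fraction of cross-phylum genome pairs that share at least one k-mer."""
    total = len(phylum_a) * len(phylum_b)
    if total == 0:
        return 0.0
    linked = sum(
        1
        for genome_a in phylum_a
        for genome_b in phylum_b
        if frozenset((genome_a, genome_b)) in connected_pairs
    )
    return linked / total


# Hypothetical toy data: two genomes in one group, three in another.
beta = {"b1", "b2"}
gamma = {"g1", "g2", "g3"}
connected_pairs = {
    frozenset(("b1", "g1")),
    frozenset(("b1", "g2")),
    frozenset(("b2", "g3")),
}
print(connectedness(beta, gamma, connected_pairs))
```
]]></preformat>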
<fig id="fig6" orientation="portrait" position="float">
<label>FIG 6</label>
<caption>
<p>Phylum connectedness based on shared
<italic>k</italic>
-mers. Summary data representing phylum connectedness (
<italic>C</italic>
) in a heat map for each
<italic>P</italic>
-network reconstructed based on (A) complete genomic data sets at
<italic>t </italic>
=
<italic></italic>
2, (B) rRNA gene sequences only at
<italic>t </italic>
=
<italic></italic>
1, (C) chromosomal sequences without rRNA genes at
<italic>t </italic>
=
<italic></italic>
1, and (D) plasmid sequences only at
<italic>t </italic>
=
<italic></italic>
0 are shown.</p>
</caption>
<graphic xlink:href="sys0061822960006"></graphic>
</fig>
</sec>
<sec id="s2.5">
<title>Core
<italic>k</italic>
-mers of microbial genera.</title>
<p>We define a core
<italic>k</italic>
-mer in a group of interest as a
<italic>k</italic>
-mer that is present in every genome within the group; e.g., a core 25-mer in
<italic>Proteobacteria</italic>
is present in all proteobacterial genomes in our database (see Materials and Methods). We identified core 25-mers for each genus in our 2,783-genome data set. Of the 699 genera, 497 are represented by only a single genome isolate, and a further 51 consist of highly divergent genomes for which no core 25-mers were identified; we excluded these genera from this part of the analysis. The remaining 151 genera, for which core 25-mers were identified, are listed in
<xref ref-type="supplementary-material" rid="tabS2">Table S2</xref>
. As these genera are represented in our data set by different numbers of isolates, we define
<italic>K</italic>
as the number of distinct core
<italic>k</italic>
-mers per isolate for each genus; this value can help describe the extent of genome divergence (and thus the evolutionary rate of these genomes) within each of these genera. For example, the three genomes representing the genus
<italic>Azotobacter</italic>
show the highest number of core
<italic>k</italic>
-mers, with
<italic>K </italic>
=
<italic></italic>
1,722,079; these genomes represent distinct isolates of the same species,
<named-content content-type="genus-species">Azotobacter vinelandii</named-content>
. This is in contrast to the 123
<italic>Streptococcus</italic>
genomes (in 27 described species), which share only one core
<italic>k</italic>
-mer (
<italic>K </italic>
=
<italic></italic>
0.01). Among the 20 genera with the greatest
<italic>K</italic>
values,
<italic>Shigella</italic>
is represented here by the greatest number of distinct isolates (10 from four species), and
<italic>K </italic>
=
<italic></italic>
33,698. This number compares to
<italic>K </italic>
=
<italic></italic>
4.82 among the 11
<italic>Ralstonia</italic>
genome isolates from three species. Thus, these
<italic>Shigella</italic>
genomes have diverged much less from their common ancestor than have these
<italic>Ralstonia</italic>
genomes from theirs, as assessed by shared 25-mers. This result also lends support to the earlier discovery of extensive gene dispersal among six genomes of
<named-content content-type="genus-species">Ralstonia solanacearum</named-content>
(of the 11
<italic>Ralstonia</italic>
isolates in our data set) (
<xref rid="B42" ref-type="bibr">42</xref>
).</p>
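<p>The definitions of a core
<italic>k</italic>
-mer and of
<italic>K</italic>
can be sketched in Python as follows. The sketch reads
<italic>K</italic>
as the number of distinct core
<italic>k</italic>
-mers divided by the number of isolates, which is consistent with one core
<italic>k</italic>
-mer among 123
<italic>Streptococcus</italic>
genomes giving
<italic>K</italic>
≈ 0.01. It considers the forward strand only and uses
<italic>k</italic>
= 4 on toy sequences for readability; both choices, and the function names, are assumptions.</p>
<preformat><![CDATA[
```python
def kmers(sequence, k):
    """Set of distinct k-mers on the forward strand of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def core_kmers(genomes, k):
    """k-mers present in every genome of the group."""
    kmer_sets = [kmers(sequence, k) for sequence in genomes]
    return set.intersection(*kmer_sets) if kmer_sets else set()


def core_kmers_per_isolate(genomes, k):
    """K: number of distinct core k-mers divided by the number of isolates."""
    if not genomes:
        return 0.0
    return len(core_kmers(genomes, k)) / len(genomes)


# Hypothetical toy "genomes"; the study itself used k = 25.
genus = ["ACGTACGA", "TTACGTAC", "GACGTAAA"]
print(sorted(core_kmers(genus, 4)))
print(core_kmers_per_isolate(genus, 4))
```
]]></preformat>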
<supplementary-material content-type="local-data" id="tabS2">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.7</object-id>
<label>TABLE S2</label>
<p>Core
<italic>k</italic>
-mers identified in 151 genera of prokaryotes. Download
<inline-supplementary-material id="tS2" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296st2.pdf" content-type="local-data">Table S2, PDF file, 0.1 MB</inline-supplementary-material>
.</p>
</supplementary-material>
</sec>
<sec id="s2.6">
<title>Core functions of microbial phyla.</title>
<p>To relate the shared
<italic>k</italic>
-mers to biological functions, for all 25-mers in these 2,783 genomes we organized the genome coordinates of each instance, and the biological function annotated for the gene product encoded at those coordinates, in a relational database. Functional annotation was based on Clusters of Orthologous Groups (COGs) (
<xref rid="B43" ref-type="bibr">43</xref>
). Then, using the list of core 25-mers described above, we grouped these 25-mers by taxon, focusing on protein-coding sequences (i.e., rRNA sequences were discarded; see Materials and Methods). This yielded a set of core 25-mers for 112 genera in 16 phyla; the corresponding COG functional categories for these core 25-mers are shown in
<xref ref-type="supplementary-material" rid="tabS3">Table S3</xref>
. The noninformative functional categories R (general function prediction only) and S (function unknown) were excluded from subsequent analyses. No core
<italic>k</italic>
-mer in our data set was found to be associated with functional category Y (nuclear structure). Functional categories represented at &lt;1% of core
<italic>k</italic>
-mers in each genus included category A (RNA processing and modification), category B (chromatin structure and dynamics), category W (extracellular structure), and category Z (cytoskeleton).</p>
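<p>One way to relate
<italic>k</italic>
-mer occurrences to annotated functions is a coordinate join in a relational database. The sketch below uses SQLite through the Python sqlite3 module; the table layout, the column names, and the requirement that a
<italic>k</italic>
-mer lie entirely within a gene are assumptions, not the schema used in this study. Categories R and S are excluded, as described above.</p>
<preformat><![CDATA[
```python
import sqlite3


def category_counts(genes, hits, k):
    """Count distinct k-mers per COG category via a coordinate join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genes (genome TEXT, gene_start INTEGER,
                            gene_stop INTEGER, cog_category TEXT);
        CREATE TABLE kmer_hits (kmer TEXT, genome TEXT, position INTEGER);
        """
    )
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", genes)
    conn.executemany("INSERT INTO kmer_hits VALUES (?, ?, ?)", hits)
    rows = conn.execute(
        """
        SELECT g.cog_category, COUNT(DISTINCT h.kmer)
        FROM kmer_hits AS h
        JOIN genes AS g
          ON g.genome = h.genome
         AND h.position >= g.gene_start
         AND h.position + ? - 1 <= g.gene_stop
        WHERE g.cog_category NOT IN ('R', 'S')  -- noninformative categories
        GROUP BY g.cog_category
        """,
        (k,),
    ).fetchall()
    conn.close()
    return dict(rows)


# Hypothetical annotations: (genome, start, stop, COG category).
genes = [("gA", 1, 100, "E"), ("gA", 200, 300, "C"), ("gB", 50, 150, "E")]
# Hypothetical occurrences: (k-mer label, genome, 1-based start position).
hits = [("k1", "gA", 10), ("k1", "gB", 60), ("k2", "gA", 250), ("k3", "gA", 90)]
print(category_counts(genes, hits, 25))
```
]]></preformat>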
<supplementary-material content-type="local-data" id="tabS3">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.8</object-id>
<label>TABLE S3</label>
<p>Number of core
<italic>k</italic>
-mers in 112 genera of prokaryotes, based on their annotated function in COG functional categories. Download
<inline-supplementary-material id="tS3" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296st3.pdf" content-type="local-data">Table S3, PDF file, 0.2 MB</inline-supplementary-material>
.</p>
</supplementary-material>
<p>We found core
<italic>k</italic>
-mers associated with functional category A only in the proteobacterial classes
<italic>Alphaproteobacteria</italic>
,
<italic>Betaproteobacteria</italic>
,
<italic>Gammaproteobacteria</italic>
, and
<italic>Deltaproteobacteria</italic>
(i.e., not in the
<italic>Epsilonproteobacteria</italic>
) and in phylum
<italic>Actinobacteria</italic>
, and those associated with functional category B only in the phyla
<italic>Chloroflexi</italic>
,
<italic>Euryarchaeota</italic>
and
<italic>Thaumarchaeota</italic>
.
<xref ref-type="fig" rid="fig7">Figure 7</xref>
shows the proportions of the five most-frequent COG categories associated with core 25-mers across the 23 COG categories for 16 phyla. Categories E (amino acid metabolism and transport) and C (energy production and conversion) are among the five most abundant categories in 15 and 13 phyla, respectively. The
<italic>Epsilonproteobacteria</italic>
,
<italic>Thaumarchaeota</italic>
,
<italic>Euryarchaeota</italic>
,
<italic>Actinobacteria</italic>
,
<italic>Cyanobacteria</italic>
, and
<italic>Chloroflexi</italic>
represent the only phyla with category H (coenzyme metabolism) among the five most abundant. For the phyla
<italic>Tenericutes</italic>
,
<italic>Deinococcus</italic>
-
<italic>Thermus</italic>
,
<italic>Firmicutes</italic>
and
<italic>Crenarchaeota</italic>
, the most-represented functional categories include P (inorganic ion transport and metabolism), L (replication and repair), J (translation), E (amino acid transport and metabolism), and G (carbohydrate metabolism and transport).
<italic>Bacteroidetes</italic>
is the only phylum for which categories O (posttranslational modification, protein turnover, and chaperone functions), Q (secondary metabolite biosynthesis, transport, and catabolism), and F (nucleotide metabolism and transport) are among the top five. Phylum
<italic>Spirochaetes</italic>
is the only one with U (intracellular trafficking and secretion) and T (signal transduction) among the five most abundant, although very few COGs are associated with its core 25-mers.</p>
<fig id="fig7" orientation="portrait" position="float">
<label>FIG 7</label>
<caption>
<p>Functions of core
<italic>k</italic>
-mers in microbial taxa. A
<italic>P</italic>
-network of 2,616 prokaryote genomes using
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
with
<italic>k </italic>
=
25 based on chromosomal sequences with rRNA genes removed at
<italic>t </italic>
=
3 is shown. At each node (phylum) where core
<italic>k</italic>
-mers are available, a pie chart representing the COG categories annotated for these core
<italic>k</italic>
-mers is shown. Only the top five COG categories and the corresponding numbers of core
<italic>k</italic>
-mers are shown for each phylum; in most cases, the top five categories account for >50% of the annotated core
<italic>k</italic>
-mers. The categories that are significantly enriched in a phylum (
<italic>P ≤ </italic>
0.05) are noted with an asterisk (*). Each edge between two nodes represents the number of connections between isolates from the two phyla; the thickness of each edge is proportional to the number of shared
<italic>k</italic>
-mers. The five representative
<italic>Proteobacteria</italic>
groups are labeled with the corresponding Greek characters.</p>
</caption>
<graphic xlink:href="sys0061822960007"></graphic>
</fig>
<p>Comparing the annotated core
<italic>k</italic>
-mers in each phylum to all annotated core
<italic>k</italic>
-mers, 11 of the 25 COG functional categories are significantly enriched in
<italic>Gammaproteobacteria</italic>
(Fisher’s exact test, Benjamini-Hochberg [
<xref rid="B44" ref-type="bibr">44</xref>
]-adjusted
<italic>P ≤ </italic>
0.05), 9 in
<italic>Alphaproteobacteria</italic>
, 9 in
<italic>Actinobacteria</italic>
, 8 in
<italic>Deltaproteobacteria</italic>
, and 7 in
<italic>Firmicutes</italic>
(
<xref ref-type="supplementary-material" rid="tabS4">Table S4</xref>
). This observation may be due to the large (73.6%) representation of taxa of these phyla in the overall 2,783 data set: 1,163 (41.8%)
<italic>Proteobacteria</italic>
, 601 (21.6%)
<italic>Firmicutes</italic>
, and 285 (10.2%)
<italic>Actinobacteria</italic>
(
<xref ref-type="supplementary-material" rid="dataS1">Data Set S1</xref>
). In comparison, category L (replication and repair) is enriched (
<italic>P = </italic>
7.55 × 10
<sup>−6</sup>
) among the core
<italic>k</italic>
-mers of
<italic>Tenericutes</italic>
and category M (cell wall/membrane/envelope biogenesis;
<italic>P = </italic>
4.88 × 10
<sup>−9</sup>
) in
<italic>Euryarchaeota</italic>
. These results suggest a more prominent conservation of these functions in these phyla than in the others, indicating their importance.</p>
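<p>As an illustration of the enrichment test described above, the following minimal Python sketch computes a one-sided Fisher's exact test for a 2 × 2 table and applies the Benjamini-Hochberg adjustment. It is not the code used in this study: the one-sided alternative, the layout of the contingency table, the function names fisher_enrichment_p and benjamini_hochberg, and all counts are assumptions made for the example.</p>
<preformat><![CDATA[
```python
from math import comb
from typing import List


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test (enrichment) for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): a = annotated core k-mers of the phylum in
    the category, b = those of the phylum in other categories, and c, d = the
    same two counts for the remaining annotated core k-mers.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, i) * comb(n - row1, col1 - i) for i in range(a, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three COG categories in one phylum.
tables = {"L": (30, 70, 50, 850), "M": (12, 88, 100, 800), "E": (15, 85, 140, 760)}
raw = [fisher_enrichment_p(*tables[cat]) for cat in tables]
adjusted = dict(zip(tables, benjamini_hochberg(raw)))
enriched = [cat for cat, p in adjusted.items() if p <= 0.05]
```
]]></preformat>
<p>In this sketch, a category would be reported as enriched when its adjusted P value is at most 0.05, matching the cutoff stated in the text.</p>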
<supplementary-material content-type="local-data" id="tabS4">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.9</object-id>
<label>TABLE S4</label>
<p>Comparison of annotated core
<italic>k</italic>
-mers in each phylum against all annotated core
<italic>k</italic>
-mers, based on COG functional categories. Download
<inline-supplementary-material id="tS4" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296st4.pdf" content-type="local-data">Table S4, PDF file, 0.1 MB</inline-supplementary-material>
.</p>
<permissions>
<copyright-statement>Copyright © 2018 Bernard et al.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Bernard et al.</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This content is distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link>
.</license-p>
</license>
</permissions>
</supplementary-material>
<p>To determine whether the phyla can be clustered based on their COG-category profiles, we performed a series of principal-component analyses (PCA). PCA of the raw data (i.e., nonnormalized counts of COG numbers) did not reveal any particular clustering (
<xref ref-type="supplementary-material" rid="figS4">Fig. S4A</xref>
), nor did PCA of the clusters of genera classified according to the number of isolates (
<xref ref-type="supplementary-material" rid="figS4">Fig. S4B</xref>
).
<xref ref-type="supplementary-material" rid="figS4">Figure S4C</xref>
shows the results of PCA performed on the normalized counts of COG numbers in a centered scale (i.e., with COG categories given equal weights). In this analysis,
<italic>Nitrosopumilus</italic>
, the only genus in phylum
<italic>Thaumarchaeota</italic>
represented in this data set, is isolated from the other genera, as is genus
<italic>Dehalococcoides</italic>
, a member of phylum
<italic>Chloroflexi</italic>
. These results suggest that the different numbers of isolates per genus do not bias our analysis of functional categories but that some phyla can be distinguished from others.</p>
<supplementary-material content-type="local-data" id="figS4">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.5</object-id>
<label>FIG S4</label>
<p>Principal-component analysis (PCA) of core
<italic>k</italic>
-mers and their annotated COG categories, based on core
<italic>k</italic>
-mers in each (A) phylum and (B) genus. Results of PCA of core
<italic>k</italic>
-mers in each phylum performed on the normalized counts of COG categories in the centered scale are shown (C). Download
<inline-supplementary-material id="fS4" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296sf4.pdf" content-type="local-data">FIG S4, PDF file, 0.7 MB</inline-supplementary-material>
.</p>
</supplementary-material>
</sec>
</sec>
<sec sec-type="discussion" id="s3">
<title>DISCUSSION</title>
<p>Phylogenetic studies have long been based on multiple-sequence alignment (and thus on the implicit assumption of full-length contiguity), from which a phylogenetic tree is inferred. However, a strictly tree-like structure is an unrealistic representation of microbial evolution, because genome rearrangements and lateral genetic transfer introduce signal that does not follow vertical descent (
<xref rid="B33" ref-type="bibr">33</xref>
,
<xref rid="B45" ref-type="bibr">45</xref>
,
<xref rid="B46" ref-type="bibr">46</xref>
). In this study, we demonstrated that AF approaches can be used to infer phylogenetic networks quickly for large-scale microbial whole-genome data (see also
<xref ref-type="supplementary-material" rid="textS1">Text S1</xref>
in the supplemental material). Our results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure. We extend the concept of a
<italic>k</italic>
-mer similarity network by introducing two types of AF networks, the
<italic>I</italic>
- and
<italic>P</italic>
-networks. We show that by combining a
<italic>k</italic>
-mer approach with the use of a relational database, biological information can be accessed efficiently for large-scale data. Finally, we define core
<italic>k</italic>
-mers as consisting of those
<italic>k</italic>
-mers present in every isolate genome of a genus (or other taxon), following the concept of core genes (
<xref rid="B47" ref-type="bibr">47</xref>
,
<xref rid="B48" ref-type="bibr">48</xref>
).</p>
<supplementary-material content-type="local-data" id="textS1">
<object-id pub-id-type="doi">10.1128/mSystems.00257-18.1</object-id>
<label>TEXT S1</label>
<p>Computational scalability and runtime of AF phylogenomics. Download
<inline-supplementary-material id="txS1" mimetype="application" mime-subtype="pdf" xlink:href="sys006182296s1.pdf" content-type="local-data">Text S1, PDF file, 0.2 MB</inline-supplementary-material>
.</p>
</supplementary-material>
<p>We examined the contributions of rRNA genes and plasmids to the phylogenomic signal among microbial genomes. As expected, rRNA genes contribute to the signal captured by 25-mers, as they do in MSA-based approaches. However, the pattern of network density versus threshold (
<xref ref-type="fig" rid="fig5">Fig. 5</xref>
) clearly indicates the different extents of sequence conservation in the distinct genomic regions. Our demonstration that, in general, the signal contributed by rRNA genes is not by itself sufficient to resolve relationships among (and sometimes within) bacterial phyla is in line with many previous studies (
<xref rid="B2" ref-type="bibr">2</xref>
,
<xref rid="B6" ref-type="bibr">6</xref>
,
<xref rid="B49" ref-type="bibr">49</xref>
,
<xref rid="B50" ref-type="bibr">50</xref>
). The low density of the plasmid
<italic>k</italic>
-mer network also confirms that plasmids tend to be taxon specific (
<xref rid="B41" ref-type="bibr">41</xref>
). In all our AF networks, phyletic relatedness based on shared
<italic>k</italic>
-mers is often strongest between proteobacterial classes, in particular, between the
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
, and many 25-mers are shared between the
<italic>Actinobacteria</italic>
and
<italic>Proteobacteria</italic>
or
<italic>Firmicutes</italic>
across all networks. Lateral genetic transfer between lineages of
<italic>Betaproteobacteria</italic>
and
<italic>Gammaproteobacteria</italic>
(
<xref rid="B14" ref-type="bibr">14</xref>
), identified in earlier studies based on MSA (
<xref rid="B14" ref-type="bibr">14</xref>
,
<xref rid="B51" ref-type="bibr">51</xref>
) and
<italic>k</italic>
-mers (
<xref rid="B11" ref-type="bibr">11</xref>
), partly explains this strong similarity in our networks.</p>
<p>Overall, the
<italic>I</italic>
- and
<italic>P</italic>
-networks provide a quick overview of the evolutionary relationships among whole genomes, or subsets of genomes, in large-scale data sets. The
<italic>I</italic>
-networks capture evolutionary dynamics (e.g., divergence and lateral genetic transfer) and relatedness among individual genomes, providing a fine-scale overview of shared genetic elements among these genomes. The
<italic>P</italic>
-networks capture phyletic relatedness and illustrate the magnitude of the sharing of
<italic>k</italic>
-mers (and genetic elements) among these groups at a deeper evolutionary timescale.</p>
<p>Assignment of taxonomic rank to groups of bacteria has long been considered fraught (
<xref rid="B52" ref-type="bibr">52</xref>
<xref ref-type="bibr" rid="B53"></xref>
–
<xref rid="B54" ref-type="bibr">54</xref>
), and there is no generally accepted way to extract taxonomic rank from trees. This undertaking is further complicated by the imbalance in the number of isolates per higher taxon. Our
<italic>k</italic>
-mer similarity networks provide an alternative way to explore the evolutionary dynamics of microbial genomes that tracks taxonomic rank. In our phylogenomic network based on 2,705 complete genomic data sets, at threshold
<italic>t </italic>
&lt;
3, domains
<italic>Archaea</italic>
and
<italic>Bacteria</italic>
appear as separate regions of dense connection within the AF graph. At 3 ≤
<italic>t</italic>
≤ 5, phyla (e.g.,
<italic>Proteobacteria</italic>
and
<italic>Firmicutes</italic>
) emerge. We see classes (e.g., of
<italic>Proteobacteria</italic>
) at 4 ≤
<italic>t</italic>
≤ 6 and structure between and/or within genera (e.g.,
<named-content content-type="genus-species">Escherichia coli</named-content>
and
<italic>Shigella</italic>
) at
<italic>t </italic>
>
6. Our
<italic>k</italic>
-mer phylogenomic network allows dynamic genome-scale exploration of the taxonomic rank.</p>
<p>Relating the identified core 25-mers for each genus to annotated functions of the corresponding genes identifies highly conserved functions. Although we took great care to use only 2,783 completely sequenced and annotated prokaryote genomes (and excluded draft, fragmented genomes with suboptimal annotations), we cannot dismiss entirely the possible impact of technical errors or inconsistencies in the genome annotation process (e.g., due to chains of functional inference) on these data sets. However, the annotated functions of core
<italic>k</italic>
-mers represent conservation at a finer scale than those based on full-length sequence comparisons and remain biologically relevant. Across the phyla represented in our data set, functions (identified on this basis) associated with the metabolism and transport of amino acids, and with the production and conversion of energy, are the ones most frequently encountered. Perhaps not surprisingly, we observed that phyla that share many 25-mers also exhibit similar core functional profiles. Our analysis reveals that the core functions highly conserved in
<italic>Epsilonproteobacteria</italic>
and in
<italic>Deltaproteobacteria</italic>
are distinct from those conserved in the other proteobacterial classes. Except for the two most highly conserved categories (see above), the
<italic>Epsilonproteobacteria</italic>
do not share highly conserved functions with the other classes of
<italic>Proteobacteria</italic>
; indeed, the
<italic>Epsilonproteobacteria</italic>
share more 25-mers with the
<italic>Firmicutes</italic>
and with the
<italic>Actinobacteria</italic>
than with other
<italic>Proteobacteria</italic>
. These results support those of previous single-gene phylogenetic analyses revealing
<italic>Epsilonproteobacteria</italic>
to be the most basal proteobacterial lineage and are consistent with
<italic>Epsilonproteobacteria</italic>
having been the last class in this phylum to have been recognized (
<xref rid="B55" ref-type="bibr">55</xref>
). Finally, we also observed that phylum
<italic>Tenericutes</italic>
is one of the few phyla that do not have highly conserved functions related to energy production and conversion; this may be related to the parasitic or commensal lifestyle of its members (
<xref rid="B56" ref-type="bibr">56</xref>
). These results demonstrate that analysis of conserved
<italic>k</italic>
-mers can identify molecular mechanisms and functions that characterize evolutionary diversification within and among microbial taxa.</p>
<p>No core 25-mers were recovered for 51 of these 699 genera, particularly those represented by genome sequences for many isolates from different species. For such genera, a core
<italic>k</italic>
-mer set might be sought at lower values of
<italic>k</italic>
, although at the potential risk of including signal from false positives and background noise (i.e., nonhomologous
<italic>k</italic>
-mers). Similarly, some phyla that we pointed out as sharing highly conserved functions have few distinct COGs related to core 25-mers.</p>
</sec>
<sec sec-type="materials|methods" id="s4">
<title>MATERIALS AND METHODS</title>
<sec id="s4.1">
<title>Data.</title>
<p>In total, 2,785 completely sequenced genomes of
<italic>Bacteria</italic>
and
<italic>Archaea</italic>
were downloaded from NCBI on 31 January 2016 (
<xref ref-type="supplementary-material" rid="dataS1">Data Set S1</xref>
); two of these were identified as “multispecies” and “multi-isolate” and were thus excluded. Functional annotation of the remaining 2,783 genomes was obtained through the corresponding RefSeq records. Genes encoding ribosomal RNAs were identified based on annotation. Genomes with no annotation information were excluded from our rRNA-gene network. Of the 2,783 isolates, 921 contained plasmids; these plasmid genomes were used in the plasmid-only network.</p>
</sec>
<sec id="s4.2">
<title>Relational database of
<italic>k</italic>
-mers and genome features.</title>
<p>We extracted 10,059,526,408 distinct 25-mers from the genomes of 4,401 bacterial and archaeal isolates (present as of 31 January 2016 in NCBI RefSeq), of which 2,783 genomes were complete and included in our subsequent analysis (see above). We organized these
<italic>k</italic>
-mers, and their genomic locations and features (based on RefSeq annotations), in a relational database using SQL, following the method of Greenfield and Roehm (
<xref rid="B37" ref-type="bibr">37</xref>
). Tables in the database contain a list of isolates, lists of genes and their sequences, coherent taxonomic information for each isolate, an indexed list of all 25-mers, an indexed list of gene-by-gene comparisons for each pair of genes, and an indexed list of genome-by-genome comparisons for each pair of genomes.</p>
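<p>The table layout described above can be sketched with an in-memory SQLite database in Python. The schema below is a hypothetical simplification (two tables named isolate and kmer_hit, toy 4-mers instead of 25-mers, invented isolate and taxon names); it is not the schema of the database used in this study and only illustrates how an indexed k-mer table can be joined to taxonomic information.</p>
<preformat><![CDATA[
```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database; table and column names are assumptions."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE isolate (
            id INTEGER PRIMARY KEY, name TEXT, genus TEXT, phylum TEXT
        );
        CREATE TABLE kmer_hit (
            kmer TEXT, isolate_id INTEGER REFERENCES isolate(id), cog TEXT
        );
        CREATE INDEX idx_kmer_hit_kmer ON kmer_hit (kmer);
        """
    )
    con.executemany(
        "INSERT INTO isolate VALUES (?, ?, ?, ?)",
        [
            (1, "iso1", "GenusA", "PhylumX"),
            (2, "iso2", "GenusA", "PhylumX"),
            (3, "iso3", "GenusB", "PhylumX"),
        ],
    )
    con.executemany(
        "INSERT INTO kmer_hit VALUES (?, ?, ?)",
        [
            ("ACGT", 1, "E"), ("ACGT", 2, "E"), ("ACGT", 3, "E"),
            ("GGCC", 1, "C"), ("GGCC", 2, "C"),
            ("TTAA", 1, "J"),
        ],
    )
    return con


def kmers_in_every_isolate(con: sqlite3.Connection, genus: str) -> List[str]:
    """Return k-mers found in every isolate of the given genus."""
    rows = con.execute(
        """
        SELECT h.kmer
        FROM kmer_hit AS h
        JOIN isolate AS i ON i.id = h.isolate_id
        WHERE i.genus = ?
        GROUP BY h.kmer
        HAVING COUNT(DISTINCT h.isolate_id) =
               (SELECT COUNT(*) FROM isolate WHERE genus = ?)
        ORDER BY h.kmer
        """,
        (genus, genus),
    ).fetchall()
    return [row[0] for row in rows]


con = build_demo_db()
shared_a = kmers_in_every_isolate(con, "GenusA")
shared_b = kmers_in_every_isolate(con, "GenusB")
```
]]></preformat>
<p>Keeping the k-mers in an indexed table means that questions such as "which k-mers occur in all isolates of a taxon" reduce to a grouped query rather than a scan of the genome sequences.</p>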
</sec>
<sec id="s4.3">
<title>Alignment-free (AF) network.</title>
<p>We followed the method of Bernard et al. (
<xref rid="B29" ref-type="bibr">29</xref>
) in generating the AF networks. We first computed pairwise comparisons for the 2,783 isolates and generated for each comparison the corresponding
<italic>D</italic>
<sub>2</sub>
<sup>
<italic>S</italic>
</sup>
distance (
<xref rid="B15" ref-type="bibr">15</xref>
) value
<italic>d</italic>
, using 25-mers across parallel central processing units (CPUs). For a pair of genomes
<italic>a</italic>
and
<italic>b</italic>
, we transformed
<italic>d</italic>
into a similarity measure
<italic>S
<sub>ab</sub>
</italic>
, where
<italic>S
<sub>ab</sub>
</italic>
= 10 −
<italic>d</italic>
. For instance, considering two highly similar genomes of
<italic>a</italic>
and
<italic>b</italic>
for which distance
<italic>d
<sub>ab</sub>
</italic>
= 0.001, the similarity measure
<italic>S
<sub>ab</sub>
</italic>
= 9.999. Likewise, considering two highly dissimilar genomes of
<italic>a</italic>
and
<italic>b</italic>
for which
<italic>d
<sub>ab</sub>
</italic>
= 9.925,
<italic>S
<sub>ab</sub>
</italic>
= 0.075. We ignored any edge for which
<italic>d </italic>
>
10 (i.e., for which the
<italic>S</italic>
value was negative), as the corresponding pair of sequences shares ≤0.01% of 25-mers (i.e., the 25-mers capture little evidence of homology). We then generated the networks using JSON files containing the
<italic>S</italic>
values as input for a JavaScript script using the D3 library (
<ext-link ext-link-type="uri" xlink:href="https://d3js.org/">https://d3js.org/</ext-link>
). Here, we present two types of AF networks. For a phylum-level depiction of the network (
<italic>P</italic>
-network), we grouped all sequences of the same phylum as a single entity prior to calculating the distance; each phylum is represented by a node in the network. The width of the edge between two nodes represents the number of connections between isolates from these two phyla, and the size of each node is proportional to the number of isolates in the phylum. For an isolate-level depiction of the network (
<italic>I</italic>
-network) we treated each genome isolate as a single entity (i.e., node). In this network, an edge between two nodes indicates evidence of shared
<italic>k</italic>
-mers. The AF networks include a similarity-score threshold
<italic>t</italic>
, for which only edges with
<italic>S</italic>
>
<italic>t</italic>
are displayed; changing
<italic>t</italic>
therefore can dynamically change the structure of the network (
<xref rid="B29" ref-type="bibr">29</xref>
). The resulting dynamic networks can be visualized using any web browser. All of the networks are available at
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.14264/uql.2017.436">https://doi.org/10.14264/uql.2017.436</ext-link>
.</p>
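<p>The transformation and thresholding steps described in this section can be sketched in Python as follows. The distances are taken as given (the sketch does not compute the distance itself), the genome labels are invented, and the node/link layout of the JSON output is an assumption for illustration; the actual structure of the JSON files used with D3 is not specified here.</p>
<preformat><![CDATA[
```python
import json
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Edge = Tuple[str, str, float]


def similarity(d: float) -> float:
    """Transform a distance d into the similarity S = 10 - d."""
    return 10.0 - d


def edges_at_threshold(distances: Dict[Pair, float], t: float) -> List[Edge]:
    """Keep edges whose similarity S exceeds the threshold t."""
    edges = []
    for (a, b), d in sorted(distances.items()):
        if d > 10:  # S would be negative, so the edge is ignored
            continue
        s = similarity(d)
        if s > t:
            edges.append((a, b, round(s, 3)))
    return edges


def to_network_json(edges: List[Edge]) -> str:
    """Serialize edges as nodes and links (assumed, hypothetical layout)."""
    nodes = sorted({name for a, b, _ in edges for name in (a, b)})
    return json.dumps(
        {
            "nodes": [{"id": name} for name in nodes],
            "links": [{"source": a, "target": b, "value": s} for a, b, s in edges],
        }
    )


# The first two distances reproduce the worked examples from the text.
distances = {("g1", "g2"): 0.001, ("g1", "g3"): 9.925, ("g2", "g3"): 10.4}
loose = edges_at_threshold(distances, 0.0)
strict = edges_at_threshold(distances, 3.0)
payload = to_network_json(strict)
```
]]></preformat>
<p>Raising the threshold from 0 to 3 removes the edge between the two dissimilar genomes, which is how changing the threshold alters the structure of the displayed network.</p>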
</sec>
<sec id="s4.4">
<title>Network density and phylum connectedness.</title>
<p>For a network with
<italic>x</italic>
nodes, there are
<italic>e</italic>
possible edges (potential connections), where
<inline-formula id="IE15">
<alternatives>
<inline-graphic xlink:href="sys00618-2296-mu1.jpg"></inline-graphic>
<mml:math id="i1">
<mml:mi>e</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:math>
</alternatives>
</inline-formula>
. For a network containing
<italic>y</italic>
edges (actual connections), the density value
<italic>D</italic>
was calculated as
<inline-formula id="IE16">
<alternatives>
<inline-graphic xlink:href="sys00618-2296-mu2.jpg"></inline-graphic>
<mml:math id="i2">
<mml:mfrac>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
</alternatives>
</inline-formula>
(
<xref ref-type="fig" rid="fig1">Fig. 1</xref>
). For a pair of phyla
<italic>a</italic>
and
<italic>b</italic>
, their connectedness value
<italic>C
<sub>ab</sub>
</italic>
is
<inline-formula id="IE17">
<alternatives>
<inline-graphic xlink:href="sys00618-2296-mu3.jpg"></inline-graphic>
<mml:math id="i3">
<mml:mi mathvariant="normal"> </mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
</alternatives>
</inline-formula>
, where
<italic>g</italic>
is the number of genome pairs (between phyla
<italic>a</italic>
and
<italic>b</italic>
) that share one or more
<italic>k</italic>
-mers and
<italic>G</italic>
is the number of all possible genome pairs between phyla
<italic>a</italic>
and
<italic>b</italic>
. In this case,
<italic>G</italic>
=
<italic>N
<sub>a</sub>
</italic>
×
<italic>N
<sub>b</sub>
</italic>
, where
<italic>N
<sub>a</sub>
</italic>
and
<italic>N
<sub>b</sub>
</italic>
represent the number of genomes or isolates in phylum
<italic>a</italic>
and phylum
<italic>b</italic>
, respectively. For each network,
<italic>C</italic>
values were calculated at the optimal threshold
<italic>t</italic>
for which the connectedness signal is neither too strong nor too weak across all phylum pair comparisons. To avoid potential biases of incomplete taxon sampling, here we restricted our comparisons to phyla that have ≥10 genomes.</p>
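<p>The density and connectedness definitions above translate directly into a few lines of Python. In this minimal sketch the genome labels, phylum assignments, and edge list are invented, and an edge is taken to mean that the two genomes share one or more k-mers at the chosen threshold; the function names are illustrative only.</p>
<preformat><![CDATA[
```python
from typing import Dict, Iterable, Tuple


def possible_edges(x: int) -> int:
    """Number of possible edges e = x(x - 1)/2 for a network with x nodes."""
    return x * (x - 1) // 2


def density(x: int, y: int) -> float:
    """Density D = y / e for a network with x nodes and y actual edges."""
    e = possible_edges(x)
    return y / e if e else 0.0


def connectedness(
    phylum_of: Dict[str, str], edges: Iterable[Tuple[str, str]], a: str, b: str
) -> float:
    """Connectedness C_ab = g / G for two different phyla a and b."""
    n_a = sum(1 for p in phylum_of.values() if p == a)
    n_b = sum(1 for p in phylum_of.values() if p == b)
    linked = set()
    for u, v in edges:
        pu, pv = phylum_of[u], phylum_of[v]
        if (pu == a and pv == b) or (pu == b and pv == a):
            linked.add(frozenset((u, v)))  # count each genome pair once
    total_pairs = n_a * n_b  # G = N_a x N_b
    return len(linked) / total_pairs if total_pairs else 0.0


phylum_of = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4")]
d_value = density(len(phylum_of), len(edges))
c_ab = connectedness(phylum_of, edges, "A", "B")
```
]]></preformat>
<p>For this toy network, 4 of the 10 possible edges are present, and 2 of the 6 possible genome pairs between the two phyla are connected.</p>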
</sec>
<sec id="s4.5">
<title>Core
<italic>k</italic>
-mers and COG categories.</title>
<p>For a specific group of microbial isolates (representing, e.g., a genus or a phylum), we extracted the set of the 25-mers that are found in all isolates within the group; we define this set of 25-mers as the core
<italic>k</italic>
-mers for the corresponding group. Using the relational database of
<italic>k</italic>
-mers (see above), we identified for these core 25-mers their corresponding genome locations and function based on COG (Clusters of Orthologous Groups) (
<xref rid="B57" ref-type="bibr">57</xref>
) annotations in RefSeq records. We generated profiles of COG functional categories for each of the 151 genera, for each of the 11 phyla, and for the five proteobacterial classes in which core
<italic>k</italic>
-mers were identified using our approach.</p>
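<p>The definition of core k-mers can be illustrated with a short Python sketch that intersects the k-mer sets of all isolates in a group and then tallies COG categories. The sequences, the value k = 4, and the k-mer-to-COG mapping are invented for the example, and reverse-complement strands are ignored for simplicity; in the study, the k-mers are 25-mers and the annotations come from the relational database.</p>
<preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def core_kmers(genomes: Dict[str, str], k: int) -> Set[str]:
    """Return the k-mers present in every genome of the group."""
    sets = [kmers(seq, k) for seq in genomes.values()]
    return set.intersection(*sets) if sets else set()


def cog_profile(core: Set[str], cog_of: Dict[str, str]) -> Counter:
    """Count COG categories over the annotated core k-mers."""
    return Counter(cog_of[kmer] for kmer in core if kmer in cog_of)


# Toy isolates of one hypothetical genus.
genomes = {"iso1": "ACGTACGGA", "iso2": "TTACGTAC", "iso3": "GACGTACC"}
core = core_kmers(genomes, 4)

# Hypothetical annotation: k-mer to COG category letter.
cog_of = {"ACGT": "E", "CGTA": "E", "GTAC": "C"}
profile = cog_profile(core, cog_of)
```
]]></preformat>
<p>Because the core set is an intersection, adding more divergent isolates to a group can only shrink it, which is consistent with the absence of core 25-mers in some genera represented by many isolates.</p>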
</sec>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGMENTS</title>
<p>This project was supported by an Australian Research Council grant (DP150101875) awarded to M.A.R. and C.X.C. and by a James S. McDonnell Foundation grant awarded to M.A.R. This work was supported by computational resources of the National Computational Infrastructure (NCI) National Facility systems through the NCI Merit Allocation Scheme (Project d85). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.</p>
<p>G.B. implemented the analysis workflow, conducted the experiments, and prepared the first draft of the manuscript. G.B. and C.X.C. prepared all figures and tables. P.G. provided the
<italic>k</italic>
-mer database and contributed to analyses using this database. G.B., C.X.C., and M.A.R. conceived the study, designed the experiments, and analyzed and interpreted the results. All of us prepared, wrote, reviewed, commented on, and approved the final manuscript.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<label>1.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name name-style="western">
<surname>de Bary</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>1884</year>
<source>Vergleichende Morphologie und Biologie der Pilze Mycetozoen und Bacterien</source>
.
<publisher-name>Engelmann</publisher-name>
,
<publisher-loc>Leipzig, Germany</publisher-loc>
.</mixed-citation>
</ref>
<ref id="B2">
<label>2.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Woese</surname>
<given-names>CR</given-names>
</name>
</person-group>
<year>1987</year>
<article-title>Bacterial evolution</article-title>
.
<source>Microbiol Rev</source>
<volume>51</volume>
:
<fpage>221</fpage>
<lpage>271</lpage>
.
<pub-id pub-id-type="pmid">2439888</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<label>3.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Bartlett</surname>
<given-names>JMS</given-names>
</name>
,
<name name-style="western">
<surname>Stirling</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>A short history of the polymerase chain reaction</article-title>
.
<source>Methods Mol Biol</source>
<volume>226</volume>
:
<fpage>3</fpage>
<lpage>6</lpage>
. doi:
<pub-id pub-id-type="doi">10.1385/1-59259-384-4:3</pub-id>
.
<pub-id pub-id-type="pmid">12958470</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<label>4.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Brown</surname>
<given-names>JR</given-names>
</name>
,
<name name-style="western">
<surname>Masuchi</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Robb</surname>
<given-names>FT</given-names>
</name>
,
<name name-style="western">
<surname>Doolittle</surname>
<given-names>WF</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>Evolutionary relationships of bacterial and archaeal glutamine synthetase genes</article-title>
.
<source>J Mol Evol</source>
<volume>38</volume>
:
<fpage>566</fpage>
<lpage>576</lpage>
. doi:
<pub-id pub-id-type="doi">10.1007/BF00175876</pub-id>
.
<pub-id pub-id-type="pmid">7916055</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<label>5.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Hug</surname>
<given-names>LA</given-names>
</name>
,
<name name-style="western">
<surname>Baker</surname>
<given-names>BJ</given-names>
</name>
,
<name name-style="western">
<surname>Anantharaman</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
,
<name name-style="western">
<surname>Probst</surname>
<given-names>AJ</given-names>
</name>
,
<name name-style="western">
<surname>Castelle</surname>
<given-names>CJ</given-names>
</name>
,
<name name-style="western">
<surname>Butterfield</surname>
<given-names>CN</given-names>
</name>
,
<name name-style="western">
<surname>Hernsdorf</surname>
<given-names>AW</given-names>
</name>
,
<name name-style="western">
<surname>Amano</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Ise</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Suzuki</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Dudek</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Relman</surname>
<given-names>DA</given-names>
</name>
,
<name name-style="western">
<surname>Finstad</surname>
<given-names>KM</given-names>
</name>
,
<name name-style="western">
<surname>Amundson</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Thomas</surname>
<given-names>BC</given-names>
</name>
,
<name name-style="western">
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>A new view of the tree of life</article-title>
.
<source>Nat Microbiol</source>
<volume>1</volume>
:
<fpage>16048</fpage>
. doi:
<pub-id pub-id-type="doi">10.1038/nmicrobiol.2016.48</pub-id>
.
<pub-id pub-id-type="pmid">27572647</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<label>6.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Forterre</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>The universal tree of life: an update</article-title>
.
<source>Front Microbiol</source>
<volume>6</volume>
:
<fpage>717</fpage>
. doi:
<pub-id pub-id-type="doi">10.3389/fmicb.2015.00717</pub-id>
.
<pub-id pub-id-type="pmid">26257711</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<label>7.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kunin</surname>
<given-names>V</given-names>
</name>
,
<name name-style="western">
<surname>Goldovsky</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Darzentas</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Ouzounis</surname>
<given-names>CA</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The net of life: reconstructing the microbial phylogenetic network</article-title>
.
<source>Genome Res</source>
<volume>15</volume>
:
<fpage>954</fpage>
<lpage>959</lpage>
. doi:
<pub-id pub-id-type="doi">10.1101/gr.3666505</pub-id>
.
<pub-id pub-id-type="pmid">15965028</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<label>8.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Rivera</surname>
<given-names>MC</given-names>
</name>
,
<name name-style="western">
<surname>Lake</surname>
<given-names>JA</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>The ring of life provides evidence for a genome fusion origin of eukaryotes</article-title>
.
<source>Nature</source>
<volume>431</volume>
:
<fpage>152</fpage>
<lpage>155</lpage>
. doi:
<pub-id pub-id-type="doi">10.1038/nature02848</pub-id>
.
<pub-id pub-id-type="pmid">15356622</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<label>9.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lake</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Servin</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Herbold</surname>
<given-names>CW</given-names>
</name>
,
<name name-style="western">
<surname>Skophammer</surname>
<given-names>RG</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Evidence for a new root of the tree of life</article-title>
.
<source>Syst Biol</source>
<volume>57</volume>
:
<fpage>835</fpage>
<lpage>843</lpage>
. doi:
<pub-id pub-id-type="doi">10.1080/10635150802555933</pub-id>
.
<pub-id pub-id-type="pmid">19085327</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<label>10.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fournier</surname>
<given-names>GP</given-names>
</name>
,
<name name-style="western">
<surname>Huang</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Gogarten</surname>
<given-names>JP</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Horizontal gene transfer from extinct and extant lineages: biological innovation and the coral of life</article-title>
.
<source>Philos Trans R Soc Lond B Biol Sci</source>
<volume>364</volume>
:
<fpage>2229</fpage>
<lpage>2239</lpage>
. doi:
<pub-id pub-id-type="doi">10.1098/rstb.2009.0033</pub-id>
.
<pub-id pub-id-type="pmid">19571243</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<label>11.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cong</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>YB</given-names>
</name>
,
<name name-style="western">
<surname>Phillips</surname>
<given-names>CA</given-names>
</name>
,
<name name-style="western">
<surname>Langston</surname>
<given-names>MA</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>Robust inference of genetic exchange communities from microbial genomes using TF-IDF</article-title>
.
<source>Front Microbiol</source>
<volume>8</volume>
:
<fpage>21</fpage>
. doi:
<pub-id pub-id-type="doi">10.3389/fmicb.2017.00021</pub-id>
.
<pub-id pub-id-type="pmid">28154557</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<label>12.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Doolittle</surname>
<given-names>WF</given-names>
</name>
,
<name name-style="western">
<surname>Bapteste</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Pattern pluralism and the Tree of Life hypothesis</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>104</volume>
:
<fpage>2043</fpage>
<lpage>2049</lpage>
. doi:
<pub-id pub-id-type="doi">10.1073/pnas.0610699104</pub-id>
.
<pub-id pub-id-type="pmid">17261804</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<label>13.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Dagan</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Martin</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Getting a better picture of microbial evolution en route to a network of genomes</article-title>
.
<source>Philos Trans R Soc Lond B Biol Sci</source>
<volume>364</volume>
:
<fpage>2187</fpage>
<lpage>2196</lpage>
. doi:
<pub-id pub-id-type="doi">10.1098/rstb.2009.0040</pub-id>
.
<pub-id pub-id-type="pmid">19571239</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<label>14.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
,
<name name-style="western">
<surname>Harlow</surname>
<given-names>TJ</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Highways of gene sharing in prokaryotes</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>102</volume>
:
<fpage>14332</fpage>
<lpage>14337</lpage>
. doi:
<pub-id pub-id-type="doi">10.1073/pnas.0504068102</pub-id>
.
<pub-id pub-id-type="pmid">16176988</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<label>15.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer</article-title>
.
<source>Sci Rep</source>
<volume>6</volume>
:
<fpage>28970</fpage>
. doi:
<pub-id pub-id-type="doi">10.1038/srep28970</pub-id>
.
<pub-id pub-id-type="pmid">27363362</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<label>16.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>YB</given-names>
</name>
,
<name name-style="western">
<surname>Chua</surname>
<given-names>XY</given-names>
</name>
,
<name name-style="western">
<surname>Cong</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Hogan</surname>
<given-names>JM</given-names>
</name>
,
<name name-style="western">
<surname>Maetschke</surname>
<given-names>SR</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<day>30</day>
<month>6</month>
<year>2017</year>
<article-title>Alignment-free inference of hierarchical and reticulate phylogenomic relationships</article-title>
.
<source>Brief Bioinform</source>
<fpage>bbx067</fpage>
. doi:
<pub-id pub-id-type="doi">10.1093/bib/bbx067</pub-id>
.</mixed-citation>
</ref>
<ref id="B17">
<label>17.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ren</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Bai</surname>
<given-names>X</given-names>
</name>
,
<name name-style="western">
<surname>Lu</surname>
<given-names>YY</given-names>
</name>
,
<name name-style="western">
<surname>Tang</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2018</year>
<article-title>Alignment-free sequence analysis and applications</article-title>
.
<source>Annu Rev Biomed Data Sci</source>
<volume>1</volume>
:
<fpage>93</fpage>
<lpage>114</lpage>
. doi:
<pub-id pub-id-type="doi">10.1146/annurev-biodatasci-080917-013431</pub-id>
.</mixed-citation>
</ref>
<ref id="B18">
<label>18.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Zielezinski</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Vinga</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Almeida</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Karlowski</surname>
<given-names>WM</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>Alignment-free sequence comparison: benefits, applications, and tools</article-title>
.
<source>Genome Biol</source>
<volume>18</volume>
:
<fpage>186</fpage>
. doi:
<pub-id pub-id-type="doi">10.1186/s13059-017-1319-7</pub-id>
.
<pub-id pub-id-type="pmid">28974235</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<label>19.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Saitou</surname>
<given-names>N</given-names>
</name>
,
<name name-style="western">
<surname>Nei</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>1987</year>
<article-title>The neighbor-joining method: a new method for reconstructing phylogenetic trees</article-title>
.
<source>Mol Biol Evol</source>
<volume>4</volume>
:
<fpage>406</fpage>
<lpage>425</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a040454</pub-id>
.
<pub-id pub-id-type="pmid">3447015</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<label>20.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ali</surname>
<given-names>W</given-names>
</name>
,
<name name-style="western">
<surname>Rito</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Deane</surname>
<given-names>CM</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Alignment-free protein interaction network comparison</article-title>
.
<source>Bioinformatics</source>
<volume>30</volume>
:
<fpage>i430</fpage>
<lpage>i437</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu447</pub-id>
.
<pub-id pub-id-type="pmid">25161230</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<label>21.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cong</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>YB</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Exploring lateral genetic transfer among microbial genomes using TF-IDF</article-title>
.
<source>Sci Rep</source>
<volume>6</volume>
:
<fpage>29319</fpage>
. doi:
<pub-id pub-id-type="doi">10.1038/srep29319</pub-id>
.
<pub-id pub-id-type="pmid">27452976</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<label>22.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cong</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>YB</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF</article-title>
.
<source>Sci Rep</source>
<volume>6</volume>
:
<fpage>30308</fpage>
. doi:
<pub-id pub-id-type="doi">10.1038/srep30308</pub-id>
.
<pub-id pub-id-type="pmid">27453035</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<label>23.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Posada</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Phylogenetic models of molecular evolution: next-generation data, fit, and performance</article-title>
.
<source>J Mol Evol</source>
<volume>76</volume>
:
<fpage>351</fpage>
<lpage>352</lpage>
. doi:
<pub-id pub-id-type="doi">10.1007/s00239-013-9566-z</pub-id>
.
<pub-id pub-id-type="pmid">23695649</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<label>24.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Biological intuition in alignment-free methods: response to Posada</article-title>
.
<source>J Mol Evol</source>
<volume>77</volume>
:
<fpage>1</fpage>
<lpage>2</lpage>
. doi:
<pub-id pub-id-type="doi">10.1007/s00239-013-9573-0</pub-id>
.
<pub-id pub-id-type="pmid">23877343</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<label>25.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fox</surname>
<given-names>GE</given-names>
</name>
,
<name name-style="western">
<surname>Pechman</surname>
<given-names>KR</given-names>
</name>
,
<name name-style="western">
<surname>Woese</surname>
<given-names>CR</given-names>
</name>
</person-group>
<year>1977</year>
<article-title>Comparative cataloging of 16S ribosomal ribonucleic acid: molecular approach to procaryotic systematics</article-title>
.
<source>Int J Syst Evol Microbiol</source>
<volume>27</volume>
:
<fpage>44</fpage>
<lpage>57</lpage>
. doi:
<pub-id pub-id-type="doi">10.1099/00207713-27-1-44</pub-id>
.</mixed-citation>
</ref>
<ref id="B26">
<label>26.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Woese</surname>
<given-names>CR</given-names>
</name>
,
<name name-style="western">
<surname>Fox</surname>
<given-names>GE</given-names>
</name>
</person-group>
<year>1977</year>
<article-title>Phylogenetic structure of the prokaryotic domain: the primary kingdoms</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>74</volume>
:
<fpage>5088</fpage>
<lpage>5090</lpage>
. doi:
<pub-id pub-id-type="doi">10.1073/pnas.74.11.5088</pub-id>
.
<pub-id pub-id-type="pmid">270744</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<label>27.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fox</surname>
<given-names>GE</given-names>
</name>
,
<name name-style="western">
<surname>Stackebrandt</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Hespell</surname>
<given-names>RB</given-names>
</name>
,
<name name-style="western">
<surname>Gibson</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Maniloff</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Dyer</surname>
<given-names>TA</given-names>
</name>
,
<name name-style="western">
<surname>Wolfe</surname>
<given-names>RS</given-names>
</name>
,
<name name-style="western">
<surname>Balch</surname>
<given-names>WE</given-names>
</name>
,
<name name-style="western">
<surname>Tanner</surname>
<given-names>RS</given-names>
</name>
,
<name name-style="western">
<surname>Magrum</surname>
<given-names>LJ</given-names>
</name>
,
<name name-style="western">
<surname>Zablen</surname>
<given-names>LB</given-names>
</name>
,
<name name-style="western">
<surname>Blakemore</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Gupta</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Bonen</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Lewis</surname>
<given-names>BJ</given-names>
</name>
,
<name name-style="western">
<surname>Stahl</surname>
<given-names>DA</given-names>
</name>
,
<name name-style="western">
<surname>Luehrsen</surname>
<given-names>KR</given-names>
</name>
,
<name name-style="western">
<surname>Chen</surname>
<given-names>KN</given-names>
</name>
,
<name name-style="western">
<surname>Woese</surname>
<given-names>CR</given-names>
</name>
</person-group>
<year>1980</year>
<article-title>The phylogeny of prokaryotes</article-title>
.
<source>Science</source>
<volume>209</volume>
:
<fpage>457</fpage>
<lpage>463</lpage>
. doi:
<pub-id pub-id-type="doi">10.1126/science.6771870</pub-id>
.
<pub-id pub-id-type="pmid">6771870</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<label>28.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Yi</surname>
<given-names>H</given-names>
</name>
,
<name name-style="western">
<surname>Jin</surname>
<given-names>L</given-names>
</name>
</person-group>
<year>2013</year>
<article-title><italic>Co-phylog</italic>: an assembly-free phylogenomic approach for closely related organisms</article-title>
.
<source>Nucleic Acids Res</source>
<volume>41</volume>
:
<elocation-id>e75</elocation-id>
. doi:
<pub-id pub-id-type="doi">10.1093/nar/gkt003</pub-id>
.
<pub-id pub-id-type="pmid">23335788</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<label>29.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Recapitulating phylogenies using <italic>k</italic>-mers: from trees to networks [version 2; referees: 2 approved]</article-title>
.
<source>F1000Research</source>
<volume>5</volume>
:
<fpage>2789</fpage>
. doi:
<pub-id pub-id-type="doi">10.12688/f1000research.10225.2</pub-id>
.
<pub-id pub-id-type="pmid">28105314</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<label>30.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lu</surname>
<given-names>YY</given-names>
</name>
,
<name name-style="western">
<surname>Tang</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Ren</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Fuhrman</surname>
<given-names>JA</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>CAFE: a<underline>C</underline>celerated <underline>A</underline>lignment-<underline>F</underline>r<underline>E</underline>e sequence analysis</article-title>
.
<source>Nucleic Acids Res</source>
<volume>45</volume>
:
<fpage>W554</fpage>
<lpage>W559</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/nar/gkx351</pub-id>
.
<pub-id pub-id-type="pmid">28472388</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<label>31.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Leimeister</surname>
<given-names>CA</given-names>
</name>
,
<name name-style="western">
<surname>Boden</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Horwege</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Lindner</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Morgenstern</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Fast alignment-free sequence comparison using spaced-word frequencies</article-title>
.
<source>Bioinformatics</source>
<volume>30</volume>
:
<fpage>1991</fpage>
<lpage>1999</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu177</pub-id>
.
<pub-id pub-id-type="pmid">24700317</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<label>32.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
,
<name name-style="western">
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Poirion</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Hogan</surname>
<given-names>JM</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Inferring phylogenies of evolving sequences without multiple sequence alignment</article-title>
.
<source>Sci Rep</source>
<volume>4</volume>
:
<fpage>6504</fpage>
. doi:
<pub-id pub-id-type="doi">10.1038/srep06504</pub-id>
.
<pub-id pub-id-type="pmid">25266120</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<label>33.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Next-generation phylogenomics</article-title>
.
<source>Biol Direct</source>
<volume>8</volume>
:
<fpage>3</fpage>
. doi:
<pub-id pub-id-type="doi">10.1186/1745-6150-8-3</pub-id>
.
<pub-id pub-id-type="pmid">23339707</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<label>34.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wan</surname>
<given-names>L</given-names>
</name>
,
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Alignment-free sequence comparison (II): theoretical power of comparison statistics</article-title>
.
<source>J Comput Biol</source>
<volume>17</volume>
:
<fpage>1467</fpage>
<lpage>1490</lpage>
. doi:
<pub-id pub-id-type="doi">10.1089/cmb.2010.0056</pub-id>
.
<pub-id pub-id-type="pmid">20973742</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<label>35.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Chew</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Sun</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Alignment-free sequence comparison (I): statistics and power</article-title>
.
<source>J Comput Biol</source>
<volume>16</volume>
:
<fpage>1615</fpage>
<lpage>1634</lpage>
. doi:
<pub-id pub-id-type="doi">10.1089/cmb.2009.0198</pub-id>
.
<pub-id pub-id-type="pmid">20001252</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<label>36.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Skippington</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Within-species lateral genetic transfer and the evolution of transcriptional regulation in
<italic>Escherichia coli</italic>
and
<italic>Shigella</italic>
</article-title>
.
<source>BMC Genomics</source>
<volume>12</volume>
:
<fpage>532</fpage>
. doi:
<pub-id pub-id-type="doi">10.1186/1471-2164-12-532</pub-id>
.
<pub-id pub-id-type="pmid">22035052</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<label>37.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Greenfield</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Roehm</surname>
<given-names>U</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Answering biological questions by querying k-mer databases</article-title>
.
<source>Concurr Comput</source>
<volume>25</volume>
:
<fpage>497</fpage>
<lpage>509</lpage>
. doi:
<pub-id pub-id-type="doi">10.1002/cpe.2938</pub-id>
.</mixed-citation>
</ref>
<ref id="B38">
<label>38.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Jain</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Srivastava</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Broad host range plasmids</article-title>
.
<source>FEMS Microbiol Lett</source>
<volume>348</volume>
:
<fpage>87</fpage>
<lpage>96</lpage>
. doi:
<pub-id pub-id-type="doi">10.1111/1574-6968.12241</pub-id>
.
<pub-id pub-id-type="pmid">23980652</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<label>39.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Shintani</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Sanchez</surname>
<given-names>ZK</given-names>
</name>
,
<name name-style="western">
<surname>Kimbara</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Genomics of microbial plasmids: classification and identification based on replication and transfer systems and host taxonomy</article-title>
.
<source>Front Microbiol</source>
<volume>6</volume>
:
<fpage>242</fpage>
. doi:
<pub-id pub-id-type="doi">10.3389/fmicb.2015.00242</pub-id>
.
<pub-id pub-id-type="pmid">25873913</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<label>40.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Harrison</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Guymer</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Spiers</surname>
<given-names>AJ</given-names>
</name>
,
<name name-style="western">
<surname>Paterson</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Brockhurst</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Parallel compensatory evolution stabilizes plasmids across the parasitism-mutualism continuum</article-title>
.
<source>Curr Biol</source>
<volume>25</volume>
:
<fpage>2034</fpage>
<lpage>2039</lpage>
. doi:
<pub-id pub-id-type="doi">10.1016/j.cub.2015.06.024</pub-id>
.
<pub-id pub-id-type="pmid">26190075</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<label>41.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fondi</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Fani</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>The horizontal flow of the plasmid resistome: clues from inter-generic similarity networks</article-title>
.
<source>Environ Microbiol</source>
<volume>12</volume>
:
<fpage>3228</fpage>
<lpage>3242</lpage>
. doi:
<pub-id pub-id-type="doi">10.1111/j.1462-2920.2010.02295.x</pub-id>
.
<pub-id pub-id-type="pmid">20636373</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<label>42.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lefeuvre</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Cellier</surname>
<given-names>G</given-names>
</name>
,
<name name-style="western">
<surname>Remenant</surname>
<given-names>B</given-names>
</name>
,
<name name-style="western">
<surname>Chiroleu</surname>
<given-names>F</given-names>
</name>
,
<name name-style="western">
<surname>Prior</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Constraints on genome dynamics revealed from gene distribution among the
<italic>Ralstonia solanacearum</italic>
species</article-title>
.
<source>PLoS One</source>
<volume>8</volume>
:
<elocation-id>e63155</elocation-id>
. doi:
<pub-id pub-id-type="doi">10.1371/journal.pone.0063155</pub-id>
.
<pub-id pub-id-type="pmid">23723974</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<label>43.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Tatusov</surname>
<given-names>RL</given-names>
</name>
,
<name name-style="western">
<surname>Galperin</surname>
<given-names>MY</given-names>
</name>
,
<name name-style="western">
<surname>Natale</surname>
<given-names>DA</given-names>
</name>
,
<name name-style="western">
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>The COG database: a tool for genome-scale analysis of protein functions and evolution</article-title>
.
<source>Nucleic Acids Res</source>
<volume>28</volume>
:
<fpage>33</fpage>
<lpage>36</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/nar/28.1.33</pub-id>
.
<pub-id pub-id-type="pmid">10592175</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<label>44.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Benjamini</surname>
<given-names>Y</given-names>
</name>
,
<name name-style="western">
<surname>Hochberg</surname>
<given-names>Y</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>
.
<source>J R Stat Soc Series B Stat Methodol</source>
<volume>57</volume>
:
<fpage>289</fpage>
<lpage>300</lpage>
.</mixed-citation>
</ref>
<ref id="B45">
<label>45.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Soucy</surname>
<given-names>SM</given-names>
</name>
,
<name name-style="western">
<surname>Huang</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Gogarten</surname>
<given-names>JP</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Horizontal gene transfer: building the web of life</article-title>
.
<source>Nat Rev Genet</source>
<volume>16</volume>
:
<fpage>472</fpage>
<lpage>482</lpage>
. doi:
<pub-id pub-id-type="doi">10.1038/nrg3962</pub-id>
.
<pub-id pub-id-type="pmid">26184597</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<label>46.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
,
<name name-style="western">
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
,
<name name-style="western">
<surname>Darling</surname>
<given-names>AE</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Lateral transfer of genes and gene fragments in prokaryotes</article-title>
.
<source>Genome Biol Evol</source>
<volume>1</volume>
:
<fpage>429</fpage>
<lpage>438</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/gbe/evp044</pub-id>
.
<pub-id pub-id-type="pmid">20333212</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<label>47.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lerat</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Daubin</surname>
<given-names>V</given-names>
</name>
,
<name name-style="western">
<surname>Moran</surname>
<given-names>NA</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria</article-title>
.
<source>PLoS Biol</source>
<volume>1</volume>
:
<elocation-id>e19</elocation-id>
. doi:
<pub-id pub-id-type="doi">10.1371/journal.pbio.0000019</pub-id>
.
<pub-id pub-id-type="pmid">12975657</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<label>48.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Daubin</surname>
<given-names>V</given-names>
</name>
,
<name name-style="western">
<surname>Gouy</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Perrière</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history</article-title>
.
<source>Genome Res</source>
<volume>12</volume>
:
<fpage>1080</fpage>
<lpage>1090</lpage>
. doi:
<pub-id pub-id-type="doi">10.1101/gr.187002</pub-id>
.
<pub-id pub-id-type="pmid">12097345</pub-id>
</mixed-citation>
</ref>
<ref id="B49">
<label>49.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Dagan</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Martin</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The tree of one percent</article-title>
.
<source>Genome Biol</source>
<volume>7</volume>
:
<fpage>118</fpage>
. doi:
<pub-id pub-id-type="doi">10.1186/gb-2006-7-10-118</pub-id>
.
<pub-id pub-id-type="pmid">17081279</pub-id>
</mixed-citation>
</ref>
<ref id="B50">
<label>50.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Pace</surname>
<given-names>NR</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Mapping the tree of life: progress and prospects</article-title>
.
<source>Microbiol Mol Biol Rev</source>
<volume>73</volume>
:
<fpage>565</fpage>
<lpage>576</lpage>
. doi:
<pub-id pub-id-type="doi">10.1128/MMBR.00033-09</pub-id>
.
<pub-id pub-id-type="pmid">19946133</pub-id>
</mixed-citation>
</ref>
<ref id="B51">
<label>51.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kloesges</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Popa</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Martin</surname>
<given-names>W</given-names>
</name>
,
<name name-style="western">
<surname>Dagan</surname>
<given-names>T</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths</article-title>
.
<source>Mol Biol Evol</source>
<volume>28</volume>
:
<fpage>1057</fpage>
<lpage>1074</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/molbev/msq297</pub-id>
.
<pub-id pub-id-type="pmid">21059789</pub-id>
</mixed-citation>
</ref>
<ref id="B52">
<label>52.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Olsen</surname>
<given-names>GJ</given-names>
</name>
,
<name name-style="western">
<surname>Woese</surname>
<given-names>CR</given-names>
</name>
,
<name name-style="western">
<surname>Overbeek</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>The winds of (evolutionary) change: breathing new life into microbiology</article-title>
.
<source>J Bacteriol</source>
<volume>176</volume>
:
<fpage>1</fpage>
<lpage>6</lpage>
. doi:
<pub-id pub-id-type="doi">10.1128/jb.176.1.1-6.1994</pub-id>
.
<pub-id pub-id-type="pmid">8282683</pub-id>
</mixed-citation>
</ref>
<ref id="B53">
<label>53.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Stanier</surname>
<given-names>RY</given-names>
</name>
,
<name name-style="western">
<surname>Van Niel</surname>
<given-names>CB</given-names>
</name>
</person-group>
<year>1941</year>
<article-title>The main outlines of bacterial classification</article-title>
.
<source>J Bacteriol</source>
<volume>42</volume>
:
<fpage>437</fpage>
<lpage>466</lpage>
.
<pub-id pub-id-type="pmid">16560462</pub-id>
</mixed-citation>
</ref>
<ref id="B54">
<label>54.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wayne</surname>
<given-names>LG</given-names>
</name>
,
<name name-style="western">
<surname>Brenner</surname>
<given-names>DJ</given-names>
</name>
,
<name name-style="western">
<surname>Colwell</surname>
<given-names>RR</given-names>
</name>
,
<name name-style="western">
<surname>Grimont</surname>
<given-names>PAD</given-names>
</name>
,
<name name-style="western">
<surname>Kandler</surname>
<given-names>O</given-names>
</name>
,
<name name-style="western">
<surname>Krichevsky</surname>
<given-names>MI</given-names>
</name>
,
<name name-style="western">
<surname>Moore</surname>
<given-names>LH</given-names>
</name>
,
<name name-style="western">
<surname>Moore</surname>
<given-names>WEC</given-names>
</name>
,
<name name-style="western">
<surname>Murray</surname>
<given-names>RGE</given-names>
</name>
,
<name name-style="western">
<surname>Stackebrandt</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Starr</surname>
<given-names>MP</given-names>
</name>
,
<name name-style="western">
<surname>Trüper</surname>
<given-names>HG</given-names>
</name>
</person-group>
<year>1987</year>
<article-title>Report of the ad hoc committee on reconciliation of approaches to bacterial systematics</article-title>
.
<source>Int J Syst Evol Microbiol</source>
<volume>37</volume>
:
<fpage>463</fpage>
<lpage>464</lpage>
. doi:
<pub-id pub-id-type="doi">10.1099/00207713-37-4-463</pub-id>
.</mixed-citation>
</ref>
<ref id="B55">
<label>55.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Trust</surname>
<given-names>TJ</given-names>
</name>
,
<name name-style="western">
<surname>Logan</surname>
<given-names>SM</given-names>
</name>
,
<name name-style="western">
<surname>Gustafson</surname>
<given-names>CE</given-names>
</name>
,
<name name-style="western">
<surname>Romaniuk</surname>
<given-names>PJ</given-names>
</name>
,
<name name-style="western">
<surname>Kim</surname>
<given-names>NW</given-names>
</name>
,
<name name-style="western">
<surname>Chan</surname>
<given-names>VL</given-names>
</name>
,
<name name-style="western">
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
,
<name name-style="western">
<surname>Guerry</surname>
<given-names>P</given-names>
</name>
,
<name name-style="western">
<surname>Gutell</surname>
<given-names>RR</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>Phylogenetic and molecular characterization of a 23S rRNA gene positions the genus
<italic>Campylobacter</italic>
in the epsilon subdivision of the
<italic>Proteobacteria</italic>
and shows that the presence of transcribed spacers is common in
<italic>Campylobacter</italic>
spp</article-title>
.
<source>J Bacteriol</source>
<volume>176</volume>
:
<fpage>4597</fpage>
<lpage>4609</lpage>
. doi:
<pub-id pub-id-type="doi">10.1128/jb.176.15.4597-4609.1994</pub-id>
.
<pub-id pub-id-type="pmid">8045890</pub-id>
</mixed-citation>
</ref>
<ref id="B56">
<label>56.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Skennerton</surname>
<given-names>CT</given-names>
</name>
,
<name name-style="western">
<surname>Haroon</surname>
<given-names>MF</given-names>
</name>
,
<name name-style="western">
<surname>Briegel</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Shi</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Jensen</surname>
<given-names>GJ</given-names>
</name>
,
<name name-style="western">
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
,
<name name-style="western">
<surname>Orphan</surname>
<given-names>VJ</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Phylogenomic analysis of
<italic>Candidatus</italic>
‘Izimaplasma’ species: free-living representatives from a
<italic>Tenericutes</italic>
clade found in methane seeps</article-title>
.
<source>ISME J</source>
<volume>10</volume>
:
<fpage>2679</fpage>
<lpage>2692</lpage>
. doi:
<pub-id pub-id-type="doi">10.1038/ismej.2016.55</pub-id>
.
<pub-id pub-id-type="pmid">27058507</pub-id>
</mixed-citation>
</ref>
<ref id="B57">
<label>57.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Powell</surname>
<given-names>S</given-names>
</name>
,
<name name-style="western">
<surname>Szklarczyk</surname>
<given-names>D</given-names>
</name>
,
<name name-style="western">
<surname>Trachana</surname>
<given-names>K</given-names>
</name>
,
<name name-style="western">
<surname>Roth</surname>
<given-names>A</given-names>
</name>
,
<name name-style="western">
<surname>Kuhn</surname>
<given-names>M</given-names>
</name>
,
<name name-style="western">
<surname>Muller</surname>
<given-names>J</given-names>
</name>
,
<name name-style="western">
<surname>Arnold</surname>
<given-names>R</given-names>
</name>
,
<name name-style="western">
<surname>Rattei</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Letunic</surname>
<given-names>I</given-names>
</name>
,
<name name-style="western">
<surname>Doerks</surname>
<given-names>T</given-names>
</name>
,
<name name-style="western">
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
,
<name name-style="western">
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
,
<name name-style="western">
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges</article-title>
.
<source>Nucleic Acids Res</source>
<volume>40</volume>
:
<fpage>D284</fpage>
<lpage>D289</lpage>
. doi:
<pub-id pub-id-type="doi">10.1093/nar/gkr1060</pub-id>
.
<pub-id pub-id-type="pmid">22096231</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001255 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001255 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:6247013
   |texte=   k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:30505941" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021