MERS Exploration Server

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Recapitulating phylogenies using
<italic>k</italic>
-mers: from trees to networks</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28105314</idno>
<idno type="pmc">5224691</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224691</idno>
<idno type="RBID">PMC:5224691</idno>
<idno type="doi">10.12688/f1000research.10225.2</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000C69</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000C69</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Recapitulating phylogenies using
<italic>k</italic>
-mers: from trees to networks</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:aff id="a1">Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">F1000Research</title>
<idno type="eISSN">2046-1402</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared
<italic>k</italic>
-mers (subsequences at fixed length
<italic>k</italic>
). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using
<italic>k</italic>
-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Dayrat, B" uniqKey="Dayrat B">B Dayrat</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haeckel, E" uniqKey="Haeckel E">E Haeckel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haeckel, E" uniqKey="Haeckel E">E Haeckel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burkhardt, Rw" uniqKey="Burkhardt R">RW Burkhardt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fitch, Wm" uniqKey="Fitch W">WM Fitch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hall, Bk" uniqKey="Hall B">BK Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Notredame, C" uniqKey="Notredame C">C Notredame</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Notredame, C" uniqKey="Notredame C">C Notredame</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Darling, Ae" uniqKey="Darling A">AE Darling</name>
</author>
<author>
<name sortKey="Mikl S, I" uniqKey="Mikl S I">I Miklós</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beiko, Rg" uniqKey="Beiko R">RG Beiko</name>
</author>
<author>
<name sortKey="Harlow, Tj" uniqKey="Harlow T">TJ Harlow</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doolittle, Wf" uniqKey="Doolittle W">WF Doolittle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Puigb, P" uniqKey="Puigb P">P Puigbò</name>
</author>
<author>
<name sortKey="Lobkovsky, Ae" uniqKey="Lobkovsky A">AE Lobkovsky</name>
</author>
<author>
<name sortKey="Kristensen, Dm" uniqKey="Kristensen D">DM Kristensen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Adl, Sm" uniqKey="Adl S">SM Adl</name>
</author>
<author>
<name sortKey="Simpson, Ag" uniqKey="Simpson A">AG Simpson</name>
</author>
<author>
<name sortKey="Lane, Ce" uniqKey="Lane C">CE Lane</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spang, A" uniqKey="Spang A">A Spang</name>
</author>
<author>
<name sortKey="Saw, Jh" uniqKey="Saw J">JH Saw</name>
</author>
<author>
<name sortKey="J Rgensen, Sl" uniqKey="J Rgensen S">SL Jørgensen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bonham Carter, O" uniqKey="Bonham Carter O">O Bonham-Carter</name>
</author>
<author>
<name sortKey="Steele, J" uniqKey="Steele J">J Steele</name>
</author>
<author>
<name sortKey="Bastola, D" uniqKey="Bastola D">D Bastola</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B Haubold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cong, Y" uniqKey="Cong Y">Y Cong</name>
</author>
<author>
<name sortKey="Chan, Yb" uniqKey="Chan Y">YB Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Domazet Loso, M" uniqKey="Domazet Loso M">M Domazet-Lošo</name>
</author>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B Haubold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corel, E" uniqKey="Corel E">E Corel</name>
</author>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="Meheust, R" uniqKey="Meheust R">R Méheust</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dagan, T" uniqKey="Dagan T">T Dagan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Bryant, D" uniqKey="Bryant D">D Bryant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Scornavacca, C" uniqKey="Scornavacca C">C Scornavacca</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kunin, V" uniqKey="Kunin V">V Kunin</name>
</author>
<author>
<name sortKey="Goldovsky, L" uniqKey="Goldovsky L">L Goldovsky</name>
</author>
<author>
<name sortKey="Darzentas, N" uniqKey="Darzentas N">N Darzentas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Poirion, O" uniqKey="Poirion O">O Poirion</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reinert, G" uniqKey="Reinert G">G Reinert</name>
</author>
<author>
<name sortKey="Chew, D" uniqKey="Chew D">D Chew</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wan, L" uniqKey="Wan L">L Wan</name>
</author>
<author>
<name sortKey="Reinert, G" uniqKey="Reinert G">G Reinert</name>
</author>
<author>
<name sortKey="Sun, F" uniqKey="Sun F">F Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Akman, L" uniqKey="Akman L">L Akman</name>
</author>
<author>
<name sortKey="Yamashita, A" uniqKey="Yamashita A">A Yamashita</name>
</author>
<author>
<name sortKey="Watanabe, H" uniqKey="Watanabe H">H Watanabe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seshadri, R" uniqKey="Seshadri R">R Seshadri</name>
</author>
<author>
<name sortKey="Paulsen, It" uniqKey="Paulsen I">IT Paulsen</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dagan, T" uniqKey="Dagan T">T Dagan</name>
</author>
<author>
<name sortKey="Martin, W" uniqKey="Martin W">W Martin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenfield, P" uniqKey="Greenfield P">P Greenfield</name>
</author>
<author>
<name sortKey="Roehm, U" uniqKey="Roehm U">U Roehm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">F1000Res</journal-id>
<journal-id journal-id-type="iso-abbrev">F1000Res</journal-id>
<journal-id journal-id-type="pmc">F1000Research</journal-id>
<journal-title-group>
<journal-title>F1000Research</journal-title>
</journal-title-group>
<issn pub-type="epub">2046-1402</issn>
<publisher>
<publisher-name>F1000Research</publisher-name>
<publisher-loc>London, UK</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28105314</article-id>
<article-id pub-id-type="pmc">5224691</article-id>
<article-id pub-id-type="doi">10.12688/f1000research.10225.2</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Note</subject>
</subj-group>
<subj-group>
<subject>Articles</subject>
<subj-group>
<subject>Developmental Evolution</subject>
</subj-group>
<subj-group>
<subject>Evolutionary/Comparative Genetics</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Recapitulating phylogenies using
<italic>k</italic>
-mers: from trees to networks</article-title>
<fn-group content-type="pub-status">
<fn>
<p>[version 2; referees: 2 approved]</p>
</fn>
</fn-group>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Bernard</surname>
<given-names>Guillaume</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ragan</surname>
<given-names>Mark A.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chan</surname>
<given-names>Cheong Xin</given-names>
</name>
<xref ref-type="corresp" rid="c1">a</xref>
<xref ref-type="aff" rid="a1">1</xref>
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-3729-8176</contrib-id>
</contrib>
<aff id="a1">
<label>1</label>
Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia</aff>
</contrib-group>
<author-notes>
<corresp id="c1">
<label>a</label>
<email xlink:href="mailto:c.chan1@uq.edu.au">c.chan1@uq.edu.au</email>
</corresp>
<fn fn-type="con">
<p>GB, MAR and CXC conceived the study and designed the experiments. GB carried out the experiments. GB and CXC prepared the first draft of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content.</p>
</fn>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>12</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>5</volume>
<elocation-id>2789</elocation-id>
<history>
<date date-type="accepted">
<day>20</day>
<month>12</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: © 2016 Bernard G et al.</copyright-statement>
<copyright-year>2016</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="f1000research-5-11322.pdf"></self-uri>
<abstract>
<p>Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared
<italic>k</italic>
-mers (subsequences at fixed length
<italic>k</italic>
). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using
<italic>k</italic>
-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>phylogenies</kwd>
<kwd>phylogenetic trees</kwd>
<kwd>phylogenetic networks</kwd>
<kwd>k-mers</kwd>
</kwd-group>
<funding-group>
<award-group id="fund-1">
<funding-source>Australian Research Council</funding-source>
<award-id>DP150101875</award-id>
</award-group>
<funding-statement>We acknowledge funding support from the Australian Research Council (DP150101875) awarded to MAR and CXC, and a James S. McDonnell Foundation grant awarded to MAR.</funding-statement>
<funding-statement>
<italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
</funding-statement>
</funding-group>
</article-meta>
<notes notes-type="version-changes">
<sec sec-type="version-changes">
<label>Revised</label>
<title>Amendments from Version 1</title>
<p>In this revision, we have rewritten part of the Abstract and Introduction to clarify that (a) phylogenetic approaches based on multiple sequence alignment do not exclude inference of a network, (b) multiple sequence alignment is computationally demanding, and that (c) phylogenetic inference is complicated by non-treelike evolutionary processes that shape microbial genomes. In Results and discussion, we have now provided justification for using the 143-genome dataset in this work, and have made explicit the scope of this study. We have also cited a number of additional publications in the areas of phylogenetic networks and alignment-free methods.</p>
</sec>
</notes>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Ernst Haeckel coined the term
<italic>Phylogenie</italic>
to describe the series of morphological stages in the evolutionary history of an organism or group of organisms
<sup>
<xref rid="ref-1" ref-type="bibr">1</xref>
</sup>
. In his Tree of Life published 150 years ago
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
, Haeckel postulated that living organisms trace their evolutionary origin(s) along three distinct lineages (Plantae, Protista and Animalia) to a “common Moneran root of autogonous organisms”. In some (but not all) later works (e.g. in 1868
<sup>
<xref rid="ref-3" ref-type="bibr">3</xref>
</sup>
) he allowed that different Monera may have arisen independently by spontaneous generation. Either way, these views accord with the Lamarckian notion of a built-in direction of evolution from morphologically simple “lower” organisms to more-complex “higher” forms
<sup>
<xref rid="ref-4" ref-type="bibr">4</xref>
</sup>
.</p>
<p>Haeckel through his “Biogenetic Law” advocated that “ontogeny recapitulates phylogeny”
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
: that the embryonic series of an organism is a record of its evolutionary history. Under this view, morphologies observed at different developmental stages of an organism resemble and represent the successive stages (including adult stages) of its ancestors over the course of evolution. Of course, he worked before the advent of genetics and the modern synthesis, and before it was appreciated that hereditary information is carried by DNA and can be recovered by sequencing and statistical analysis. He could not have foreseen that these DNA sequences code for other biomolecules and control life processes, including his beloved developmental series and organismal phenotype, through vastly complex molecular webs of interactions. Nor could Haeckel have envisaged the scale of phylogenetic analysis that can be carried out today using these DNA sequences across multiple genomes, made possible by the advent of high-throughput sequencing and computing technologies.</p>
<p>Fast-forwarding 150 years, phylogenetic inference based on comparative analysis of biological sequences is now a common practice. The similarity among sequences is commonly interpreted as evidence of homology
<sup>
<xref rid="ref-5" ref-type="bibr">5</xref>
,
<xref rid="ref-6" ref-type="bibr">6</xref>
</sup>
, i.e. that they share a common ancestry. From the earliest days of molecular phylogenetics, multiple sequences have been aligned
<sup>
<xref rid="ref-7" ref-type="bibr">7</xref>
,
<xref rid="ref-8" ref-type="bibr">8</xref>
</sup>
to display this homology position-by-position along the length of the sequences. That is, the residues are arranged relative to each other such that the best available hypothesis of homology is achieved at every position (column) of the alignment. By default, it is assumed that the best alignment can be achieved simply by displaying the sequences in the same direction, and inserting gaps where needed (to represent insertions and deletions). This assumption is largely valid when working with highly conserved orthologs of any source, and with exons or proteins of morphologically complex eukaryotes. However, microbial genomes are often affected by recombination and rearrangement
<sup>
<xref rid="ref-9" ref-type="bibr">9</xref>
</sup>
, undermining the assumption of homology along adjacent positions, while lateral genetic transfer would not be represented by a common treelike process
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
–
<xref rid="ref-13" ref-type="bibr">13</xref>
</sup>
. As Haeckel observed when he drew his Tree
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
, biological evolution can be anything but straightforward, and the picture has become ever more complicated
<sup>
<xref rid="ref-14" ref-type="bibr">14</xref>
,
<xref rid="ref-15" ref-type="bibr">15</xref>
</sup>
.</p>
<p>Alternative approaches for inferring and representing phylogenies are available. An attractive strategy that addresses the issue of full-length alignability is to compute relatedness among a set of sequences based on the number or extent of
<italic>k</italic>
-mers (short sub-sequences of a fixed length
<italic>k</italic>
) that they share. Such approaches avoid multiple sequence alignment, and for this reason are termed
<italic>alignment-free</italic>
. As opposed to heuristics in multiple sequence alignment, these methods provide exact solutions. Various modifications are available, e.g. the use of degenerate
<italic>k</italic>
-mers, scoring match lengths rather than
<italic>k</italic>
-mer composition, and grammar-based techniques; see recent reviews
<sup>
<xref rid="ref-16" ref-type="bibr">16</xref>
,
<xref rid="ref-17" ref-type="bibr">17</xref>
</sup>
for more detail. Methods for inferring lateral genetic transfer have also been developed
<sup>
<xref rid="ref-18" ref-type="bibr">18</xref>
,
<xref rid="ref-19" ref-type="bibr">19</xref>
</sup>
. Importantly, evolutionary relationships can also be depicted as a network, with taxa and relationships represented respectively as nodes and edges
<sup>
<xref rid="ref-20" ref-type="bibr">20</xref>
–
<xref rid="ref-24" ref-type="bibr">24</xref>
</sup>
, rather than as a strictly bifurcating tree. Using simulated and empirical sequence data, we recently demonstrated that alignment-free approaches can yield phylogenetic trees that are biologically meaningful
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
–
<xref rid="ref-27" ref-type="bibr">27</xref>
</sup>
. We find that these approaches are more robust to genome rearrangement and lateral genetic transfer, and are highly scalable
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
,
<xref rid="ref-26" ref-type="bibr">26</xref>
</sup>
, a much-desired feature given the current deluge of sequence data facing the research community
<sup>
<xref rid="ref-28" ref-type="bibr">28</xref>
</sup>
. Here we extend the alignment-free phylogenetic approaches on 143 bacterial and archaeal genomes to generate a network of phylogenetic relatedness, and assess biological implications of this network relative to the phylogenetic tree. The phylogenetic relationships among these genomes have been carefully studied using the standard approach based on multiple sequence alignment
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
</sup>
and an alignment-free approach
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
; this dataset thus provides a good reference for comparison.</p>
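<p>The idea of scoring relatedness by shared
<italic>k</italic>
-mers can be illustrated with a minimal sketch. It is not the statistic used in this work: it simply collects the distinct
<italic>k</italic>
-mers of two sequences and reports the fraction they have in common. The function names and the Jaccard-style ratio are assumptions made for illustration only.</p>
<preformat preformat-type="code"><![CDATA[
```python
def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    # A sequence shorter than k yields an empty range, hence an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(a: str, b: str, k: int) -> float:
    """Fraction of distinct k-mers found in both sequences (Jaccard-style).

    Illustrative only: a simple set overlap, not a normalised statistic.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)
```
]]></preformat>
<p>No alignment is computed at any point: the two sequences are compared only through the sets of short words they contain, which is why rearranged regions can still contribute shared
<italic>k</italic>
-mers.</p>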
</sec>
<sec sec-type="methods">
<title>Methods</title>
<p>Using 143 complete genomes of Bacteria and Archaea
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
, we inferred the relatedness of these genome sequences using an alignment-free method based on the
<mml:math id="math1">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>S</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
statistic
<sup>
<xref rid="ref-29" ref-type="bibr">29</xref>
,
<xref rid="ref-30" ref-type="bibr">30</xref>
</sup>
. We computed a
<mml:math id="math2">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>S</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
distance,
<italic>d</italic>
, for each possible pair of the 143 genomes based on the presence of shared 25-mers using jD2Stat version 1.0 (
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/jD2Stat/">http://bioinformatics.org.au/tools/jD2Stat/</ext-link>
)
<sup>
<xref rid="ref-26" ref-type="bibr">26</xref>
</sup>
and following Bernard
<italic>et al.</italic>
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
. Here the distance
<italic>d</italic>
is normalised based on genome sizes and the probabilities that corresponding
<italic>k</italic>
-mers occur in the compared sequences
<sup>
<xref rid="ref-29" ref-type="bibr">29</xref>
,
<xref rid="ref-30" ref-type="bibr">30</xref>
</sup>
;
<italic>d</italic>
ranges between 0.0 (i.e. two genomes are identical) and 15.5 (&lt; 0.0001% of 25-mers are shared between the two genomes). For a pair of genomes
<italic>a</italic>
and
<italic>b</italic>
, we transformed
<italic>d
<sub>ab</sub>
</italic>
into a similarity measure
<italic>S
<sub>ab</sub>
</italic>
, in which
<italic>S
<sub>ab</sub>
</italic>
= 10 –
<italic>d
<sub>ab</sub>
</italic>
. We ignore instances of
<italic>d</italic>
>10, as these pairs of sequences share ≤ 0.01% of 25-mers (i.e. there is little evidence of homology). To visualise the phylogenetic relatedness of these genomes, we adopted the D3 JavaScript library for data-driven documents (
<ext-link ext-link-type="uri" xlink:href="https://d3js.org/">https://d3js.org/</ext-link>
). In this network, each node represents a genome, and an edge connecting two nodes represents the qualitative evidence of shared
<italic>k</italic>
-mers between them. We set a threshold function
<italic>t</italic>
for which only edges with
<italic>S</italic>
≥
<italic>t</italic>
are displayed on the screen. Changing
<italic>t</italic>
dynamically changes the network structure. The resulting dynamic network is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFnetwork/">http://bioinformatics.org.au/tools/AFnetwork/</ext-link>
.</p>
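<p>The transformation from distance to similarity and the threshold filter described above can be sketched as follows. This is a hedged illustration rather than the code behind the published network: the function names, the dictionary layout keyed by genome pairs, and the inclusive comparison (an edge is kept when
<italic>S</italic>
is at least
<italic>t</italic>
) are assumptions.</p>
<preformat preformat-type="code"><![CDATA[
```python
def similarity(d: float):
    """Map a distance d to S = 10 - d; return None when d exceeds 10.

    Pairs with d above 10 are ignored, following the text.
    """
    if d > 10:
        return None
    return 10 - d


def visible_edges(distances: dict, t: float) -> list:
    """Return (a, b, S) for every genome pair whose similarity passes t.

    distances maps a pair (a, b) to its distance d (hypothetical layout).
    The inclusive test S >= t is an assumption of this sketch.
    """
    edges = []
    for (a, b), d in sorted(distances.items()):
        s = similarity(d)
        if s is not None and s >= t:
            edges.append((a, b, s))
    return edges
```
]]></preformat>
<p>Raising
<italic>t</italic>
removes the weaker edges first, so the network breaks up progressively into the most closely related groups of genomes.</p>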
</sec>
<sec>
<title>Results and discussion</title>
<p>
<xref ref-type="fig" rid="f1">Figure 1</xref>
shows the phylogenetic tree of the 143 Bacteria and Archaea genomes that we previously inferred using an alignment-free method based on the
<mml:math id="math3">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>S</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
statistic
<sup>
<xref rid="ref-29" ref-type="bibr">29</xref>
,
<xref rid="ref-30" ref-type="bibr">30</xref>
</sup>
. In an earlier study
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
</sup>
, a supertree was generated for these genomes, summarising 22,432 protein phylogenies. Incongruence between the two trees was observed in 42% of the bipartitions, most of which are at terminal branches
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
. The alignment-free tree (
<xref ref-type="fig" rid="f1">Figure 1</xref>
) recovers 13 out of the 15 “backbone” nodes
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
</sup>
, distinct clades of Archaea and Bacteria, a monophyletic clade of Proteobacteria, and the lack of resolution between gamma- and beta-Proteobacteria, in agreement with previously published studies; as such, this tree captures most of the major biological groupings of Bacteria and Archaea as presently understood.</p>
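<p>Comparing two trees by their bipartitions can be sketched as below. The representation of a tree as nested tuples of leaf names, the function names, and the definition of the incongruent fraction (non-trivial bipartitions of the first tree that are absent from the second) are assumptions for illustration; the source does not state how the reported percentage was computed.</p>
<preformat preformat-type="code"><![CDATA[
```python
def leaf_sets(tree, out):
    """Append the leaf set under every internal node to out.

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical layout).
    Returns the leaf set of the node passed in.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the same split always gets the same key.
    """
    out = []
    all_leaves = leaf_sets(tree, out)
    ref = min(all_leaves)
    splits = set()
    for side in out:
        other = all_leaves - side
        if len(side) < 2 or len(other) < 2:
            continue  # trivial split: one side has fewer than two leaves
        splits.add(other if ref in side else side)
    return splits


def incongruent_fraction(tree_a, tree_b):
    """Fraction of tree_a's non-trivial bipartitions missing from tree_b."""
    a, b = bipartitions(tree_a), bipartitions(tree_b)
    if not a:
        return 0.0
    return len(a - b) / len(a)
```
]]></preformat>
<p>Because each bipartition corresponds to one internal branch, a mismatch confined to terminal groupings changes only the splits closest to the leaves and leaves the deeper ones intact.</p>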
<fig fig-type="figure" id="f1" orientation="portrait" position="float">
<label>Figure 1. </label>
<caption>
<title>The alignment-free phylogenetic tree topology of the 143 Bacteria and Archaea genomes based on
<mml:math id="math4">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>S</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
statistic, modified based on the tree in Bernard
<italic>et al.</italic>
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
; jackknife support at each internal node is shown.</title>
<p>Each phylum is represented in a distinct colour, and the backbones identified in Beiko
<italic>et al.</italic>
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
</sup>
are shown on the internal node with black filled circles. The association of
<italic>Coxiella burnetii</italic>
and
<italic>Nitrosomonas europaea</italic>
is marked with an asterisk.</p>
</caption>
<graphic xlink:href="f1000research-5-11322-g0000"></graphic>
</fig>
<p>
<xref ref-type="fig" rid="f2">Figure 2</xref>
shows the network of phylogenetic relatedness of the same 143 genomes; a dynamic view of this network is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFnetwork/">http://bioinformatics.org.au/tools/AFnetwork/</ext-link>
. As in our tree (
<xref ref-type="fig" rid="f1">Figure 1</xref>
), Archaea and Bacteria form two separate paracliques; even at
<italic>t</italic>
= 0, we found only one archaean isolate (the euryarchaeote
<italic>Methanocaldococcus jannaschii</italic>
DSM 2661) linked to the bacterial groups Thermotogales and Aquificales
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
. Upon reaching
<italic>t</italic>
= 3, most of the 14 phyla have formed distinct densely connected subgraphs in our network, e.g. Cyanobacteria and Chlamydiales form cliques at
<italic>t</italic>
= 1.5 and all subgroups of Proteobacteria form a large paraclique with the Firmicutes at
<italic>t</italic>
= 2. Four
<italic>Escherichia coli</italic>
and two
<italic>Shigella</italic>
isolates, known to be closely related, form a clique up to
<italic>t</italic>
= 8.5. Interestingly, this network also showcases the extent to which genomic regions are shared among diverse phyla, e.g. the high extent of genetic similarity among Proteobacteria
<italic>versus</italic>
the low extent between Chlamydiales and Cyanobacteria. Our observations largely agree with published studies
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
,
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
, but also highlight the inadequacy of representing microbial phylogeny as a tree. For instance, in the tree
<italic>Coxiella burnetii</italic>
, a member of the gamma-Proteobacteria, is grouped with
<italic>Nitrosomonas europaea</italic>
of the beta-Proteobacteria (marked with an asterisk in
<xref ref-type="fig" rid="f1">Figure 1</xref>
); in the network, the strongest connection of
<italic>C. burnetii</italic>
is with
<italic>Wigglesworthia glossinidia</italic>
, a member of the gamma-Proteobacteria (marked with an asterisk in
<xref ref-type="fig" rid="f2">Figure 2</xref>
) at
<italic>t</italic>
= 2. Both
<italic>W. glossinidia</italic>
and
<italic>C. burnetii</italic>
are parasites; the
<italic>W. glossinidia</italic>
genome (0.7 Mbp) is highly reduced
<sup>
<xref rid="ref-31" ref-type="bibr">31</xref>
</sup>
and the
<italic>C. burnetii</italic>
genome (2 Mbp) is proposed to be undergoing reduction
<sup>
<xref rid="ref-32" ref-type="bibr">32</xref>
</sup>
. As both the tree (
<xref ref-type="fig" rid="f1">Figure 1</xref>
) and network presented here were generated using the same alignment-free method, the contradictory position of
<italic>C. burnetii</italic>
is likely caused by the neighbour-joining algorithm used for tree inference
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
. In this scenario, the
<italic>C. burnetii</italic>
genome connects with
<italic>N. europaea</italic>
because it shares high similarity with
<italic>N. europaea</italic>
and
<italic>Neisseria</italic>
genomes of the beta-Proteobacteria (
<italic>S</italic>
between 1.43 and 1.68), second only to
<italic>W. glossinidia</italic>
(
<italic>S</italic>
= 2.05), and because it shares little or no similarity with other genomes of gamma-Proteobacteria that are closely related to
<italic>W. glossinidia</italic>
, i.e.
<italic>Buchnera aphidicola</italic>
isolates (average
<italic>S</italic>
= 0.63) and “
<italic>Candidatus</italic>
Blochmannia floridanus”
(
<italic>S</italic>
= 0).</p>
<fig fig-type="figure" id="f2" orientation="portrait" position="float">
<label>Figure 2. </label>
<caption>
<title>Alignment-free phylogenetic network of the 143 Bacteria and Archaea genomes based on the
<mml:math id="math5">
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>S</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
statistic using 25-mers, at
<italic>t</italic>
= 2.</title>
<p>Each phylum is represented in a distinct colour; each node represents a genome, and each edge represents qualitative evidence of shared 25-mers between two genomes. The association between
<italic>Coxiella burnetii</italic>
and
<italic>Wigglesworthia glossinidia</italic>
is marked with an asterisk.</p>
</caption>
<graphic xlink:href="f1000research-5-11322-g0001"></graphic>
</fig>
<p>By changing the threshold
<italic>t</italic>
, we can dynamically visualise changes in the network structure. These changes are not random, but appear to correlate with the evolutionary history of the species. At
<italic>t</italic>
= 0, Archaea and Bacteria form two distinct paracliques, linked only by two edges, and the Planctomycetes isolate forms a singleton. When we increase
<italic>t</italic>
from 1 to 2, the Archaea and Bacteria paracliques quickly dissociate from each other; within the Bacteria, cliques of Chlamydiales and Cyanobacteria are formed and the Spirochaetales become isolated. Going from
<italic>t</italic>
= 2 to
<italic>t</italic>
= 3 we observe a scission between Firmicutes and Proteobacteria, and at
<italic>t</italic>
> 3 all classes of Proteobacteria start to form their respective paracliques. The separation (as
<italic>t</italic>
is incremented) of a densely connected subgraph involving all representatives of a phylum from the rest of the network mimics the divergence of this phylum from a common ancestor. Because the similarity measures do not have a unit (such as number of substitutions per site), it is not straightforward to interpret
<italic>S</italic>
as an evolutionary rate or divergence time. A comprehensive comparison between our network and one generated using multiple sequence alignment is beyond the scope of this work. However, our findings suggest that our alignment-free network yields snapshots of biologically meaningful evolutionary relationships among these genomes, and that increasing the threshold based on the proportion of shared
<italic>k</italic>
-mers recapitulates the progressive separation of genomic lineages in evolution.</p>
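<p>The effect of raising the threshold can be illustrated with a minimal sketch. It assumes that an edge is retained when the pairwise similarity S is strictly greater than t; the genome labels, similarity values and function names below are hypothetical toy inputs for illustration and are not taken from the 143-genome dataset or from the authors' source code.</p>
<preformat>
```python
def threshold_edges(similarity, t):
    """Return the genome pairs whose similarity S is strictly greater than t.

    Assumption: an edge survives when S > t (the exact rule is not stated here).
    """
    return {pair for pair, s in similarity.items() if s > t}


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(frozenset(component))
    return components


# Hypothetical toy similarities between five made-up genomes.
nodes = {"A1", "A2", "B1", "B2", "B3"}
similarity = {
    ("A1", "A2"): 4.0,
    ("A2", "B1"): 0.5,
    ("B1", "B2"): 2.5,
    ("B2", "B3"): 3.5,
    ("B1", "B3"): 1.2,
}

for t in (0, 1, 3):
    parts = connected_components(nodes, threshold_edges(similarity, t))
    print(t, [sorted(p) for p in parts])
```
</preformat>
<p>In this toy example the single connected group at the lowest threshold splits into progressively smaller groups as t increases, which is the kind of behaviour described above for the phyla in the network.</p>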
<p>The alignment-free network reconstructed using whole-genome sequences thus recovers phylogenetic signals that cannot be captured in a binary tree. Using this approach, we generated the network in under 30 minutes; a whole-genome alignment of 143 sequences would have taken days, and even then, the alignment would be difficult to interpret given the genome dynamics in Bacteria and Archaea
<sup>
<xref rid="ref-9" ref-type="bibr">9</xref>
,
<xref rid="ref-13" ref-type="bibr">13</xref>
,
<xref rid="ref-33" ref-type="bibr">33</xref>
</sup>
. One can imagine inferring a network of thousands of microbial genomes in a few hours using distributed computing. More importantly, the network can be visualised dynamically, explored interactively and shared.</p>
<p>Other biological questions could be addressed by linking the
<italic>k</italic>
-mers to their genomic locations and annotated genome features, e.g. in a relational database
<sup>
<xref rid="ref-34" ref-type="bibr">34</xref>
</sup>
. For instance, we could use such a database to compare thousands of isolates and identify core gene functions for a specific phylum or genus, or exclusive
<italic>versus</italic>
non-exclusive functions in bacterial pathogens, in a matter of seconds. We can also use
<italic>k</italic>
-mers to quickly search for biological information e.g. functions relevant to lateral genetic transfer, recombination or duplications.</p>
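<p>As a rough illustration of this idea, the sketch below stores each k-mer with its genome and position in an in-memory SQLite table and then asks which k-mers occur in every genome of a chosen group. The table layout, the function names and the short toy sequences are assumptions made for illustration only; they are not the schema of the cited k-mer database, and a realistic analysis would also need to handle longer k-mers, reverse complements and annotated features.</p>
<preformat>
```python
import sqlite3


def kmers(sequence, k):
    """Yield (position, k-mer) pairs for every k-length window of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield i, sequence[i:i + k]


def build_index(genomes, k):
    """Load every k-mer occurrence into a hypothetical in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kmer_hits (kmer TEXT, genome TEXT, position INTEGER)")
    for name, sequence in genomes.items():
        conn.executemany(
            "INSERT INTO kmer_hits VALUES (?, ?, ?)",
            ((kmer, name, pos) for pos, kmer in kmers(sequence, k)),
        )
    conn.execute("CREATE INDEX idx_kmer ON kmer_hits (kmer)")
    return conn


def core_kmers(conn, genome_names):
    """Return the k-mers found in every genome listed in genome_names."""
    placeholders = ", ".join("?" for _ in genome_names)
    rows = conn.execute(
        f"SELECT kmer FROM kmer_hits WHERE genome IN ({placeholders}) "
        "GROUP BY kmer HAVING COUNT(DISTINCT genome) = ? ORDER BY kmer",
        (*genome_names, len(genome_names)),
    )
    return [row[0] for row in rows]


# Made-up toy sequences; real genomes and k = 25 would be used in practice.
genomes = {"g1": "ACGTACGA", "g2": "TTACGTAA", "g3": "GGGACGTC"}
conn = build_index(genomes, k=4)
print(core_kmers(conn, ["g1", "g2", "g3"]))
print(core_kmers(conn, ["g1", "g2"]))
```
</preformat>
<p>Because positions are stored alongside each k-mer, the same table could in principle be joined to a table of annotated features to move from shared k-mers to shared gene functions.</p>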
<p>In contrast to Haeckel’s “Biogenetic Law”,
<italic>k</italic>
-mers used in this way recapitulate phylogenetic signal, not ontogeny. Alignment-free approaches generate biologically meaningful phylogenetic inferences and are highly scalable. More importantly, representing alignment-free phylogenetic relationships as a network captures aspects of evolutionary history that cannot be represented in a tree. As more genome data become available, Haeckel’s goal of depicting the History of Life comes closer to reality.</p>
</sec>
<sec>
<title>Data availability</title>
<p>The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2016 Bernard G et al.</p>
<p>The 143 Bacteria and Archaea genomes used in this work are the same dataset used in an earlier study
<sup>
<xref rid="ref-25" ref-type="bibr">25</xref>
</sup>
, available at
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.14264/uql.2016.908">http://dx.doi.org/10.14264/uql.2016.908</ext-link>
<sup>
<xref rid="ref-35" ref-type="bibr">35</xref>
</sup>
. The dynamic phylogenetic network of these genomes is available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.org.au/tools/AFnetwork">http://bioinformatics.org.au/tools/AFnetwork</ext-link>
, with the source code available at
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.14264/uql.2016.952">http://dx.doi.org/10.14264/uql.2016.952</ext-link>
<sup>
<xref rid="ref-36" ref-type="bibr">36</xref>
</sup>
.</p>
</sec>
</body>
<back>
<ref-list>
<ref id="ref-1">
<label>1</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dayrat</surname>
<given-names>B</given-names>
</name>
</person-group>
:
<article-title>The roots of phylogeny: how did Haeckel build his trees?</article-title>
<source>
<italic>Syst Biol.</italic>
</source>
<year>2003</year>
;
<volume>52</volume>
(
<issue>4</issue>
):
<fpage>515</fpage>
<lpage>27</lpage>
.
<pub-id pub-id-type="doi">10.1080/10635150390218277</pub-id>
<pub-id pub-id-type="pmid">12857642</pub-id>
</mixed-citation>
</ref>
<ref id="ref-2">
<label>2</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Haeckel</surname>
<given-names>E</given-names>
</name>
</person-group>
:
<article-title>Generelle Morphologie der Organismen. Allgemeine Grundzüge der organischen Formen-Wissenschaft, mechanisch begründet durch die von Charles Darwin reformirte Descendenztheorie.</article-title>
Bd. 1 und 2. Berlin: Reimer;
<year>1866</year>
<pub-id pub-id-type="doi">10.5962/bhl.title.3953</pub-id>
</mixed-citation>
</ref>
<ref id="ref-3">
<label>3</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Haeckel</surname>
<given-names>E</given-names>
</name>
</person-group>
:
<article-title>Natürliche Schöpfungsgeschichte.</article-title>
Berlin: Reimer;
<year>1868</year>
<ext-link ext-link-type="uri" xlink:href="http://caliban.mpiz-koeln.mpg.de/haeckel/natuerliche/natuerliche.html">Reference Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-4">
<label>4</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burkhardt</surname>
<given-names>RW</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
:
<article-title>Lamarck, evolution, and the inheritance of acquired characters.</article-title>
<source>
<italic>Genetics.</italic>
</source>
<year>2013</year>
;
<volume>194</volume>
(
<issue>4</issue>
):
<fpage>793</fpage>
<lpage>805</lpage>
.
<pub-id pub-id-type="doi">10.1534/genetics.113.151852</pub-id>
<pmc-comment>3730912</pmc-comment>
<pub-id pub-id-type="pmid">23908372</pub-id>
</mixed-citation>
</ref>
<ref id="ref-5">
<label>5</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fitch</surname>
<given-names>WM</given-names>
</name>
</person-group>
:
<article-title>Homology: a personal view on some of the problems.</article-title>
<source>
<italic>Trends Genet.</italic>
</source>
<year>2000</year>
;
<volume>16</volume>
(
<issue>5</issue>
):
<fpage>227</fpage>
<lpage>31</lpage>
.
<pub-id pub-id-type="doi">10.1016/S0168-9525(00)02005-9</pub-id>
<pub-id pub-id-type="pmid">10782117</pub-id>
</mixed-citation>
</ref>
<ref id="ref-6">
<label>6</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Hall</surname>
<given-names>BK</given-names>
</name>
</person-group>
:
<article-title>Homology: the hierarchical basis of comparative biology.</article-title>
San Diego: Academic Press;
<year>1994</year>
<ext-link ext-link-type="uri" xlink:href="http://research-repository.uwa.edu.au/en/publications/homology-the-hierarchical-basis-of-comparative-biology(573e7340-32ea-4f8b-a1af-11a52563edcb)/export.html">Reference Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-7">
<label>7</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Notredame</surname>
<given-names>C</given-names>
</name>
</person-group>
:
<article-title>Recent progress in multiple sequence alignment: a survey.</article-title>
<source>
<italic>Pharmacogenomics.</italic>
</source>
<year>2002</year>
;
<volume>3</volume>
(
<issue>1</issue>
):
<fpage>131</fpage>
<lpage>44</lpage>
.
<pub-id pub-id-type="doi">10.1517/14622416.3.1.131</pub-id>
<pub-id pub-id-type="pmid">11966409</pub-id>
</mixed-citation>
</ref>
<ref id="ref-8">
<label>8</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Notredame</surname>
<given-names>C</given-names>
</name>
</person-group>
:
<article-title>Recent evolutions of multiple sequence alignment algorithms.</article-title>
<source>
<italic>PLoS Comput Biol.</italic>
</source>
<year>2007</year>
;
<volume>3</volume>
(
<issue>8</issue>
):
<fpage>e123</fpage>
.
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0030123</pub-id>
<pmc-comment>1963500</pmc-comment>
<pub-id pub-id-type="pmid">17784778</pub-id>
</mixed-citation>
</ref>
<ref id="ref-9">
<label>9</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Darling</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Miklós</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>Dynamics of genome rearrangement in bacterial populations.</article-title>
<source>
<italic>PLoS Genet.</italic>
</source>
<year>2008</year>
;
<volume>4</volume>
(
<issue>7</issue>
):
<fpage>e1000128</fpage>
.
<pub-id pub-id-type="doi">10.1371/journal.pgen.1000128</pub-id>
<pmc-comment>2483231</pmc-comment>
<pub-id pub-id-type="pmid">18650965</pub-id>
</mixed-citation>
</ref>
<ref id="ref-10">
<label>10</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beiko</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Harlow</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>Highways of gene sharing in prokaryotes.</article-title>
<source>
<italic>Proc Natl Acad Sci U S A.</italic>
</source>
<year>2005</year>
;
<volume>102</volume>
(
<issue>40</issue>
):
<fpage>14332</fpage>
<lpage>7</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.0504068102</pub-id>
<pmc-comment>1242295</pmc-comment>
<pub-id pub-id-type="pmid">16176988</pub-id>
</mixed-citation>
</ref>
<ref id="ref-11">
<label>11</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Doolittle</surname>
<given-names>WF</given-names>
</name>
</person-group>
:
<article-title>Phylogenetic classification and the universal tree.</article-title>
<source>
<italic>Science.</italic>
</source>
<year>1999</year>
;
<volume>284</volume>
(
<issue>5423</issue>
):
<fpage>2124</fpage>
<lpage>9</lpage>
.
<pub-id pub-id-type="doi">10.1126/science.284.5423.2124</pub-id>
<pub-id pub-id-type="pmid">10381871</pub-id>
</mixed-citation>
</ref>
<ref id="ref-12">
<label>12</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
:
<article-title>Horizontal gene transfer: essentiality and evolvability in prokaryotes, and roles in evolutionary transitions [version 1; referees: 2 approved].</article-title>
<source>
<italic>F1000Res.</italic>
</source>
<year>2016</year>
;
<volume>5</volume>
: pii: F1000 Faculty Rev-1805.
<pub-id pub-id-type="doi">10.12688/f1000research.8737.1</pub-id>
<pmc-comment>4962295</pmc-comment>
<pub-id pub-id-type="pmid">27508073</pub-id>
</mixed-citation>
</ref>
<ref id="ref-13">
<label>13</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Puigbò</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Lobkovsky</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Kristensen</surname>
<given-names>DM</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes.</article-title>
<source>
<italic>BMC Biol.</italic>
</source>
<year>2014</year>
;
<volume>12</volume>
:
<fpage>66</fpage>
.
<pub-id pub-id-type="doi">10.1186/s12915-014-0066-4</pub-id>
<pmc-comment>4166000</pmc-comment>
<pub-id pub-id-type="pmid">25141959</pub-id>
</mixed-citation>
</ref>
<ref id="ref-14">
<label>14</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adl</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Simpson</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Lane</surname>
<given-names>CE</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>The revised classification of eukaryotes.</article-title>
<source>
<italic>J Eukaryot Microbiol.</italic>
</source>
<year>2012</year>
;
<volume>59</volume>
(
<issue>5</issue>
):
<fpage>429</fpage>
<lpage>93</lpage>
.
<pub-id pub-id-type="doi">10.1111/j.1550-7408.2012.00644.x</pub-id>
<pmc-comment>3483872</pmc-comment>
<pub-id pub-id-type="pmid">23020233</pub-id>
</mixed-citation>
</ref>
<ref id="ref-15">
<label>15</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spang</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saw</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Jørgensen</surname>
<given-names>SL</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Complex archaea that bridge the gap between prokaryotes and eukaryotes.</article-title>
<source>
<italic>Nature.</italic>
</source>
<year>2015</year>
;
<volume>521</volume>
(
<issue>7551</issue>
):
<fpage>173</fpage>
<lpage>9</lpage>
.
<pub-id pub-id-type="doi">10.1038/nature14447</pub-id>
<pmc-comment>4444528</pmc-comment>
<pub-id pub-id-type="pmid">25945739</pub-id>
</mixed-citation>
</ref>
<ref id="ref-16">
<label>16</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bonham-Carter</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Steele</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bastola</surname>
<given-names>D</given-names>
</name>
</person-group>
:
<article-title>Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.</article-title>
<source>
<italic>Brief Bioinform.</italic>
</source>
<year>2014</year>
;
<volume>15</volume>
(
<issue>6</issue>
):
<fpage>890</fpage>
<lpage>905</lpage>
.
<pub-id pub-id-type="doi">10.1093/bib/bbt052</pub-id>
<pmc-comment>4296134</pmc-comment>
<pub-id pub-id-type="pmid">23904502</pub-id>
</mixed-citation>
</ref>
<ref id="ref-17">
<label>17</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haubold</surname>
<given-names>B</given-names>
</name>
</person-group>
:
<article-title>Alignment-free phylogenetics and population genetics.</article-title>
<source>
<italic>Brief Bioinform.</italic>
</source>
<year>2014</year>
;
<volume>15</volume>
(
<issue>3</issue>
):
<fpage>407</fpage>
<lpage>18</lpage>
.
<pub-id pub-id-type="doi">10.1093/bib/bbt083</pub-id>
<pub-id pub-id-type="pmid">24291823</pub-id>
</mixed-citation>
</ref>
<ref id="ref-18">
<label>18</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cong</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>YB</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF.</article-title>
<source>
<italic>Sci Rep.</italic>
</source>
<year>2016</year>
;
<volume>6</volume>
: 30308.
<pub-id pub-id-type="doi">10.1038/srep30308</pub-id>
<pmc-comment>4958984</pmc-comment>
<pub-id pub-id-type="pmid">27453035</pub-id>
</mixed-citation>
</ref>
<ref id="ref-19">
<label>19</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Domazet-Lošo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Haubold</surname>
<given-names>B</given-names>
</name>
</person-group>
:
<article-title>Alignment-free detection of local similarity among viral and bacterial genomes.</article-title>
<source>
<italic>Bioinformatics.</italic>
</source>
<year>2011</year>
;
<volume>27</volume>
(
<issue>11</issue>
):
<fpage>1466</fpage>
<lpage>72</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr176</pub-id>
<pub-id pub-id-type="pmid">21471011</pub-id>
</mixed-citation>
</ref>
<ref id="ref-20">
<label>20</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corel</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Méheust</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Network-thinking: graphs to analyze microbial complexity and evolution.</article-title>
<source>
<italic>Trends Microbiol.</italic>
</source>
<year>2016</year>
;
<volume>24</volume>
(
<issue>3</issue>
):
<fpage>224</fpage>
<lpage>37</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.tim.2015.12.003</pub-id>
<pmc-comment>4766943</pmc-comment>
<pub-id pub-id-type="pmid">26774999</pub-id>
</mixed-citation>
</ref>
<ref id="ref-21">
<label>21</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dagan</surname>
<given-names>T</given-names>
</name>
</person-group>
:
<article-title>Phylogenomic networks.</article-title>
<source>
<italic>Trends Microbiol.</italic>
</source>
<year>2011</year>
;
<volume>19</volume>
(
<issue>10</issue>
):
<fpage>483</fpage>
<lpage>91</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.tim.2011.07.001</pub-id>
<pub-id pub-id-type="pmid">21820313</pub-id>
</mixed-citation>
</ref>
<ref id="ref-22">
<label>22</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>D</given-names>
</name>
</person-group>
:
<article-title>Application of phylogenetic networks in evolutionary studies.</article-title>
<source>
<italic>Mol Biol Evol.</italic>
</source>
<year>2006</year>
;
<volume>23</volume>
(
<issue>2</issue>
):
<fpage>254</fpage>
<lpage>67</lpage>
.
<pub-id pub-id-type="doi">10.1093/molbev/msj030</pub-id>
<pub-id pub-id-type="pmid">16221896</pub-id>
</mixed-citation>
</ref>
<ref id="ref-23">
<label>23</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Scornavacca</surname>
<given-names>C</given-names>
</name>
</person-group>
:
<article-title>A survey of combinatorial methods for phylogenetic networks.</article-title>
<source>
<italic>Genome Biol Evol.</italic>
</source>
<year>2011</year>
;
<volume>3</volume>
:
<fpage>23</fpage>
<lpage>35</lpage>
.
<pub-id pub-id-type="doi">10.1093/gbe/evq077</pub-id>
<pmc-comment>3017387</pmc-comment>
<pub-id pub-id-type="pmid">21081312</pub-id>
</mixed-citation>
</ref>
<ref id="ref-24">
<label>24</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kunin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Goldovsky</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Darzentas</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>The net of life: reconstructing the microbial phylogenetic network.</article-title>
<source>
<italic>Genome Res.</italic>
</source>
<year>2005</year>
;
<volume>15</volume>
(
<issue>7</issue>
):
<fpage>954</fpage>
<lpage>9</lpage>
.
<pub-id pub-id-type="doi">10.1101/gr.3666505</pub-id>
<pmc-comment>1172039</pmc-comment>
<pub-id pub-id-type="pmid">15965028</pub-id>
</mixed-citation>
</ref>
<ref id="ref-25">
<label>25</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.</article-title>
<source>
<italic>Sci Rep.</italic>
</source>
<year>2016</year>
;
<volume>6</volume>
: 28970.
<pub-id pub-id-type="doi">10.1038/srep28970</pub-id>
<pmc-comment>4929450</pmc-comment>
<pub-id pub-id-type="pmid">27363362</pub-id>
</mixed-citation>
</ref>
<ref id="ref-26">
<label>26</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
<name>
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Poirion</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Inferring phylogenies of evolving sequences without multiple sequence alignment.</article-title>
<source>
<italic>Sci Rep.</italic>
</source>
<year>2014</year>
;
<volume>4</volume>
: 6504.
<pub-id pub-id-type="doi">10.1038/srep06504</pub-id>
<pmc-comment>4179140</pmc-comment>
<pub-id pub-id-type="pmid">25266120</pub-id>
</mixed-citation>
</ref>
<ref id="ref-27">
<label>27</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
</person-group>
:
<article-title>Molecular phylogenetics before sequences: oligonucleotide catalogs as
<italic>k</italic>
-mer spectra.</article-title>
<source>
<italic>RNA Biol.</italic>
</source>
<year>2014</year>
;
<volume>11</volume>
(
<issue>3</issue>
):
<fpage>176</fpage>
<lpage>85</lpage>
.
<pub-id pub-id-type="doi">10.4161/rna.27505</pub-id>
<pmc-comment>4008546</pmc-comment>
<pub-id pub-id-type="pmid">24572375</pub-id>
</mixed-citation>
</ref>
<ref id="ref-28">
<label>28</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>Next-generation phylogenomics.</article-title>
<source>
<italic>Biol Direct.</italic>
</source>
<year>2013</year>
;
<volume>8</volume>
:
<fpage>3</fpage>
.
<pub-id pub-id-type="doi">10.1186/1745-6150-8-3</pub-id>
<pmc-comment>3564786</pmc-comment>
<pub-id pub-id-type="pmid">23339707</pub-id>
</mixed-citation>
</ref>
<ref id="ref-29">
<label>29</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chew</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Alignment-free sequence comparison (I): statistics and power.</article-title>
<source>
<italic>J Comput Biol.</italic>
</source>
<year>2009</year>
;
<volume>16</volume>
(
<issue>12</issue>
):
<fpage>1615</fpage>
<lpage>34</lpage>
.
<pub-id pub-id-type="doi">10.1089/cmb.2009.0198</pub-id>
<pmc-comment>2818754</pmc-comment>
<pub-id pub-id-type="pmid">20001252</pub-id>
</mixed-citation>
</ref>
<ref id="ref-30">
<label>30</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Reinert</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Alignment-free sequence comparison (II): theoretical power of comparison statistics.</article-title>
<source>
<italic>J Comput Biol.</italic>
</source>
<year>2010</year>
;
<volume>17</volume>
(
<issue>11</issue>
):
<fpage>1467</fpage>
<lpage>90</lpage>
.
<pub-id pub-id-type="doi">10.1089/cmb.2010.0056</pub-id>
<pmc-comment>3123933</pmc-comment>
<pub-id pub-id-type="pmid">20973742</pub-id>
</mixed-citation>
</ref>
<ref id="ref-31">
<label>31</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yamashita</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Genome sequence of the endocellular obligate symbiont of tsetse flies,
<italic>Wigglesworthia glossinidia.</italic>
</article-title>
<source>
<italic>Nat Genet.</italic>
</source>
<year>2002</year>
;
<volume>32</volume>
(
<issue>3</issue>
):
<fpage>402</fpage>
<lpage>7</lpage>
.
<pub-id pub-id-type="doi">10.1038/ng986</pub-id>
<pub-id pub-id-type="pmid">12219091</pub-id>
</mixed-citation>
</ref>
<ref id="ref-32">
<label>32</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seshadri</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Paulsen</surname>
<given-names>IT</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Complete genome sequence of the Q-fever pathogen
<italic>Coxiella burnetii.</italic>
</article-title>
<source>
<italic>Proc Natl Acad Sci U S A.</italic>
</source>
<year>2003</year>
;
<volume>100</volume>
(
<issue>9</issue>
):
<fpage>5455</fpage>
<lpage>60</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.0931379100</pub-id>
<pmc-comment>154366</pmc-comment>
<pub-id pub-id-type="pmid">12704232</pub-id>
</mixed-citation>
</ref>
<ref id="ref-33">
<label>33</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dagan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Martin</surname>
<given-names>W</given-names>
</name>
</person-group>
:
<article-title>The tree of one percent.</article-title>
<source>
<italic>Genome Biol.</italic>
</source>
<year>2006</year>
;
<volume>7</volume>
(
<issue>10</issue>
):
<fpage>118</fpage>
.
<pub-id pub-id-type="doi">10.1186/gb-2006-7-10-118</pub-id>
<pmc-comment>1794558</pmc-comment>
<pub-id pub-id-type="pmid">17081279</pub-id>
</mixed-citation>
</ref>
<ref id="ref-34">
<label>34</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Greenfield</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Roehm</surname>
<given-names>U</given-names>
</name>
</person-group>
:
<article-title>Answering biological questions by querying k-mer databases.</article-title>
<source>
<italic>Concurr Comput Pract Exper.</italic>
</source>
<year>2013</year>
;
<volume>25</volume>
(
<issue>4</issue>
):
<fpage>497</fpage>
<lpage>509</lpage>
.
<pub-id pub-id-type="doi">10.1002/cpe.2938</pub-id>
</mixed-citation>
</ref>
<ref id="ref-35">
<label>35</label>
<mixed-citation publication-type="data">
<person-group person-group-type="author">
<name>
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>143 Prokaryote genomes</article-title>
. Dataset.
<year>2016</year>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.14264/uql.2016.908">Data Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-36">
<label>36</label>
<mixed-citation publication-type="data">
<person-group person-group-type="author">
<name>
<surname>Bernard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>CX</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
:
<article-title>Alignment-free network of 143 prokaryote genomes</article-title>
. Dataset.
<year>2016</year>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.14264/uql.2016.952">Data Source</ext-link>
</mixed-citation>
</ref>
</ref-list>
</back>
<sub-article id="report18754" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.11322.r18754</article-id>
<title-group>
<article-title>Referee response for version 2</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hao</surname>
<given-names>Weilong</given-names>
</name>
<xref ref-type="aff" rid="r18754a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r18754a1">
<label>1</label>
Department of Biological Sciences, Wayne State University, Detroit, MI, USA</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>12</month>
<year>2016</year>
</pub-date>
<related-article id="d35e2415" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.10225.2">Version 2</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>The authors' responses are acceptable.</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
</body>
</sub-article>
<sub-article id="report18060" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.11014.r18060</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hao</surname>
<given-names>Weilong</given-names>
</name>
<xref ref-type="aff" rid="r18060a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r18060a1">
<label>1</label>
Department of Biological Sciences, Wayne State University, Detroit, MI, USA</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>12</month>
<year>2016</year>
</pub-date>
<related-article id="d35e2464" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.10225.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve-with-reservations</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>The manuscript uses
<italic>k</italic>
-mers from whole-genome sequences to recapitulate phylogenetic relationships from trees to networks. The analyses seemed to be convincing, and of general interest. I just have some comments on the manuscript structure and some other minor suggestions.</p>
<p> The authors used Ernst Haeckel’s phylogeny and Biogenetic Law to start their manuscript. Although it is fun to read all these historical pieces, the link between Haeckel’s ideas and the construction of networks using
<italic>k</italic>
-mers was not made strong in the current version of the manuscript.</p>
<p> The authors compared alignment-free data against sequence alignments, and stated that the sequence alignment approach “ignores important evolutionary processes that are known to shape the genomes of microbes” followed by mentioning recombination, genome rearrangement, and lateral gene transfer. This is not accurate, as sequence alignments can also be used to reconstruct web-like phylogenetic relationships, which are sometimes called phylogenetic networks (e.g., Huson and Bryant 2006). I think it is important to carefully define and compare the networks mentioned in this manuscript and the phylogenetic networks mentioned by Huson and Bryant. Along this line, approaches based on sequence alignments might not all assume tree-like relationship. Furthermore, the authors mentioned evolutionary events, such as recombination, genome rearrangement, and lateral gene transfer, that are difficult to study using sequence alignments, but did not provide detailed evidence on whether
<italic>k</italic>
-mers can tackle them all. I suggest that the authors stay closer to their data and make more specific statements.</p>
<p> In the third introduction paragraph, “By default, it is assumed that the best alignment can be achieved simply by displaying the sequences in the same direction and inserting gaps where needed. This assumption is largely valid when working with exons or proteins of morphologically complex eukaryotes. However, in microbes this assumption is violated...” I feel the meaning of “assumption” in each of these sentences is a moving target. If they are talking about orthologous sequences, the analysis of orthologs should hold for both eukaryotes and prokaryotes. The key here, I guess, is the comparison of orthologs versus the comparison of xenologs or even non-homologs. Another minor point is the use of “microbes”, which can mean bacteria, archaea, and small eukaryotes. I don’t think it is a good word to use here.</p>
<p> The authors did not justify the use of the 143 genomes. It seemed that they were inherited from their previous study conducted some time ago, and are likely skewed in terms of taxon-sampling. Since taxon-sampling is important for tree-like phylogenetic analysis, it would be nice to address how the improved (or more balanced) taxon-sampling can benefit the network analyses.</p>
<p> The authors wrote “... in agreement with previously published studies; as such, this tree represents reality as presently understood, i.e., is biologically correct”. The use of words such as “reality” and “biologically correct” here is inappropriate.</p>
<p> The data of
<italic>Wigglesworthia</italic>
,
<italic>Coxiella</italic>
and others are of potential interest. The readers would definitely appreciate some real data analyses to address them, which are currently lacking.</p>
<p> The cited references are relatively recent and skewed. Some of the older and more influential papers need to be added (for both networks and alignment-free methods).</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
</body>
<sub-article id="comment2372" article-type="response">
<front-stub>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Chan</surname>
<given-names>Cheong Xin</given-names>
</name>
<aff>Institute for Molecular Bioscience, The University of Queensland, Australia</aff>
</contrib>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>15</day>
<month>12</month>
<year>2016</year>
</pub-date>
</front-stub>
<body>
<p>Thank you for these comments.
<list list-type="bullet">
<list-item>
<p>
<bold>The link between Haeckel’s ideas and the construction of networks using
<italic>k</italic>
-mers was not made strong in the current version of the manuscript.</bold>
</p>
</list-item>
</list>
The work we present here is a proof-of-concept for a biologically informative network based on
<italic>k</italic>
-mers extracted from whole-genome sequences. We hope to convince readers that dynamic visualization of such a network is intuitive for exploring and addressing biological questions, aiding discovery. The paper is part of a special collection of F1000Research articles in phylogenetics, commemorating the 150th anniversary of Ernst Haeckel’s Tree of Life published in 1866. Here we argue that by using
<italic>k</italic>
-mers we can recapitulate phylogenetic signal, somewhat in the same spirit as Haeckel famously argued that “ontogeny recapitulates phylogeny”. More precisely, our claim is that “increasing the threshold based on the proportion of shared
<italic>k</italic>
-mers recapitulates the progressive separation of genomic lineages in evolution”. Full consideration of Haeckel’s work in the context of Darwinian evolution then and today is well beyond the scope of our brief paper, although we cite some key references.
<list list-type="bullet">
<list-item>
<p>
<bold>The authors … stated that the sequence alignment approach “ignores important evolutionary processes that are known to shape the genomes of microbes” followed by mentioning recombination, genome rearrangement, and lateral gene transfer. This is not accurate, as sequence alignments can also be used to reconstruct web-like phylogenetic relationships, which are sometimes called phylogenetic networks (e.g., Huson and Bryant 2006). I think it is important to carefully define and compare the networks mentioned in this manuscript and the phylogenetic networks mentioned by Huson and Bryant.</bold>
<bold>Along this line, approaches based on sequence alignments might not all assume tree-like relationship.</bold>
</p>
</list-item>
</list>
We agree and have now rewritten part of the Abstract to stage our argument more clearly: genomic processes in microbes can undermine the assumptions that underlie multiple sequence alignment, hence phylogenetic inference as usually practiced. We have now cited other articles on phylogenetic networks in the text where appropriate, specifically Huson and Bryant
<sup>1</sup>
and Kunin
<italic>et al.</italic>
<sup>2</sup>
. Comprehensive comparison of
<italic>k</italic>
-mer-based and (alignment-based) phylogenetic networks is important but, due to its complexity, beyond the scope of this paper; we have now clarified this in the revised text.
<list list-type="bullet">
<list-item>
<p>
<bold>The authors mentioned evolutionary events, such as recombination, genome rearrangement, and lateral gene transfer … but did not provide detailed evidence on whether
<italic>k</italic>
-mers can tackle them all. I suggest that the authors stay closer to their data and make more specific statements.</bold>
</p>
</list-item>
</list>
In Chan
<italic>et al.</italic>
<sup>3</sup>
and Bernard
<italic>et al.</italic>
<sup>4</sup>
we provided detailed evidence that alignment-free approaches based on
<italic>k</italic>
-mers, at multi-genome scale, can be robust to insertions/deletions, genome rearrangement and lateral genetic transfer; these articles are cited where appropriate.
<list list-type="bullet">
<list-item>
<p>
<bold>In the third introduction paragraph, “By default, it is assumed that the best alignment can be achieved simply by displaying the sequences in the same direction and inserting gaps where needed. This assumption is largely valid when working with exons or proteins of morphologically complex eukaryotes. However, in microbes this assumption is violated...” I feel the meaning of “assumption” in each of these sentences is a moving target. If they are talking about orthologous sequences, the analysis of orthologs should hold for both eukaryotes and prokaryotes. </bold>
</p>
</list-item>
</list>
We have now revised the text to make it clear that the main assumption underlying multiple sequence alignment, i.e. that the alignment columns display homology position-by-position along the length of the sequences, is largely valid when working with highly conserved orthologs of any source; and that the validity of this assumption is often undermined in the case of microbial genome sequences, due to recombination and rearrangement.
<list list-type="bullet">
<list-item>
<p>
<bold>Another minor point is the use of “microbes”, which can mean bacteria, archaea, and small eukaryotes. I don’t think it is a good word to use here.</bold>
</p>
</list-item>
</list>
We used the word “microbes” here specifically to include archaea, bacteria and microbial eukaryotes. Genomes of many microbial eukaryotes are known to be impacted by lateral genetic transfer, at frequencies sometimes nearly as high as in bacteria and archaea.
<list list-type="bullet">
<list-item>
<p>
<bold>The authors did not justify the use of the 143 genomes. … Since taxon-sampling is important for tree-like phylogenetic analysis, it would be nice to address how the improved (or more balanced) taxon-sampling can benefit the network analyses.</bold>
</p>
</list-item>
</list>
Here we used the 143-genome dataset because the phylogenetic relationships among these genomes have been studied using careful alignment-based methods
<sup>5</sup>
and by alignment-free approaches
<sup>4</sup>
; it thus provides a good reference for comparison. We have now clarified this in the text. In our alignment-free network, each edge represents the qualitative evidence of
<italic>k</italic>
-mers shared pairwise between two genomes. This evidence is not affected by other genomes present in (or absent from) the dataset. Therefore, our networks are not affected by taxon-sampling biases of the sort encountered in tree inference. Of course, the presence or absence of a critical node (genome) might affect the biological conclusion we draw from a network, but the same is true for any scientific analysis. We considered the effect of phyletic balance on the inference of lateral genetic transfer networks in another context
<sup>6</sup>
.  
<list list-type="bullet">
<list-item>
<p>
<bold>The authors wrote “... in agreement with previously published studies; as such, this tree represents reality as presently understood, i.e., is biologically correct”. The use of words such as “reality” and “biologically correct” here is inappropriate.</bold>
</p>
</list-item>
</list>
We agree and now state that “as such, this tree captures most of the major biological groupings of Bacteria and Archaea as presently understood”.
<list list-type="bullet">
<list-item>
<p>
<bold>The data of
<italic>Wigglesworthia</italic>
,
<italic>Coxiella</italic>
and others are of potential interest. The readers would definitely appreciate some real data analyses to address them, which are currently lacking.</bold>
</p>
</list-item>
</list>
A follow-up analysis between
<italic>Wigglesworthia</italic>
and
<italic>Coxiella</italic>
would indeed be interesting, but is beyond the scope of this Research Note, the aim of which is to present limited findings in hopes of inspiring and encouraging others to explore this research area.
<list list-type="bullet">
<list-item>
<p>
<bold>Some of the older and more influential papers need to be added (for both networks and alignment-free methods).</bold>
</p>
</list-item>
</list>
We have now cited older, relevant references in the text for both networks
<sup>1, 2</sup>
and alignment-free methods
<sup>7</sup>
.</p>
<p>
<bold>References</bold>
<list list-type="order">
<list-item>
<p>Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies.
<italic>Mol Biol Evol</italic>
. 2006;
<bold>23</bold>
(2): 254-67.</p>
</list-item>
<list-item>
<p>Kunin V, Goldovsky L, Darzentas N
<italic>, et al.</italic>
: The net of life: reconstructing the microbial phylogenetic network.
<italic>Genome Res</italic>
. 2005;
<bold>15</bold>
(7): 954-9.</p>
</list-item>
<list-item>
<p>Chan CX, Bernard G, Poirion O
<italic>, et al.</italic>
: Inferring phylogenies of evolving sequences without multiple sequence alignment.
<italic>Sci Rep</italic>
. 2014;
<bold>4</bold>
: 6504.</p>
</list-item>
<list-item>
<p>Bernard G, Chan CX, Ragan MA: Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.
<italic>Sci Rep</italic>
. 2016;
<bold>6</bold>
: 28970.</p>
</list-item>
<list-item>
<p>Beiko RG, Harlow TJ, Ragan MA: Highways of gene sharing in prokaryotes.
<italic>Proc Natl Acad Sci U S A</italic>
. 2005;
<bold>102</bold>
(40): 14332-7.</p>
</list-item>
<list-item>
<p>Cong Y, Chan YB, Ragan MA: Exploring lateral genetic transfer among microbial genomes using TF-IDF.
<italic>Sci Rep</italic>
. 2016;
<bold>6</bold>
: 29319.</p>
</list-item>
<list-item>
<p>Domazet-Lošo M, Haubold B: Alignment-free detection of local similarity among viral and bacterial genomes.
<italic>Bioinformatics</italic>
. 2011;
<bold>27</bold>
(11): 1466-72.</p>
</list-item>
</list>
</p>
</body>
</sub-article>
</sub-article>
<sub-article id="report18402" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.11014.r18402</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Haubold</surname>
<given-names>Bernhard</given-names>
</name>
<xref ref-type="aff" rid="r18402a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r18402a1">
<label>1</label>
Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>12</month>
<year>2016</year>
</pub-date>
<related-article id="d35e2805" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.10225.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>Phylogeny reconstruction is a classical research topic in bioinformatics. In this context the standard trade-off between speed and accuracy becomes a choice between slow but accurate sequence alignment on the one hand and fast but less accurate alignment-free methods on the other. Bernard
<italic>et al.</italic>
aim for speed and use an established alignment-free measure, D_2, to reconstruct the phylogeny of 143 Bacteria and Archaea from full genome sequences. D_2 is based on the number of shared
<italic>k</italic>
-mers, and the main contribution of the paper is the visualization of the D_2 distance matrix of the 143 taxa as a network rather than the traditional bifurcating tree. This visualization is dynamic in the sense that the user can choose a similarity threshold between 0 and 10, and watch as the taxa disintegrate from initially two clusters to essentially every taxon on its own. This is an innovative way of presenting large-scale evolutionary relationships, and the tool is fun to use. As the authors remark, it is unclear how the D_2 metric scales with more familiar measures of evolutionary time such as substitutions per site. It would thus be interesting to explore this in future work, for example by supplying a version of the visualization tool that allows users to upload their own sequences. I was also wondering how the networks generated by Bernard
<italic>et al.</italic>
compare to established methods of network-based evolutionary analysis such as SplitsTree and minimum spanning trees. I realize that these are both usually based on alignments, but it is always possible to analyze a given alignment using D_2, thereby allowing a direct assessment of the accuracy lost (if any) for the speed gained.</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
</body>
<sub-article id="comment2371" article-type="response">
<front-stub>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Chan</surname>
<given-names>Cheong Xin</given-names>
</name>
<aff>Institute for Molecular Bioscience, The University of Queensland, Australia</aff>
</contrib>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>15</day>
<month>12</month>
<year>2016</year>
</pub-date>
</front-stub>
<body>
<p>Thank you for these comments. Indeed, the correlation between D2 metrics and evolutionary distances is an interesting area, and a tool that allows users to upload their own datasets would be useful. A comparative analysis between a
<italic>k</italic>
-mer-based network and a phylogenetic network based on multiple sequence alignment, although doable, is not straightforward. We believe the adoption of alignment-free methods in phylogenetic inference is still in its infancy, and we hope that this work will inspire and encourage other researchers to pursue this approach.</p>
</body>
</sub-article>
</sub-article>
</pmc>
</record>
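
Illustrative note: the referee reports and author responses in the record above describe a network in which each edge reflects the k-mers shared pairwise between two genomes, and in which raising a similarity threshold progressively separates the taxa. The following is a minimal Python sketch of that idea only. It is not the authors' D2 implementation: the Jaccard index of k-mer sets is swapped in as a simplified similarity, the score is scaled 0 to 1 rather than the 0 to 10 range mentioned in the report, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_similarity(a, b, k):
    """Jaccard index of the two k-mer sets (assumed stand-in for a D2-type score)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def threshold_network(genomes, k, threshold):
    """Keep an edge between two genomes when their similarity reaches the threshold.

    Each score depends only on the two sequences compared, so adding or
    removing other genomes does not change it.
    """
    edges = {}
    for x, y in combinations(sorted(genomes), 2):
        score = shared_kmer_similarity(genomes[x], genomes[y], k)
        if score >= threshold:
            edges[(x, y)] = score
    return edges


# Toy sequences, hypothetical and for illustration only.
toy = {
    "A": "ACGTACGTGA",
    "B": "ACGTACGTGC",
    "C": "TTTTGGGGCC",
}
print(threshold_network(toy, k=3, threshold=0.5))
```

With these toy inputs only the pair (A, B) is connected at a threshold of 0.5, and a stricter threshold of 0.8 leaves every sequence on its own, which mirrors the progressive separation described in the reports.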

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C69  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000C69  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021