MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 001147 (Pmc/Corpus); previous: 0011469; next: 0011480



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?</title>
<author>
<name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation>
<nlm:aff id="au1">
<institution>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland</institution>
<addr-line>Brisbane, QLD 4072, Australia</addr-line>
E-mail:
<email>m.ragan@imb.uq.edu.au</email>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="au1">
<institution>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland</institution>
<addr-line>Brisbane, QLD 4072, Australia</addr-line>
E-mail:
<email>m.ragan@imb.uq.edu.au</email>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17454975</idno>
<idno type="pmc">7107264</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7107264</idno>
<idno type="RBID">PMC:7107264</idno>
<idno type="doi">10.1080/10635150701294741</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">001147</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001147</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?</title>
<author>
<name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation>
<nlm:aff id="au1">
<institution>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland</institution>
<addr-line>Brisbane, QLD 4072, Australia</addr-line>
E-mail:
<email>m.ragan@imb.uq.edu.au</email>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A." last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:aff id="au1">
<institution>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland</institution>
<addr-line>Brisbane, QLD 4072, Australia</addr-line>
E-mail:
<email>m.ragan@imb.uq.edu.au</email>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Systematic Biology</title>
<idno type="ISSN">1063-5157</idno>
<idno type="eISSN">1076-836X</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from
<ext-link ext-link-type="uri" xlink:href="http://www.bioinformatics.org.au">http://www.bioinformatics.org.au</ext-link>
(as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from
<italic>k</italic>
-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length
<italic>k</italic>
of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Beiko, R G" uniqKey="Beiko R">R. G. Beiko</name>
</author>
<author>
<name sortKey="Chan, C X" uniqKey="Chan C">C. X. Chan</name>
</author>
<author>
<name sortKey="Ragan, M A" uniqKey="Ragan M">M. A. Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beiko, R G" uniqKey="Beiko R">R. G. Beiko</name>
</author>
<author>
<name sortKey="Harlow, T J" uniqKey="Harlow T">T. J. Harlow</name>
</author>
<author>
<name sortKey="Ragan, M A" uniqKey="Ragan M">M. A. Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beiko, R G" uniqKey="Beiko R">R. G. Beiko</name>
</author>
<author>
<name sortKey="Keith, J M" uniqKey="Keith J">J. M. Keith</name>
</author>
<author>
<name sortKey="Harlow, T J" uniqKey="Harlow T">T. J. Harlow</name>
</author>
<author>
<name sortKey="Ragan, M A" uniqKey="Ragan M">M. A. Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blaisdell, B E" uniqKey="Blaisdell B">B. E. Blaisdell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Castresana, J" uniqKey="Castresana J">J. Castresana</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chu, K H" uniqKey="Chu K">K. H. Chu</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J. Qi</name>
</author>
<author>
<name sortKey="Yu, Z G" uniqKey="Yu Z">Z.-G. Yu</name>
</author>
<author>
<name sortKey="Anh, V" uniqKey="Anh V">V. Anh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cowles, M K" uniqKey="Cowles M">M. K. Cowles</name>
</author>
<author>
<name sortKey="Carlin, B P" uniqKey="Carlin B">B. P. Carlin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, R C" uniqKey="Edgar R">R. C. Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, R C" uniqKey="Edgar R">R. C. Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J. Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J. Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gelman, A" uniqKey="Gelman A">A. Gelman</name>
</author>
<author>
<name sortKey="Carlin, J B" uniqKey="Carlin J">J. B. Carlin</name>
</author>
<author>
<name sortKey="Stern, H S" uniqKey="Stern H">H. S. Stern</name>
</author>
<author>
<name sortKey="Rubin, D B" uniqKey="Rubin D">D. B. Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hall, B G" uniqKey="Hall B">B. G. Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hao, B" uniqKey="Hao B">B. Hao</name>
</author>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J. Qi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harlow, T J" uniqKey="Harlow T">T. J. Harlow</name>
</author>
<author>
<name sortKey="Gogarten, J P" uniqKey="Gogarten J">J. P. Gogarten</name>
</author>
<author>
<name sortKey="Ragan, M A" uniqKey="Ragan M">M. A. Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Henikoff, S" uniqKey="Henikoff S">S. Henikoff</name>
</author>
<author>
<name sortKey="Henikoff, J G" uniqKey="Henikoff J">J. G. Henikoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hohl, M" uniqKey="Hohl M">M. Höhl</name>
</author>
<author>
<name sortKey="Rigoutsos, I" uniqKey="Rigoutsos I">I. Rigoutsos</name>
</author>
<author>
<name sortKey="Ragan, M A" uniqKey="Ragan M">M. A. Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huelsenbeck, J P" uniqKey="Huelsenbeck J">J. P. Huelsenbeck</name>
</author>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F. Ronquist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, D T" uniqKey="Jones D">D. T. Jones</name>
</author>
<author>
<name sortKey="Taylor, W R" uniqKey="Taylor W">W. R. Taylor</name>
</author>
<author>
<name sortKey="Thornton, J M" uniqKey="Thornton J">J. M. Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lempel, A" uniqKey="Lempel A">A. Lempel</name>
</author>
<author>
<name sortKey="Ziv, J" uniqKey="Ziv J">J. Ziv</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lewis, P O" uniqKey="Lewis P">P. O. Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, M" uniqKey="Li M">M. Li</name>
</author>
<author>
<name sortKey="Badger, J H" uniqKey="Badger J">J. H. Badger</name>
</author>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X. Chen</name>
</author>
<author>
<name sortKey="Kwong, S" uniqKey="Kwong S">S. Kwong</name>
</author>
<author>
<name sortKey="Kearney, P" uniqKey="Kearney P">P. Kearney</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mantaci, S" uniqKey="Mantaci S">S. Mantaci</name>
</author>
<author>
<name sortKey="Restivo, A" uniqKey="Restivo A">A. Restivo</name>
</author>
<author>
<name sortKey="Rosone, G" uniqKey="Rosone G">G. Rosone</name>
</author>
<author>
<name sortKey="Sciortino, M" uniqKey="Sciortino M">M. Sciortino</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nee, S" uniqKey="Nee S">S. Nee</name>
</author>
<author>
<name sortKey="May, R M" uniqKey="May R">R. M. May</name>
</author>
<author>
<name sortKey="Harvey, P H" uniqKey="Harvey P">P. H. Harvey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ogden, T H" uniqKey="Ogden T">T. H. Ogden</name>
</author>
<author>
<name sortKey="Rosenberg, M S" uniqKey="Rosenberg M">M. S. Rosenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Otu, H H" uniqKey="Otu H">H. H. Otu</name>
</author>
<author>
<name sortKey="Sayood, K" uniqKey="Sayood K">K. Sayood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qi, J" uniqKey="Qi J">J. Qi</name>
</author>
<author>
<name sortKey="Wang, B" uniqKey="Wang B">B. Wang</name>
</author>
<author>
<name sortKey="Hao, B I" uniqKey="Hao B">B.-I. Hao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A. Rambaut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A. Rambaut</name>
</author>
<author>
<name sortKey="Grassly, N C" uniqKey="Grassly N">N. C. Grassly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rigoutsos, I" uniqKey="Rigoutsos I">I. Rigoutsos</name>
</author>
<author>
<name sortKey="Floratos, A" uniqKey="Floratos A">A. Floratos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robinson, D F" uniqKey="Robinson D">D. F. Robinson</name>
</author>
<author>
<name sortKey="Foulds, L R" uniqKey="Foulds L">L. R. Foulds</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F. Ronquist</name>
</author>
<author>
<name sortKey="Huelsenbeck, J P" uniqKey="Huelsenbeck J">J. P. Huelsenbeck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saitou, N" uniqKey="Saitou N">N. Saitou</name>
</author>
<author>
<name sortKey="Nei, M" uniqKey="Nei M">M. Nei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stuart, G W" uniqKey="Stuart G">G. W. Stuart</name>
</author>
<author>
<name sortKey="Berry, M W" uniqKey="Berry M">M. W. Berry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stuart, G W" uniqKey="Stuart G">G. W. Stuart</name>
</author>
<author>
<name sortKey="Berry, M W" uniqKey="Berry M">M. W. Berry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stuart, G W" uniqKey="Stuart G">G. W. Stuart</name>
</author>
<author>
<name sortKey="Moffett, K" uniqKey="Moffett K">K. Moffett</name>
</author>
<author>
<name sortKey="Baker, S" uniqKey="Baker S">S. Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stuart, G W" uniqKey="Stuart G">G. W. Stuart</name>
</author>
<author>
<name sortKey="Moffett, K" uniqKey="Moffett K">K. Moffett</name>
</author>
<author>
<name sortKey="Leader, J J" uniqKey="Leader J">J. J. Leader</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taylor, W R" uniqKey="Taylor W">W. R. Taylor</name>
</author>
<author>
<name sortKey="Jones, D T" uniqKey="Jones D">D. T. Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ulitsky, I" uniqKey="Ulitsky I">I. Ulitsky</name>
</author>
<author>
<name sortKey="Burstein, D" uniqKey="Burstein D">D. Burstein</name>
</author>
<author>
<name sortKey="Tuller, T" uniqKey="Tuller T">T. Tuller</name>
</author>
<author>
<name sortKey="Chor, B" uniqKey="Chor B">B. Chor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J. Van Helden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vinga, S" uniqKey="Vinga S">S. Vinga</name>
</author>
<author>
<name sortKey="Almeida, J" uniqKey="Almeida J">J. Almeida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vinga, S" uniqKey="Vinga S">S. Vinga</name>
</author>
<author>
<name sortKey="Gouveia Oliveira, R" uniqKey="Gouveia Oliveira R">R. Gouveia-Oliveira</name>
</author>
<author>
<name sortKey="Almeida, J S" uniqKey="Almeida J">J. S. Almeida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L. Wang</name>
</author>
<author>
<name sortKey="Jiang, T" uniqKey="Jiang T">T. Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, T J" uniqKey="Wu T">T.-J. Wu</name>
</author>
<author>
<name sortKey="Burke, J P" uniqKey="Burke J">J. P. Burke</name>
</author>
<author>
<name sortKey="Davison, D B" uniqKey="Davison D">D. B. Davison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, A C C" uniqKey="Yang A">A. C.-C. Yang</name>
</author>
<author>
<name sortKey="Goldberger, A L" uniqKey="Goldberger A">A. L. Goldberger</name>
</author>
<author>
<name sortKey="Peng, C K" uniqKey="Peng C">C.-K. Peng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, Z G" uniqKey="Yu Z">Z.-G. Yu</name>
</author>
<author>
<name sortKey="Anh, V" uniqKey="Anh V">V. Anh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zar, J H" uniqKey="Zar J">J. H. Zar</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Syst Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">Syst. Biol</journal-id>
<journal-id journal-id-type="hwp">sysbio</journal-id>
<journal-id journal-id-type="publisher-id">sysbio</journal-id>
<journal-title-group>
<journal-title>Systematic Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1063-5157</issn>
<issn pub-type="epub">1076-836X</issn>
<publisher>
<publisher-name>Society of Systematic Zoology</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17454975</article-id>
<article-id pub-id-type="pmc">7107264</article-id>
<article-id pub-id-type="doi">10.1080/10635150701294741</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Höhl</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ragan</surname>
<given-names>Mark A.</given-names>
</name>
<xref ref-type="aff" rid="au1">1</xref>
</contrib>
</contrib-group>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Page</surname>
<given-names>Rod</given-names>
</name>
<role>Associate Editor</role>
</contrib>
</contrib-group>
<aff id="au1">
<label>1</label>
<institution>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland</institution>
<addr-line>Brisbane, QLD 4072, Australia</addr-line>
E-mail:
<email>m.ragan@imb.uq.edu.au</email>
</aff>
<pub-date pub-type="ppub">
<month>4</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2007-04-01">
<month>4</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="pmc-release">
<month>4</month>
<year>2007</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>56</volume>
<issue>2</issue>
<fpage>206</fpage>
<lpage>221</lpage>
<history>
<date date-type="received">
<day>03</day>
<month>5</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>18</day>
<month>7</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>10</month>
<year>2006</year>
</date>
</history>
<permissions>
<copyright-statement>© 2007 Society of Systematic Biologists</copyright-statement>
<copyright-year>2007</copyright-year>
<license>
<license-p>This article is made available via the PMC Open Access Subset for unrestricted re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the COVID-19 pandemic or until permissions are revoked in writing. Upon expiration of these permissions, PMC is granted a perpetual license to make this article available via PMC and Europe PMC, consistent with existing copyright protections.</license-p>
</license>
</permissions>
<self-uri xlink:href="56-2-206.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from
<ext-link ext-link-type="uri" xlink:href="http://www.bioinformatics.org.au">http://www.bioinformatics.org.au</ext-link>
(as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from
<italic>k</italic>
-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length
<italic>k</italic>
of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</p>
</abstract>
<kwd-group>
<kwd>Alignment-free methods</kwd>
<kwd>Bayesian</kwd>
<kwd>distance estimation</kwd>
<kwd>phylogenetics</kwd>
<kwd>tree reconstruction</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<p>It is commonly believed that to infer a phylogenetic tree that represents the history of a set of molecular sequences, one must first arrange these sequences relative to each other in a way that presents the best available hypothesis of homology at each and every position in those molecules, i.e., an optimal multiple sequence alignment (MSA). A large number of studies (many of which are cited in
<xref rid="b13" ref-type="bibr">Hall, 2005</xref>
, and
<xref rid="b25" ref-type="bibr">Ogden and Rosenberg, 2006</xref>
) indicate that under a wide range of biologically relevant situations, suboptimality of the MSA diminishes the accuracy of the resulting tree. The sensitivity of this relationship can differ depending on the shape of the tree, branch length, inference method, and other factors (
<xref rid="b25" ref-type="bibr">Ogden and Rosenberg, 2006</xref>
).</p>
<p>There is nonetheless a small literature (reviewed in
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
) that presents alternative approaches to molecular phylogenetic inference that do not involve prior MSA. Frequently, these involve two steps: the calculation of a matrix of pairwise distances among unaligned molecular sequences, followed by generation of a tree using a distance-based method such as neighbor-joining (
<xref rid="b33" ref-type="bibr">Saitou and Nei, 1987</xref>
). The fundamental difference from alignment-based methods lies in the first step, i.e., in how the pairwise distances in the underlying distance matrix are computed. As MSA is NP-hard (
<xref rid="b43" ref-type="bibr">Wang and Jiang, 1994</xref>
) and most good heuristics are computationally expensive, there is intrinsic value in exploring polynomial-time alternatives.</p>
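<p>The two-step procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this study: the function names <italic>pairwise_distances</italic> and <italic>neighbor_joining</italic> are hypothetical, the distance function is supplied by the caller, branch lengths are omitted, and the tree is returned as nested tuples with a three-way split at the top to represent an unrooted topology.</p>
<preformat>
```python
from itertools import combinations


def pairwise_distances(seqs, distance):
    """Step 1: one distance per unordered pair of labelled, unaligned sequences.

    seqs maps a taxon label to its sequence; distance is any callable that
    takes two sequences and returns a number (an assumption of this sketch).
    """
    return {
        frozenset((a, b)): distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }


def neighbor_joining(names, dist):
    """Step 2: neighbor-joining topology (no branch lengths) as nested tuples.

    names is a list of unique taxon labels; dist maps frozenset({a, b}) to the
    distance between a and b, as produced by pairwise_distances.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of every node from all other current nodes.
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        u = (i, j)
        # Distances from the new internal node to every remaining node.
        for m in nodes:
            if m != i and m != j:
                d[frozenset((u, m))] = 0.5 * (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d[frozenset((i, j))]
                )
        nodes = [m for m in nodes if m != i and m != j] + [u]
    return tuple(nodes)
```
</preformat>
<p>Only the callable passed to <italic>pairwise_distances</italic> changes between alignment-free methods; the tree-building step is the same for all of them.</p>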
<p>In nonphylogenetic contexts, alignment-free methods are employed in tasks as diverse as sequence classification, database search, and detection of regulatory sequences; the literature on these applications is small but is growing at an increasing rate. Underlying principles and techniques together with applications are reviewed by
<xref rid="b41" ref-type="bibr">Vinga and Almeida (2003)</xref>
. In stark contrast to the plethora of studies investigating the accuracy of alignment-based tree reconstruction, surprisingly little is known about the accuracy of alignment-free methods, owing to an almost complete absence of systematic, comprehensive, large-scale studies in this field. In the context of phylogenetics, studies that introduce a new method have usually characterized its accuracy by comparing at most a handful of reconstructed trees to “standard” trees derived from alignments, focusing on the clustering of subgroups and the placement of taxa instead of emphasizing numerical results (even though studies may otherwise be large-scale:
<xref rid="b22" ref-type="bibr">Li et al., 2001</xref>
;
<xref rid="b26" ref-type="bibr">Otu and Sayood, 2003</xref>
;
<xref rid="b36" ref-type="bibr">Stuart et al., 2002a</xref>
,
<xref rid="b37" ref-type="bibr">2002b</xref>
;
<xref rid="b34" ref-type="bibr">Stuart and Berry, 2003</xref>
,
<xref rid="b35" ref-type="bibr">2004</xref>
;
<xref rid="b27" ref-type="bibr">Qi et al., 2004</xref>
;
<xref rid="b6" ref-type="bibr">Chu et al., 2004</xref>
;
<xref rid="b14" ref-type="bibr">Hao and Qi, 2004</xref>
;
<xref rid="b46" ref-type="bibr">Yu and Anh, 2004</xref>
;
<xref rid="b45" ref-type="bibr">Yang et al., 2005</xref>
;
<xref rid="b23" ref-type="bibr">Mantaci et al., 2005</xref>
). This makes it difficult to extract useful generalizations from this literature, especially considering that data sets vary from paper to paper. A notable exception is the work of
<xref rid="b39" ref-type="bibr">Ulitsky et al. (2006)</xref>
, who compared their average common substring (ACS) approach favorably to three other alignment-free methods on a data set of 75 species, using a tree topology metric due to
<xref rid="b31" ref-type="bibr">Robinson and Foulds (1981)</xref>
; furthermore, they validated their ACS approach on (a) mitochondrial genomes and proteomes from 34 mammals, (b) 191 proteomes, and (c) a forest from 1865 viral genomes. Recently, we took a first step toward a more systematic and comprehensive comparison of alignment-free approaches in molecular phylogenetic inference (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
), inferring trees by several methods and across a range of phylogenetic distances and calculating their topological distance from corresponding reference trees that were either samples drawn from tree distributions or based on structurally informed, manually curated multiple sequence alignments.</p>
<p>Here, we expand on and refine this evaluation framework, described in detail in Methods. First, we increase the power of our statistical assessment by doubling the number of taxa in our synthetic data sets. Second, we vary biologically important parameters such as among-site rate variation and sequence length. In the Results section these data sets are used to characterize the behavior of various alignment-free methods, among them one new approach and one variant of our pattern-based approach (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
). We also compare the methods on a high-quality empirical data set, allowing us to gain insight into the effect of two different alphabets; robustness is achieved by employing appropriate statistical tests. We present an empirical analysis of the time required for pattern-based distance calculation, including the aforementioned variant that achieves a speed-up of an order of magnitude. The new alignment-free approach that we introduce in Methods is based on Bayesian inference, and we present an analysis of convergence and extent of burn-in at the very end of Results and Discussion.</p>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Alignment-Free Methods</title>
<p>We start by giving abbreviations that we use throughout this paper. The methods considered here are:
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
, the (squared) Euclidean distance;
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
, the standardized Euclidean distance;
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
, a distance based on the fractional common
<italic>k</italic>
-mer count;
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
, a distance based on probabilities of common
<italic>k</italic>
-mer counts under a multiplicative Poisson model;
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
, the composition distance;
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
, the W-metric;
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
, a distance based on Lempel-Ziv complexity;
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
, a distance based on the average common substring length;
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
, the pattern-based distance using maximum-likelihood (ML) estimation;
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
, a variant calculated using a similarity matrix;
<italic>B-bin</italic>
, the Bayesian inference from
<italic>k</italic>
-mers with a binary encoding;
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
, the ML estimate of phylogenetic distances from the correct alignment (
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
serves as a baseline).</p>
<p>With the exception of
<italic>B-bin</italic>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
, all alignment-free methods tested here have been described and compared previously (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
). As a convenience for the reader, we provide short summaries and notationally consistent formulas here. These methods calculate pairwise distances between sequences, in contrast to
<italic>B-bin</italic>
, a novel method that we introduce below.</p>
<p>Let
<italic>X</italic>
(
<italic>Y</italic>
) denote a string of
<italic>n</italic>
(
<italic>m</italic>
) characters. There are
<italic>c</italic>
different characters in our alphabet
<inline-formula>
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</inline-formula>
; thus, for a word of length
<italic>k</italic>
, we have
<italic>w</italic>
=
<italic>c</italic>
<sup>
<italic>k</italic>
</sup>
so-called
<italic>k</italic>
-mers.</p>
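<p>As an illustration of this notation, the sketch below counts the overlapping <italic>k</italic>-mers of a string and enumerates all <italic>w</italic> = <italic>c</italic><sup><italic>k</italic></sup> possible words over an alphabet. The function names <italic>kmer_counts</italic> and <italic>all_kmers</italic> are hypothetical and chosen for this example only; the use of overlapping windows is an assumption consistent with the usual definition.</p>
<preformat>
```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers; a string of n characters contains n - k + 1."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def all_kmers(alphabet, k):
    """Enumerate all w = c**k words of length k over an alphabet of c characters."""
    return ["".join(chars) for chars in product(alphabet, repeat=k)]


counts = kmer_counts("ACGTACG", 3)   # five overlapping 3-mers in total
words = all_kmers("ACGT", 3)         # c = 4 and k = 3, so w = 64 words
```
</preformat>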
<p>The (squared) Euclidean distance (
<xref rid="b4" ref-type="bibr">Blaisdell, 1986</xref>
) is calculated using
<italic>c</italic>
<sup>
<italic>X</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
, the count of
<italic>k</italic>
-mer occurrences in
<italic>X</italic>
:
<disp-formula id="eq1">
<graphic xlink:href="56-2-206-m1.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The standardized Euclidean distance (
<xref rid="b44" ref-type="bibr">Wu et al., 1997</xref>
) is calculated by dividing
<italic>f</italic>
<sup>
<italic>X</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
, the relative frequencies of
<italic>k</italic>
-mer occurrences in
<italic>X</italic>
, by their standard deviations
<italic>s</italic>
<sup>
<italic>X</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
:
<disp-formula id="eq2">
<graphic xlink:href="56-2-206-m2.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The fractional common
<italic>k</italic>
-mer count (
<xref rid="b8" ref-type="bibr">Edgar, 2004a</xref>
) is derived from the common
<italic>K</italic>
-mer count
<italic>C</italic>
<sup>
<italic>XY</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
between
<italic>X</italic>
and
<italic>Y</italic>
and is transformed into a distance
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
(
<italic>X</italic>
,
<italic>Y</italic>
) = −log (0.1 +
<italic>F</italic>
).
<disp-formula id="eq3">
<graphic xlink:href="56-2-206-m3.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
Under a multiplicative Poisson model (
<xref rid="b40" ref-type="bibr">Van Helden, 2004</xref>
), probabilities of common
<italic>k</italic>
-mer counts yield a distance:
<disp-formula id="eq4">
<graphic xlink:href="56-2-206-m4.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The composition distance (
<xref rid="b14" ref-type="bibr">Hao and Qi, 2004</xref>
) between
<italic>X</italic>
and
<italic>Y</italic>
is calculated from their correlation as
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
(
<italic>X</italic>
,
<italic>Y</italic>
) = [1 − cos (
<italic>X</italic>
,
<italic>Y</italic>
)]/2. More precisely, cos (<italic>X</italic>, <italic>Y</italic>) is the cosine of the angle between the composition vectors V = (C − E)/E of
<italic>K</italic>
-mers in
<italic>X</italic>
and in
<italic>Y</italic>
, where C denotes occurrence counts and E expected counts under a Markov model of order
<italic>k</italic>
−2.</p>
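<p>As an illustration of the word-count methods summarized above, the following minimal Python sketch counts overlapping words of length k and computes a squared Euclidean distance between the two count vectors. The formula itself is given in the equation above; the sketch assumes it is the sum, over all words, of squared count differences. The function names kmer_counts and squared_euclidean are hypothetical and are not taken from any published implementation.</p>

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_euclidean(x, y, k):
    """Sum of squared differences between the k-mer counts of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Words absent from both sequences contribute zero, so the union suffices.
    return sum((cx[w] - cy[w]) ** 2 for w in set(cx) | set(cy))


if __name__ == "__main__":
    print(squared_euclidean("ACDA", "ACDC", 2))
```

<p>Only words observed in at least one of the two sequences are visited, which avoids enumerating the full set of possible words for larger k.</p>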
<p>The W-metric (
<xref rid="b42" ref-type="bibr">Vinga et al., 2004</xref>
) weighs differences between all pairs of amino acids by their entries in matrix W. Here, we use BLOSUM62 (
<xref rid="b16" ref-type="bibr">Henikoff and Henikoff, 1992</xref>
).
<disp-formula id="eq5">
<graphic xlink:href="56-2-206-m5.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The Lempel-Ziv complexity of
<italic>X</italic>
,
<italic>c</italic>
(
<italic>X</italic>
) (
<xref rid="b20" ref-type="bibr">Lempel and Ziv, 1976</xref>
) can be used to define a distance measure (
<xref rid="b26" ref-type="bibr">Otu and Sayood, 2003</xref>
), where
<italic>XY</italic>
refers to the concatenation of
<italic>X</italic>
and
<italic>Y</italic>
:
<disp-formula id="eq6">
<graphic xlink:href="56-2-206-m6.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The average common substring distance (
<xref rid="b39" ref-type="bibr">Ulitsky et al., 2006</xref>
) requires definition of
<italic>L</italic>
(
<italic>X</italic>
,
<italic>Y</italic>
) = ∑
<sub>
<italic>i</italic>
= 1</sub>
<sup>
<italic>n</italic>
</sup>
ℓ
<sup>
<italic>XY</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
/
<italic>n</italic>
, where ℓ
<sup>
<italic>XY</italic>
</sup>
<sub>
<italic>i</italic>
</sub>
is the length of the longest string starting at
<italic>X</italic>
<sub>
<italic>i</italic>
</sub>
that exactly matches a string starting at
<italic>Y</italic>
<sub>
<italic>j</italic>
</sub>
.
<disp-formula id="eq7">
<graphic xlink:href="56-2-206-m7.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
<disp-formula id="eq8">
<graphic xlink:href="56-2-206-m8.jpg" position="float" orientation="portrait"></graphic>
</disp-formula>
The pattern-based distance (
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
;
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
) is calculated as follows. In a first step, maximal patterns are discovered in unaligned sequences using TEIRESIAS (
<xref rid="b30" ref-type="bibr">Rigoutsos and Floratos, 1998</xref>
) with parameters
<italic>L</italic>
= 4,
<italic>W</italic>
= 16, and
<italic>K</italic>
= 2 (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
); patterns occurring more than once in any sequence are removed. For each pair of sequences, all corresponding pattern instances are concatenated and distances are calculated from these new strings using ML estimation under the JTT model (
<xref rid="b19" ref-type="bibr">Jones et al., 1992</xref>
) as implemented in Protdist from the PHYLIP package (
<xref rid="b11" ref-type="bibr">Felsenstein, 2005</xref>
).</p>
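<p>The quantity L(X, Y) used by the average common substring distance can be sketched directly from its definition. The code below is a naive illustration only: it uses repeated substring searches rather than the suffix-based data structures a practical implementation would likely use, and it does not reproduce the subsequent normalization and symmetrization steps given in the equations. The function names are hypothetical.</p>

```python
def longest_match_at(x, i, y):
    """Length of the longest substring of x starting at index i that occurs anywhere in y."""
    length = 0
    # Extend the candidate substring one character at a time while it still occurs in y.
    while i + length != len(x) and x[i:i + length + 1] in y:
        length += 1
    return length


def average_common_substring(x, y):
    """L(X, Y): mean over all positions i of x of the longest exact match found in y."""
    return sum(longest_match_at(x, i, y) for i in range(len(x))) / len(x)


if __name__ == "__main__":
    print(average_common_substring("ABCD", "XBCY"))
```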
<p>We now present a variant (
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
) that utilizes the BLOSUM62 similarity matrix to speed up distance calculation. We transform a similarity matrix S into a distance matrix D (
<xref rid="b38" ref-type="bibr">Taylor and Jones, 1993</xref>
):
<italic>D</italic>
<sub>
<italic>ij</italic>
</sub>
=
<italic>S</italic>
<sub>
<italic>ii</italic>
</sub>
+
<italic>S</italic>
<sub>
<italic>jj</italic>
</sub>
− 2
<italic>S</italic>
<sub>
<italic>ij</italic>
</sub>
. For each pair of concatenated strings
<italic>X</italic>
and
<italic>Y</italic>
(of common length
<italic>n</italic>
), we calculate the distance as
<italic>d</italic>
(
<italic>X</italic>
,
<italic>Y</italic>
) = ∑
<sup>
<italic>n</italic>
</sup>
<sub>
<italic>i</italic>
= 1</sub>
<italic>D</italic>
<sub>
<italic>X</italic>
<sub>
<italic>i</italic>
</sub>
<italic>Y</italic>
<sub>
<italic>i</italic>
</sub>
</sub>
/
<italic>n</italic>
, where
<italic>X</italic>
<sub>
<italic>i</italic>
</sub>
denotes the character at position
<italic>i</italic>
in
<italic>X</italic>
.</p>
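<p>The two steps of this variant, transforming similarities into distances and averaging over positions, can be sketched as follows. The three-character similarity matrix is a toy example used purely for illustration and is not a substitute for the full BLOSUM62 matrix; the function names are hypothetical.</p>

```python
def similarity_to_distance(S):
    """D[a][b] = S[a][a] + S[b][b] - 2 * S[a][b] for every pair of characters."""
    return {a: {b: S[a][a] + S[b][b] - 2 * S[a][b] for b in S} for a in S}


def pb_sim_distance(x, y, D):
    """Average per-position distance between two strings of common length."""
    if len(x) != len(y):
        raise ValueError("strings must have a common length")
    return sum(D[a][b] for a, b in zip(x, y)) / len(x)


# Toy symmetric similarity matrix over three characters (illustrative values only).
S = {
    "A": {"A": 4, "G": 0, "W": -3},
    "G": {"A": 0, "G": 6, "W": -2},
    "W": {"A": -3, "G": -2, "W": 11},
}
D = similarity_to_distance(S)

if __name__ == "__main__":
    print(pb_sim_distance("AGW", "AGA", D))
```

<p>By construction, identical characters have distance zero, so identical strings are at distance zero.</p>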
<sec>
<title>Bayesian phylogenetic inference from
<italic>k</italic>
-mers</title>
<p>We propose a novel way of utilizing the phylogenetic information inherent in the distribution of
<italic>K</italic>
-mers among a set of sequences without calculating pairwise distances. Instead, we encode
<italic>K</italic>
-mers as character states and estimate posterior probabilities (PPs) of bipartitions using MrBayes (
<xref rid="b18" ref-type="bibr">Huelsenbeck and Ronquist, 2001</xref>
;
<xref rid="b32" ref-type="bibr">Ronquist and Huelsenbeck, 2003</xref>
). For the purpose of this work, we (a) build the consensus tree employing the extended 50% majority rule (
<xref rid="b11" ref-type="bibr">Felsenstein, 2005</xref>
) and (b) use the consensus tree to estimate the accuracy of the method, although more sophisticated ways of utilizing the resulting data are possible.</p>
<p>Each possible
<italic>K</italic>
-mer is either present in or absent from a sequence, and thus the
<italic>K</italic>
-mer content of each sequence can be encoded by a set of binary state variables. However, because the number of possible
<italic>K</italic>
-mers grows exponentially with
<italic>k</italic>
, and most
<italic>K</italic>
-mers will not be present in any sequence when
<italic>k</italic>
is large, we record presence/absence data only for those
<italic>K</italic>
-mers that appear in at least one sequence. This practice introduces a data acquisition bias; fortunately, MrBayes implements models that correct for just such a bias, and the correction is enabled by setting the lset coding=noabsencesites option.
<xref rid="b10" ref-type="bibr">Felsenstein (1992)</xref>
developed a model for binary states (the binary model in MrBayes), originally for restriction site presence/absence data.
<xref rid="b21" ref-type="bibr">Lewis (2001)</xref>
generalized this model to include ≥ 2 states (the standard discrete model in MrBayes). We use the latter with two (binary) states. It features an instantaneous rate matrix Q with two stationary state frequencies that model the rates of word gain and word loss. Using this model, it is not possible to estimate unequal stationary state frequencies (they are assumed to be equal), but we can allow the state frequencies to vary over sites, and hence set the symmetric Dirichlet hyperprior (prset symdirihyperpr = exponential(1.0)). The discrete approximation of the Dirichlet distribution uses five categories (default in MrBayes). We place a uniform prior on topology, and an unconstrained exponential prior on branch lengths with mean 0.1 (default in MrBayes). We denote this binary encoding of
<italic>K</italic>
-mers by
<italic>B-bin</italic>
, and we analyze its convergence and the extent of its burn-in phase at the very end of Results and Discussion.</p>
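<p>The binary encoding described above can be sketched as a presence/absence matrix whose columns are restricted to words observed in at least one sequence. This is a minimal sketch under stated assumptions: the function name binary_kmer_matrix is hypothetical, and how the resulting matrix is written to an input file for MrBayes is not shown.</p>

```python
def binary_kmer_matrix(seqs, k):
    """Return (words, rows): rows[s][j] is '1' if words[j] occurs in seqs[s], else '0'."""
    present = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    # Only words observed in at least one sequence become characters (columns),
    # so no column is all-absent; this is the acquisition bias noted in the text.
    words = sorted(set().union(*present))
    rows = ["".join("1" if w in p else "0" for w in words) for p in present]
    return words, rows


if __name__ == "__main__":
    words, rows = binary_kmer_matrix(["ACD", "CDE"], 2)
    print(words)
    print(rows)
```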
<p>We note that the presence/absence data from
<italic>K</italic>
-mers violate assumptions of the simple binary model in two ways: (a)
<italic>K</italic>
-mers appear and disappear together because they overlap; hence, their occurrences are not independent of each other. A simple way to achieve independence is to take only words that occur at positions
<italic>a</italic>
+
<italic>bk</italic>
where
<italic>a</italic>
∈ [1,
<italic>k</italic>
] and
<italic>b</italic>
takes on values ≥ 0 (subject to sequence-length constraints). This process discards much data and thus seems a reasonable approach only for sufficiently long sequences. (b)
<italic>K</italic>
-mer loss is coupled to
<italic>K</italic>
-mer gain. Generally, when a sequence change reduces the count for one word, it increases the count for another. The number of distinct
<italic>K</italic>
-mers that are gained or lost as a result of a single sequence change increases with
<italic>k</italic>
, and thus the departure of the actual data from our assumption of independent gain and loss is expected to become more apparent for longer words. However,
<italic>k</italic>
is relatively small in all of our analyses. Comparison of
<xref ref-type="fig" rid="fig1">Figure 1</xref>
for
<italic>B-bin</italic>
with corresponding figures for other word-based methods suggests that both of these violations have only minor influence in this setting, and statistical analysis will reveal that the best performing parameterizations of
<italic>B-bin</italic>
(which exhibit rather short words) and other word-based methods are indistinguishable. To analyze the degree to which the data and the model (mis)match, it is possible to generate data (here, distributions of binary states) under the model and then see how they (dis)agree with actual data. This self-consistency check is known as posterior predictive checking (
<xref rid="b12" ref-type="bibr">Gelman et al., 2004</xref>
).</p>
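<p>The non-overlapping sampling scheme mentioned under (a) can be sketched as follows: for a chosen offset a between 1 and k, only words starting at 1-based positions a, a + k, a + 2k, and so on are retained. The function name nonoverlapping_kmers is hypothetical, and this sketch is an illustration of the idea rather than a procedure used in the analyses.</p>

```python
def nonoverlapping_kmers(seq, k, a=1):
    """Words of length k starting at 1-based positions a, a + k, a + 2k, ..."""
    if a not in range(1, k + 1):
        raise ValueError("offset a must lie in [1, k]")
    start = a - 1  # convert the 1-based offset to a 0-based index
    # Stepping by k guarantees that consecutive words share no position.
    return [seq[i:i + k] for i in range(start, len(seq) - k + 1, k)]


if __name__ == "__main__":
    for offset in (1, 2, 3):
        print(offset, nonoverlapping_kmers("ABCDEFGH", 3, a=offset))
```

<p>A sequence of length n yields roughly n/k words per offset instead of n − k + 1, which illustrates why the approach discards much data.</p>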
<fig id="fig1" orientation="portrait" position="float">
<label>Figure 1</label>
<caption>
<p>RF distance landscape for method
<italic>B-bin</italic>
. Average RF distance (
<italic>y</italic>
-axis) of method
<italic>B-bin</italic>
on three reference sets (top to bottom: set 2, set 4, and set 6) of two synthetic data sets (a, c, e: control; b, d, f: ASRV). Each subfigure shows the behavior as a function of word length
<italic>k</italic>
(
<italic>x</italic>
-axis) for two alphabets (AA: original amino acids, CE: chemical equivalence classes). Points are joined for ease of visual inspection only.</p>
</caption>
<graphic xlink:href="56-2-206-f1"></graphic>
</fig>
</sec>
</sec>
<sec>
<title>Data Sets</title>
<p>We employ two different types of data: (a) synthetic data that allow us to control the conditions, and for which we know the true phylogenetic trees; and (b) empirical data that were previously used to quantify the extent of lateral gene transfer (
<xref rid="b2" ref-type="bibr">Beiko et al., 2005b</xref>
), and for which high-quality phylogenetic trees exist.</p>
<p>We proceed as
<xref rid="b17" ref-type="bibr">Höhl et al. (2006)</xref>
did and complement the original amino acid sequences (AA) with sequences encoded in a reduced alphabet based on chemical equivalences (CE). The alphabet consists of the classes [AG], [DE], [FY], [KR], [ILMV], [QN], [ST], [BZX] where “[…]” groups similar amino acids together and unlisted amino acids form classes of their own.</p>
<sec>
<title>Synthetic data</title>
<p>The synthetic data were generated in a fashion very similar to that of
<xref rid="b17" ref-type="bibr">Höhl et al. (2006)</xref>
: we sampled trees from several tree distributions resulting from birth–death processes (
<xref rid="b24" ref-type="bibr">Nee et al., 1994</xref>
) and deviated the rooted, bifurcating trees from ultrametricity by an additive process. Using PhyloGen V1.1 (
<xref rid="b28" ref-type="bibr">Rambaut, 2002</xref>
), we sampled seven sets of 100 eight-taxon reference trees each; the parameters were birth = 10.0 and death = 5.0, with extant ∈ [40, 133, …, 40,000]. The induced pairwise phylogenetic reference distances have medians of [0.75, 1.10, 1.62, 2.07, 2.44, 2.99, 3.42] substitutions per site; their upper and lower quartiles are within 0.38 units of these values. Out of a total of 19,600 distances, 2205 (11.25%) are < 0.75, ranging down to < 0.01; 1940 distances (about 9.90%) are > 3.42, up to a maximum of 5.35.
<xref rid="tbl9" ref-type="table">Table A3</xref>
to
<xref rid="tbl11" ref-type="table">Table A5</xref>
show median distances calculated using methods parameterized as in
<xref rid="tbl1" ref-type="table">Tables 1</xref>
,
<xref rid="tbl2" ref-type="table">2</xref>
, and
<xref rid="tbl7" ref-type="table">Table A1</xref>
.</p>
<table-wrap id="tbl1" orientation="portrait" position="float">
<label>Table 1.</label>
<caption>
<p>Control data set. Average RF distance for each reference set of the synthetic control data set (sequence length of 1000 amino acids, no ASRV). For word-based methods, we show the best performing word length
<italic>k</italic>
for each alphabet
<italic>A</italic>
(AA: original amino acids; CE: chemical equivalence classes), the only exception being
<italic>B-bin</italic>
with CE:
<italic>k</italic>
= 5 is slightly better on this data set but
<italic>k</italic>
= 4 performs better on the other two data sets. Methods are ordered according to their rank sums ∑
<sub>R</sub>
. The Friedman test statistic is
<italic>F</italic>
<sub>
<italic>R</italic>
</sub>
= 4758.1 (
<italic>P</italic>
< 10
<sup>−10</sup>
). Significant differences are found at or beyond the α = 0.05 level between the following pairs (numbers refer to column “No.”): method 1 versus methods 22–2; method 2 versus methods 22–4; method 3 versus methods 22–5; methods 4 and 5 versus methods 22–6; method 6 versus methods 22–18; method 7 versus methods 22–19; methods 8–19 versus methods 22–20; and methods 20 and 21 versus method 22.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of control data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in1.jpg"></inline-graphic>
</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3228.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.024</td>
<td align="center" colspan="1" rowspan="1">0.044</td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.092</td>
<td align="center" colspan="1" rowspan="1">0.140</td>
<td align="center" colspan="1" rowspan="1">0.160</td>
<td align="center" colspan="1" rowspan="1">0.192</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4285.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.044</td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.090</td>
<td align="center" colspan="1" rowspan="1">0.148</td>
<td align="center" colspan="1" rowspan="1">0.266</td>
<td align="center" colspan="1" rowspan="1">0.356</td>
<td align="center" colspan="1" rowspan="1">0.518</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">4483.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.040</td>
<td align="center" colspan="1" rowspan="1">0.084</td>
<td align="center" colspan="1" rowspan="1">0.096</td>
<td align="center" colspan="1" rowspan="1">0.154</td>
<td align="center" colspan="1" rowspan="1">0.276</td>
<td align="center" colspan="1" rowspan="1">0.388</td>
<td align="center" colspan="1" rowspan="1">0.556</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">5374.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.044</td>
<td align="center" colspan="1" rowspan="1">0.070</td>
<td align="center" colspan="1" rowspan="1">0.104</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">0.362</td>
<td align="center" colspan="1" rowspan="1">0.570</td>
<td align="center" colspan="1" rowspan="1">0.736</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">5650.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.050</td>
<td align="center" colspan="1" rowspan="1">0.076</td>
<td align="center" colspan="1" rowspan="1">0.120</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">0.380</td>
<td align="center" colspan="1" rowspan="1">0.612</td>
<td align="center" colspan="1" rowspan="1">0.744</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">8127.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.156</td>
<td align="center" colspan="1" rowspan="1">0.222</td>
<td align="center" colspan="1" rowspan="1">0.392</td>
<td align="center" colspan="1" rowspan="1">0.590</td>
<td align="center" colspan="1" rowspan="1">0.744</td>
<td align="center" colspan="1" rowspan="1">0.872</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">8285.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.076</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.234</td>
<td align="center" colspan="1" rowspan="1">0.398</td>
<td align="center" colspan="1" rowspan="1">0.660</td>
<td align="center" colspan="1" rowspan="1">0.756</td>
<td align="center" colspan="1" rowspan="1">0.872</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">8316.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.082</td>
<td align="center" colspan="1" rowspan="1">0.160</td>
<td align="center" colspan="1" rowspan="1">0.276</td>
<td align="center" colspan="1" rowspan="1">0.398</td>
<td align="center" colspan="1" rowspan="1">0.624</td>
<td align="center" colspan="1" rowspan="1">0.712</td>
<td align="center" colspan="1" rowspan="1">0.844</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="center" colspan="1" rowspan="1">8336.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.058</td>
<td align="center" colspan="1" rowspan="1">0.124</td>
<td align="center" colspan="1" rowspan="1">0.228</td>
<td align="center" colspan="1" rowspan="1">0.402</td>
<td align="center" colspan="1" rowspan="1">0.660</td>
<td align="center" colspan="1" rowspan="1">0.778</td>
<td align="center" colspan="1" rowspan="1">0.882</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="center" colspan="1" rowspan="1">8362.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.062</td>
<td align="center" colspan="1" rowspan="1">0.112</td>
<td align="center" colspan="1" rowspan="1">0.224</td>
<td align="center" colspan="1" rowspan="1">0.420</td>
<td align="center" colspan="1" rowspan="1">0.666</td>
<td align="center" colspan="1" rowspan="1">0.798</td>
<td align="center" colspan="1" rowspan="1">0.870</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="center" colspan="1" rowspan="1">8452.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.052</td>
<td align="center" colspan="1" rowspan="1">0.130</td>
<td align="center" colspan="1" rowspan="1">0.240</td>
<td align="center" colspan="1" rowspan="1">0.418</td>
<td align="center" colspan="1" rowspan="1">0.662</td>
<td align="center" colspan="1" rowspan="1">0.790</td>
<td align="center" colspan="1" rowspan="1">0.882</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12</td>
<td align="center" colspan="1" rowspan="1">8529.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.054</td>
<td align="center" colspan="1" rowspan="1">0.110</td>
<td align="center" colspan="1" rowspan="1">0.240</td>
<td align="center" colspan="1" rowspan="1">0.432</td>
<td align="center" colspan="1" rowspan="1">0.696</td>
<td align="center" colspan="1" rowspan="1">0.806</td>
<td align="center" colspan="1" rowspan="1">0.872</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="center" colspan="1" rowspan="1">8555.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.060</td>
<td align="center" colspan="1" rowspan="1">0.128</td>
<td align="center" colspan="1" rowspan="1">0.244</td>
<td align="center" colspan="1" rowspan="1">0.430</td>
<td align="center" colspan="1" rowspan="1">0.676</td>
<td align="center" colspan="1" rowspan="1">0.784</td>
<td align="center" colspan="1" rowspan="1">0.880</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">8572.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.062</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.240</td>
<td align="center" colspan="1" rowspan="1">0.436</td>
<td align="center" colspan="1" rowspan="1">0.688</td>
<td align="center" colspan="1" rowspan="1">0.804</td>
<td align="center" colspan="1" rowspan="1">0.880</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">8706.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.076</td>
<td align="center" colspan="1" rowspan="1">0.156</td>
<td align="center" colspan="1" rowspan="1">0.274</td>
<td align="center" colspan="1" rowspan="1">0.440</td>
<td align="center" colspan="1" rowspan="1">0.684</td>
<td align="center" colspan="1" rowspan="1">0.746</td>
<td align="center" colspan="1" rowspan="1">0.862</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="center" colspan="1" rowspan="1">8846.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.066</td>
<td align="center" colspan="1" rowspan="1">0.146</td>
<td align="center" colspan="1" rowspan="1">0.268</td>
<td align="center" colspan="1" rowspan="1">0.472</td>
<td align="center" colspan="1" rowspan="1">0.672</td>
<td align="center" colspan="1" rowspan="1">0.792</td>
<td align="center" colspan="1" rowspan="1">0.868</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="center" colspan="1" rowspan="1">9015.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.064</td>
<td align="center" colspan="1" rowspan="1">0.138</td>
<td align="center" colspan="1" rowspan="1">0.290</td>
<td align="center" colspan="1" rowspan="1">0.480</td>
<td align="center" colspan="1" rowspan="1">0.710</td>
<td align="center" colspan="1" rowspan="1">0.800</td>
<td align="center" colspan="1" rowspan="1">0.876</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="center" colspan="1" rowspan="1">9046.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.072</td>
<td align="center" colspan="1" rowspan="1">0.116</td>
<td align="center" colspan="1" rowspan="1">0.270</td>
<td align="center" colspan="1" rowspan="1">0.488</td>
<td align="center" colspan="1" rowspan="1">0.712</td>
<td align="center" colspan="1" rowspan="1">0.826</td>
<td align="center" colspan="1" rowspan="1">0.890</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="center" colspan="1" rowspan="1">9192.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.080</td>
<td align="center" colspan="1" rowspan="1">0.138</td>
<td align="center" colspan="1" rowspan="1">0.300</td>
<td align="center" colspan="1" rowspan="1">0.506</td>
<td align="center" colspan="1" rowspan="1">0.686</td>
<td align="center" colspan="1" rowspan="1">0.792</td>
<td align="center" colspan="1" rowspan="1">0.900</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="center" colspan="1" rowspan="1">10,286.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.110</td>
<td align="center" colspan="1" rowspan="1">0.188</td>
<td align="center" colspan="1" rowspan="1">0.394</td>
<td align="center" colspan="1" rowspan="1">0.588</td>
<td align="center" colspan="1" rowspan="1">0.798</td>
<td align="center" colspan="1" rowspan="1">0.862</td>
<td align="center" colspan="1" rowspan="1">0.888</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="center" colspan="1" rowspan="1">10,851.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.116</td>
<td align="center" colspan="1" rowspan="1">0.240</td>
<td align="center" colspan="1" rowspan="1">0.420</td>
<td align="center" colspan="1" rowspan="1">0.648</td>
<td align="center" colspan="1" rowspan="1">0.792</td>
<td align="center" colspan="1" rowspan="1">0.884</td>
<td align="center" colspan="1" rowspan="1">0.904</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="center" colspan="1" rowspan="1">12,599.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.494</td>
<td align="center" colspan="1" rowspan="1">0.564</td>
<td align="center" colspan="1" rowspan="1">0.688</td>
<td align="center" colspan="1" rowspan="1">0.700</td>
<td align="center" colspan="1" rowspan="1">0.836</td>
<td align="center" colspan="1" rowspan="1">0.868</td>
<td align="center" colspan="1" rowspan="1">0.892</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl2" orientation="portrait" position="float">
<label>Table 2.</label>
<caption>
<p>ASRV data set. Average RF distance for each reference set of the synthetic ASRV data set (sequence length of 1000 amino acids, high ASRV with α = 0.5). Order of methods and values for
<italic>k</italic>
are determined as in
<xref rid="tbl1" ref-type="table">Table 1</xref>
. The Friedman test statistic is
<italic>F</italic>
<sub>
<italic>R</italic>
</sub>
= 4873.2 (
<italic>P</italic>
< 10
<sup>−10</sup>
). Significant differences are found at or beyond the α = 0.05 level between the following pairs (numbers refer to column “No.”): method 1 versus methods 22–2; methods 2–5 versus methods 22–6; methods 6–8 versus methods 22–12; method 9 versus methods 22–14; method 10 versus methods 22–17; method 11 versus methods 22–19; methods 12–19 versus methods 22–20; and methods 20 and 21 versus method 22.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of ASRV data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in1.jpg"></inline-graphic>
</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">4571.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.040</td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.078</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.144</td>
<td align="center" colspan="1" rowspan="1">0.202</td>
<td align="center" colspan="1" rowspan="1">0.238</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4958.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.040</td>
<td align="center" colspan="1" rowspan="1">0.066</td>
<td align="center" colspan="1" rowspan="1">0.100</td>
<td align="center" colspan="1" rowspan="1">0.122</td>
<td align="center" colspan="1" rowspan="1">0.188</td>
<td align="center" colspan="1" rowspan="1">0.226</td>
<td align="center" colspan="1" rowspan="1">0.312</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">5121.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.042</td>
<td align="center" colspan="1" rowspan="1">0.070</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.130</td>
<td align="center" colspan="1" rowspan="1">0.196</td>
<td align="center" colspan="1" rowspan="1">0.244</td>
<td align="center" colspan="1" rowspan="1">0.316</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">5647.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.056</td>
<td align="center" colspan="1" rowspan="1">0.082</td>
<td align="center" colspan="1" rowspan="1">0.122</td>
<td align="center" colspan="1" rowspan="1">0.158</td>
<td align="center" colspan="1" rowspan="1">0.214</td>
<td align="center" colspan="1" rowspan="1">0.278</td>
<td align="center" colspan="1" rowspan="1">0.360</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">5722.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.058</td>
<td align="center" colspan="1" rowspan="1">0.092</td>
<td align="center" colspan="1" rowspan="1">0.126</td>
<td align="center" colspan="1" rowspan="1">0.154</td>
<td align="center" colspan="1" rowspan="1">0.216</td>
<td align="center" colspan="1" rowspan="1">0.282</td>
<td align="center" colspan="1" rowspan="1">0.364</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">7329.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.072</td>
<td align="center" colspan="1" rowspan="1">0.114</td>
<td align="center" colspan="1" rowspan="1">0.158</td>
<td align="center" colspan="1" rowspan="1">0.226</td>
<td align="center" colspan="1" rowspan="1">0.350</td>
<td align="center" colspan="1" rowspan="1">0.400</td>
<td align="center" colspan="1" rowspan="1">0.498</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">7350.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.074</td>
<td align="center" colspan="1" rowspan="1">0.116</td>
<td align="center" colspan="1" rowspan="1">0.146</td>
<td align="center" colspan="1" rowspan="1">0.228</td>
<td align="center" colspan="1" rowspan="1">0.348</td>
<td align="center" colspan="1" rowspan="1">0.430</td>
<td align="center" colspan="1" rowspan="1">0.492</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">7353.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.078</td>
<td align="center" colspan="1" rowspan="1">0.110</td>
<td align="center" colspan="1" rowspan="1">0.154</td>
<td align="center" colspan="1" rowspan="1">0.230</td>
<td align="center" colspan="1" rowspan="1">0.354</td>
<td align="center" colspan="1" rowspan="1">0.406</td>
<td align="center" colspan="1" rowspan="1">0.498</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="center" colspan="1" rowspan="1">7628.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.062</td>
<td align="center" colspan="1" rowspan="1">0.102</td>
<td align="center" colspan="1" rowspan="1">0.158</td>
<td align="center" colspan="1" rowspan="1">0.226</td>
<td align="center" colspan="1" rowspan="1">0.364</td>
<td align="center" colspan="1" rowspan="1">0.460</td>
<td align="center" colspan="1" rowspan="1">0.558</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="center" colspan="1" rowspan="1">7741.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.082</td>
<td align="center" colspan="1" rowspan="1">0.124</td>
<td align="center" colspan="1" rowspan="1">0.180</td>
<td align="center" colspan="1" rowspan="1">0.248</td>
<td align="center" colspan="1" rowspan="1">0.368</td>
<td align="center" colspan="1" rowspan="1">0.440</td>
<td align="center" colspan="1" rowspan="1">0.506</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="center" colspan="1" rowspan="1">8177.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.090</td>
<td align="center" colspan="1" rowspan="1">0.112</td>
<td align="center" colspan="1" rowspan="1">0.174</td>
<td align="center" colspan="1" rowspan="1">0.244</td>
<td align="center" colspan="1" rowspan="1">0.400</td>
<td align="center" colspan="1" rowspan="1">0.510</td>
<td align="center" colspan="1" rowspan="1">0.582</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12</td>
<td align="center" colspan="1" rowspan="1">8424.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.092</td>
<td align="center" colspan="1" rowspan="1">0.146</td>
<td align="center" colspan="1" rowspan="1">0.202</td>
<td align="center" colspan="1" rowspan="1">0.248</td>
<td align="center" colspan="1" rowspan="1">0.386</td>
<td align="center" colspan="1" rowspan="1">0.488</td>
<td align="center" colspan="1" rowspan="1">0.596</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="center" colspan="1" rowspan="1">8452.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.082</td>
<td align="center" colspan="1" rowspan="1">0.136</td>
<td align="center" colspan="1" rowspan="1">0.182</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">0.440</td>
<td align="center" colspan="1" rowspan="1">0.484</td>
<td align="center" colspan="1" rowspan="1">0.608</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">8535.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.082</td>
<td align="center" colspan="1" rowspan="1">0.120</td>
<td align="center" colspan="1" rowspan="1">0.186</td>
<td align="center" colspan="1" rowspan="1">0.238</td>
<td align="center" colspan="1" rowspan="1">0.420</td>
<td align="center" colspan="1" rowspan="1">0.550</td>
<td align="center" colspan="1" rowspan="1">0.640</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">8546.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.086</td>
<td align="center" colspan="1" rowspan="1">0.150</td>
<td align="center" colspan="1" rowspan="1">0.202</td>
<td align="center" colspan="1" rowspan="1">0.258</td>
<td align="center" colspan="1" rowspan="1">0.412</td>
<td align="center" colspan="1" rowspan="1">0.496</td>
<td align="center" colspan="1" rowspan="1">0.604</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="center" colspan="1" rowspan="1">8593.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.086</td>
<td align="center" colspan="1" rowspan="1">0.132</td>
<td align="center" colspan="1" rowspan="1">0.192</td>
<td align="center" colspan="1" rowspan="1">0.256</td>
<td align="center" colspan="1" rowspan="1">0.438</td>
<td align="center" colspan="1" rowspan="1">0.514</td>
<td align="center" colspan="1" rowspan="1">0.624</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="center" colspan="1" rowspan="1">8664.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.106</td>
<td align="center" colspan="1" rowspan="1">0.152</td>
<td align="center" colspan="1" rowspan="1">0.220</td>
<td align="center" colspan="1" rowspan="1">0.270</td>
<td align="center" colspan="1" rowspan="1">0.402</td>
<td align="center" colspan="1" rowspan="1">0.492</td>
<td align="center" colspan="1" rowspan="1">0.588</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="center" colspan="1" rowspan="1">9025.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.090</td>
<td align="center" colspan="1" rowspan="1">0.130</td>
<td align="center" colspan="1" rowspan="1">0.238</td>
<td align="center" colspan="1" rowspan="1">0.280</td>
<td align="center" colspan="1" rowspan="1">0.460</td>
<td align="center" colspan="1" rowspan="1">0.540</td>
<td align="center" colspan="1" rowspan="1">0.660</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="center" colspan="1" rowspan="1">9119.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.102</td>
<td align="center" colspan="1" rowspan="1">0.164</td>
<td align="center" colspan="1" rowspan="1">0.220</td>
<td align="center" colspan="1" rowspan="1">0.294</td>
<td align="center" colspan="1" rowspan="1">0.452</td>
<td align="center" colspan="1" rowspan="1">0.556</td>
<td align="center" colspan="1" rowspan="1">0.634</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="center" colspan="1" rowspan="1">10,511.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.116</td>
<td align="center" colspan="1" rowspan="1">0.212</td>
<td align="center" colspan="1" rowspan="1">0.278</td>
<td align="center" colspan="1" rowspan="1">0.394</td>
<td align="center" colspan="1" rowspan="1">0.574</td>
<td align="center" colspan="1" rowspan="1">0.644</td>
<td align="center" colspan="1" rowspan="1">0.720</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="center" colspan="1" rowspan="1">11,216.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.126</td>
<td align="center" colspan="1" rowspan="1">0.214</td>
<td align="center" colspan="1" rowspan="1">0.330</td>
<td align="center" colspan="1" rowspan="1">0.488</td>
<td align="center" colspan="1" rowspan="1">0.620</td>
<td align="center" colspan="1" rowspan="1">0.716</td>
<td align="center" colspan="1" rowspan="1">0.780</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="center" colspan="1" rowspan="1">14,411.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.502</td>
<td align="center" colspan="1" rowspan="1">0.632</td>
<td align="center" colspan="1" rowspan="1">0.708</td>
<td align="center" colspan="1" rowspan="1">0.786</td>
<td align="center" colspan="1" rowspan="1">0.854</td>
<td align="center" colspan="1" rowspan="1">0.866</td>
<td align="center" colspan="1" rowspan="1">0.880</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Sequences were evolved along the branches of the deviated trees using SEQ-GEN (
<xref rid="b29" ref-type="bibr">Rambaut and Grassly, 1997</xref>
) V1.3.2 under the JTT model. (Wherever possible, we parameterized alignment-free methods with the JTT model and its equilibrium frequencies.) We created a control data set with a sequence length of 1000 amino acids; the main difference from the data by
<xref rid="b17" ref-type="bibr">Höhl et al. (2006)</xref>
is the use of twice as many taxa. In addition, we created sequences of 1000 amino acids under a model featuring high among-site rate variation (ASRV; the shape parameter of the continuous gamma distribution was α = 0.5), and we created sequences of only 300 amino acids (without ASRV).</p>
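<p>The high-ASRV setting can be illustrated with a minimal sketch, assuming that per-site relative rates are drawn from a gamma distribution with shape α and mean 1 (scale 1/α). The function name sample_site_rates and the seed are hypothetical, and this is not SEQ-GEN's implementation; it only shows that α = 0.5 yields many slowly evolving sites and a few fast ones.</p>

```python
import random


def sample_site_rates(n_sites, alpha, seed=None):
    """Draw one relative rate per site from Gamma(shape=alpha, scale=1/alpha).

    With scale = 1/alpha the expected rate is 1, so alpha only controls
    the spread of rates across sites (variance = 1/alpha).
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


if __name__ == "__main__":
    # Hypothetical usage: 1000 sites under high ASRV (alpha = 0.5).
    rates = sample_site_rates(1000, alpha=0.5, seed=1)
    slow = sum(1 for r in rates if 0.1 > r)
    print("mean rate:", sum(rates) / len(rates))
    print("sites with rate below 0.1:", slow)
```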
</sec>
<sec>
<title>Empirical data</title>
<p>Analysis of 144 prokaryotes led to the construction of 22,437 MRCs (maximally representative clusters:
<xref rid="b15" ref-type="bibr">Harlow et al., 2004</xref>
), each containing
<italic>n</italic>
≥ 4 protein sequences conceptually translated from their genomes, representing putative orthologs. To each MRC corresponds a highest scoring multiple sequence alignment according to the word-oriented objective function (
<xref rid="b1" ref-type="bibr">Beiko et al., 2005a</xref>
). Each chosen alignment was subjected to a GBLOCKS (
<xref rid="b5" ref-type="bibr">Castresana, 2000</xref>
) analysis to remove ambiguously aligned regions (for settings see Beiko et al. 2005b; supplementary material). The remaining 22,432 trimmed alignments formed the basis for a Bayesian phylogenetic inference using MrBayes, resulting in as many consensus trees determined by the extended 50% majority rule and complete with PPs for all bipartitions. Parameters in this Bayesian analysis were uniform priors on topology, branch length ∈ (0.0, 10.0], and model of sequence change (five models were considered). ASRV was modeled by a four-category discrete approximation to the continuous gamma distribution, uniformly distributed ∈ [0.1, 50.0], and with automatic estimation of the shape parameter α. For further details see Beiko et al. (
<xref rid="b2" ref-type="bibr">2005b</xref>
; supplementary material).</p>
<p>The phylogenetic distance between two taxa is a major factor that determines accuracy of tree reconstruction. Therefore, one goal in constructing a reference data set for our purposes is to obtain subsets of trees that allow us to test methods of interest on a variety of phylogenetic distances. A second goal is to contrast the behavior of methods on distinct subsets.</p>
<p>We first filtered trees and their corresponding alignments depending on the presence of certain deep phylogenetic branches (DPB) with
<italic>PP</italic>
≥ 0.95. This threshold was chosen to ensure that we draw conclusions only from highly supported bipartitions; as a consequence, reference trees may be multifurcating. In a second step, we further grouped the data into subsets by a measure of distance between clades as follows. A branch bipartitions a set of taxa into two groups; for each taxon of the first group we estimated its phylogenetic distance to every taxon in the second group. We then calculated the mean of these values and their standard deviation. The mean is an estimate of the distance between the two partitions, and we used it and the standard deviation (
<italic>SD</italic>
) to establish two filter criteria: one labeled “short” with mean ∈ [0.5, 1.0] and
<italic>SD</italic>
≤ 0.5, and one labeled “long” with mean ∈ [2.5, 3.5] and
<italic>SD</italic>
≤ 0.5 where the units are substitutions per site. For brevity, we refer to the distance thus defined simply as the DPB distance.</p>
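<p>The DPB distance and the two filter criteria described above can be sketched as follows. This is an illustration under stated assumptions: the lookup table dist and the function names are hypothetical, and the population standard deviation is used because the text does not say which estimator was applied.</p>

```python
from statistics import mean, pstdev


def dpb_distance(group_a, group_b, dist):
    """Mean and SD of all pairwise distances across a bipartition.

    dist is a hypothetical lookup: dist[(a, b)] is the estimated
    phylogenetic distance (substitutions per site) between taxon a of
    the first group and taxon b of the second group.
    """
    values = [dist[(a, b)] for a in group_a for b in group_b]
    return mean(values), pstdev(values)


def dpb_label(avg, sd):
    """Apply the 'short' and 'long' criteria; None if neither matches."""
    if sd > 0.5:
        return None
    if avg >= 0.5 and 1.0 >= avg:
        return "short"
    if avg >= 2.5 and 3.5 >= avg:
        return "long"
    return None


if __name__ == "__main__":
    # Made-up distances for a branch separating {a, b} from {c, d}.
    dist = {("a", "c"): 0.6, ("a", "d"): 0.8,
            ("b", "c"): 0.7, ("b", "d"): 0.9}
    avg, sd = dpb_distance(["a", "b"], ["c", "d"], dist)
    print(avg, sd, dpb_label(avg, sd))
```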
<p>The deep phylogenetic branches mentioned previously are as follows: the branch separating Bacteria and Archaea; the branches that separate the phyla Proteobacteria, low-G+C Firmicutes, high-G+C Firmicutes, Chlamydiales, Cyanobacteria, Crenarchaeota, and Euryarchaeota from other phyla; the branches that separate the α, β, γ, and ɛ divisions of the Proteobacteria; and the branches that separate the Clostridia, Mollicutes, Bacilli, Staphylococci, and Lactobacilli divisions of the low-G+C Firmicutes. All chosen phyla/divisions contain four or more taxa in the MRP supertree (matrix representation with parsimony;
<xref rid="b2" ref-type="bibr">Beiko et al., 2005b</xref>
, Figure 6) at a PP threshold of 0.95. Phyla consisting of three or fewer taxa were not included.</p>
<p>The filter criterion on deep branches may lead to repeated inclusion of the same data in subsets. In order to ensure independence, we removed duplicates so that no data were used twice. Additionally, we applied the following criteria to select the most reliable data. We require the mean sequence length to be ≥ 200 amino acids and we require that GBLOCKS retains ≥ 90% of the alignment. We have two filter criteria depending on the number of taxa: between 4 and 8, inclusive, and between 12 and 20, inclusive. Taken together, this creates four subsets of reference alignments and trees: “few-short,” “few-long,” “many-short” and “many-long,” where few/many refers to the number of taxa and short/long to the DPB distance. These subsets are abbreviated as F-S, F-L, M-S, and M-L, respectively. They comprise 50, 52, 80, and 38 alignments and trees; for the first subset we randomly sampled 50 out of 195 filtered elements. The choice of filter criteria on number of taxa and DPB distance yields subsets that are sufficiently distinct for our purposes.</p>
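<p>As a minimal sketch of how the reliability and size filters combine into the four subsets, the following function assigns a label to one alignment. The function and parameter names are hypothetical; the thresholds are the ones stated in the text, and the DPB label ("short" or "long") is assumed to have been determined beforehand.</p>

```python
def taxa_label(n_taxa):
    """Return 'few' for 4 to 8 taxa, 'many' for 12 to 20 taxa, else None."""
    if n_taxa >= 4 and 8 >= n_taxa:
        return "few"
    if n_taxa >= 12 and 20 >= n_taxa:
        return "many"
    return None


def assign_subset(n_taxa, mean_length, retained_fraction, dpb):
    """Return 'F-S', 'F-L', 'M-S', 'M-L', or None if filtered out.

    mean_length: mean sequence length in amino acids (must be >= 200).
    retained_fraction: share of the alignment kept by GBLOCKS (>= 0.90).
    dpb: 'short' or 'long' label of the DPB distance, or None.
    """
    if 200 > mean_length or 0.90 > retained_fraction:
        return None
    size = taxa_label(n_taxa)
    if size is None or dpb not in ("short", "long"):
        return None
    return {"few": "F", "many": "M"}[size] + "-" + {"short": "S", "long": "L"}[dpb]


if __name__ == "__main__":
    print(assign_subset(6, 250.0, 0.95, "short"))   # F-S
    print(assign_subset(15, 310.0, 0.92, "long"))   # M-L
    print(assign_subset(10, 310.0, 0.92, "long"))   # None (10 taxa)
```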
</sec>
</sec>
<sec>
<title>Evaluation Setup</title>
<p>All distance-based methods tested here were given either the unaligned sequences or the
<italic>K</italic>
-mers occurring in them; where possible, word-based methods were benchmarked with values for
<italic>k</italic>
ranging from 1 to 9; for
<italic>B-bin</italic>
the minimally tested value was
<italic>k</italic>
= 2, and for
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
it was
<italic>k</italic>
= 3. The resulting test distances were used to infer neighbor-joining (
<xref rid="b33" ref-type="bibr">Saitou and Nei, 1987</xref>
) trees. As described above, the phylogenetic information of
<italic>K</italic>
-mers inferred using a Bayesian analysis was summarized with the extended 50% majority rule. Phylogenetic accuracy was measured differently depending on the data set. In each case, we computed the topological difference between a test tree and its corresponding reference tree.</p>
<p>For synthetic data, we used the Robinson-Foulds (RF;
<xref rid="b31" ref-type="bibr">Robinson and Foulds, 1981</xref>
) tree topology metric as a measure of phylogenetic accuracy. Differences in rank sums between methods were assessed for statistical significance by the Friedman test (corrected for tied ranks; here,
<italic>N</italic>
= 700 and
<italic>k</italic>
= 22), followed by Tukey-style post hoc comparisons if a significant difference was found at or beyond the α = 0.05 level (see, e.g.,
<xref rid="b47" ref-type="bibr">Zar, 1999</xref>
).</p>
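<p>For illustration, a normalized RF distance can be computed from the sets of nontrivial bipartitions of two trees on the same taxa. The sketch below assumes that each bipartition is given explicitly as one of its sides and that the distance is normalized by the total number of bipartitions in both trees, which gives the values 0.0, 0.2, …, 1.0 for bifurcating eight-taxon trees. The function names are hypothetical, and no tree parsing is attempted.</p>

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes the first taxon."""
    side = frozenset(side)
    anchor = min(taxa)
    return frozenset(taxa) - side if anchor in side else side


def normalized_rf(splits_test, splits_ref):
    """Share of bipartitions found in exactly one of the two trees."""
    differing = len(splits_test ^ splits_ref)
    total = len(splits_test) + len(splits_ref)
    return differing / total if total else 0.0


if __name__ == "__main__":
    taxa = "ABCDEFGH"
    # Two hypothetical bifurcating trees, each with five internal branches.
    ref = {canonical_split(s, taxa)
           for s in ("AB", "ABC", "ABCD", "ABCDE", "ABCDEF")}
    test = {canonical_split(s, taxa)
            for s in ("AB", "ABC", "ABCD", "ABCDE", "ABCDEG")}
    print(normalized_rf(test, ref))  # one branch differs in each tree
```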
<p>For empirical data we employed two measures: (a) the false-negative count of bipartitions (FN), telling us whether reference tree bipartitions were reconstructed or not; and (b) a one-element subset of FN that considers only the reconstruction of a DPB (as described above). We analyzed the influence of alphabet and tree topology measure for each reference set; to this end, we obtained total rank sums over all methods for each alphabet and under each measure. Statistical significance of differences was assessed using χ
<sup>2</sup>
-tests (corrected for continuity) on 2 × 2 contingency tables (
<italic>df</italic>
= 1) where row number indicates the alphabet and column number indicates tree topology measure. The column totals were fixed (at 210 in analyses of individual reference sets and at 840 in the pooled analysis); thus the tables correspond to binomial comparative trials (category 2 in
<xref rid="b47" ref-type="bibr">Zar, 1999</xref>
).</p>
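<p>The continuity-corrected χ² statistic for a 2 × 2 table can be sketched as below. This assumes the usual Yates correction, χ² = N(|ad − bc| − N/2)² / (R1 R2 C1 C2), and obtains the P value for df = 1 from the complementary error function. The function names and the example counts are hypothetical and are not taken from our data.</p>

```python
from math import erfc, sqrt


def chi2_yates(table):
    """Continuity-corrected chi-square for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    # The correction never pushes the numerator below zero.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * diff ** 2 / denom


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with one df."""
    return erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    stat = chi2_yates([[10, 20], [30, 40]])  # made-up counts
    print(stat, p_value_df1(stat))
```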
</sec>
</sec>
<sec>
<title>Results and Discussion</title>
<sec>
<title>Evaluation of Alignment-Free Methods</title>
<p>We created three different synthetic data sets, each consisting of seven reference sets with increasing phylogenetic distances; any given reference set in turn contains 100 reference tree and sequence sets. The first data set serves as a control, the second tests the influence of high among-site rate variation (ASRV), and the third tests the influence of sequence length (short-sequences). We tested the methods either on the original amino acid sequences (alphabet AA) or on the sequences encoded using chemical equivalence classes (CE); we also varied word length
<italic>k</italic>
where possible. Neighbor-joining (
<xref rid="b33" ref-type="bibr">Saitou and Nei, 1987</xref>
) trees inferred from resulting phylogenetic distances were compared to reference trees using the Robinson-Foulds (RF;
<xref rid="b31" ref-type="bibr">Robinson and Foulds, 1981</xref>
) tree topology metric; in case of
<italic>B-bin</italic>
, we compared consensus trees.</p>
<p>The main results of this paper are contained in
<xref rid="tbl1" ref-type="table">Tables 1</xref>
,
<xref rid="tbl2" ref-type="table">2</xref>
, and
<xref rid="tbl7" ref-type="table">A1</xref>
where we show the phylogenetic accuracy as measured by the RF distance of all tested methods on the synthetic data sets (control, ASRV, and short-sequences). The use of bifurcating eight-taxon trees in our synthetic data sets implies six possible values for each RF distance (0.0, 0.2, …, 1.0); therefore, all values in these tables end with an even digit. For each word-based method, we show the best performing word length
<italic>k</italic>
for alphabet AA and for alphabet CE (method
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
accepts only
<italic>k</italic>
= 1 when using conventional similarity matrices, and we test it only on AA). We find that the value of parameter
<italic>k</italic>
for each combination of method and alphabet is stable across all three data sets. The only exception is
<italic>B-bin</italic>
with CE; on the control set,
<italic>k</italic>
= 5 performs somewhat better (with mean RF distances of 0.074, 0.150, 0.314, 0.498, 0.678, 0.810, 0.872). However,
<italic>k</italic>
= 4 proved superior on the ASRV and short-sequences data sets and is therefore also included in
<xref rid="tbl1" ref-type="table">Table 1</xref>
. Performance was compared by considering the rank sums over all 700 RF distances; lower rank sums equate to lower overall RF distances and hence higher phylogenetic accuracy. The order of methods in the aforementioned tables is based on these rank sums, and we list all pairwise combinations of methods whose differences in rank sums are deemed statistically significant.</p>
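<p>The rank sums used for ordering the methods can be sketched as follows, assuming Friedman-style ranking: within each reference tree the methods are ranked by RF distance, tied values share the mean of their ranks, and ranks are summed per method. The function names are hypothetical. Under this assumption the rank sums of 22 methods over 700 trees must total 700 × 22 × 23 / 2 = 177,100, which is what the rank sums listed in Table 2 add up to.</p>

```python
def midranks(values):
    """Rank values in ascending order; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(values) > start:
        end = start
        while (len(values) > end + 1
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Mean of the 1-based ranks start+1 .. end+1 shared by the tie group.
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sums(blocks):
    """blocks: one row per reference tree, one RF distance per method."""
    sums = [0.0] * len(blocks[0])
    for row in blocks:
        for j, r in enumerate(midranks(row)):
            sums[j] += r
    return sums


if __name__ == "__main__":
    # Two made-up reference trees, three methods.
    print(rank_sums([[0.2, 0.0, 0.2], [0.0, 0.4, 0.2]]))
```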
<p>First, we analyze the ranking of alignment-free methods in the control data set; rank sums range from 3228.0 for
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
to 12,599.0 for
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
, an almost fourfold difference. In decreasing order, we find that
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with CE have similar rank sums (4285.0 and 4483.5), followed by
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with AA (5374.0 and 5650.5). Then, 14 methods with rank sums from 8127.5 to 9192.5 ensue, separated from each other by values < 200. Two variants of method
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
rank third last and second last with 10,286.0 and 10,851.0. On the ASRV data set, rank sums range from 4571.5 for
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
to 14,411.0 for
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
, a difference slightly more than threefold, indicating that phylogenetic accuracy differs less markedly. In particular,
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with AA follow more closely with rank sums of 4958.5 and 5121.5, as do
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with CE (rank sums: 5647.5 and 5722.0). Then, the same 14 methods as in the control data follow (in different order) with rank sums ranging from 7329.5 to 9119.5. This constitutes a difference of 1790.0 (up from 1065.0 for the control data) and is consequently reflected in large differences between some methods. Again, two variants of method
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
rank third last and second last (rank sums: 10,511.0 and 11,216.5). Finally, we observe a distribution of rank sums and spacing of differences in the short-sequences data set that is similar to what we find in the control data set.</p>
<p>All variants of the pattern-based method, under both alphabets, are significantly more accurate than any other alignment-free method (
<xref rid="tbl1" ref-type="table">Tables 1</xref>
,
<xref rid="tbl2" ref-type="table">2</xref>
, and
<xref rid="tbl7" ref-type="table">A1</xref>
), including the Bayesian phylogenetic inference from
<italic>K</italic>
-mers with a binary encoding (
<italic>B-bin</italic>
). For the control and short-sequences data sets, most alignment-free methods are only significantly better performing than
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
(under both alphabets) but are statistically indistinguishable from each other. Thus, the best performing variant of
<italic>B-bin</italic>
is on par with established alignment-free methods. It also means that the relative ranking of individual alignment-free methods is largely without consequences. The situation changes slightly for the ASRV data set: a few subgroups can be recognized. However, the best subgroup (consisting of methods 6 to 8) remains statistically indistinguishable from the best performing variant of
<italic>B-bin</italic>
.</p>
<p>We find that
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
always ranks higher than
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
using the same alphabet, though their difference in rank sums is not significant as tested here. This latter variant results in higher RF distances for most but not all reference sets. The absolute difference does not exceed 0.050 (using AA on set 6 of short-sequences data), and the relative difference does not exceed 23.5% (using CE on set 2 of control data). Therefore, if one is willing to accept the overall decrease in phylogenetic accuracy (its accuracy is still significantly higher than that of any remaining alignment-free method), one can take advantage of the considerable speed-up in running time of pattern-based distance calculation (see Speeding Up Pattern-Based Distance Calculation) and hence, tree reconstruction. Also, if one has prior knowledge about the sequences under consideration, it is possible to replace the all-purpose BLOSUM62 matrix we used by a matrix that better reflects the phylogenetic distances among these sequences.</p>
<p>In-Depth Analysis of Tree Reconstruction Accuracy Using Synthetic Data in the
<xref ref-type="app" rid="app1">Appendix</xref>
shows that nearly all alignment-free methods yield an increased overall tree reconstruction accuracy in the presence of high among-site rate variation (stemming from a pronounced increase for medium to high phylogenetic distances).
<xref ref-type="fig" rid="fig1">Figure 1</xref>
visualizes this increase: we show parts of the RF landscape for the newly introduced method
<italic>B-bin</italic>
. That is, we plot the RF distance for
<italic>B-bin</italic>
on the
<italic>y</italic>
-axis with the
<italic>x</italic>
-axis showing values for all tested word lengths
<italic>k</italic>
. Each of the six subfigures contains two curves: one resulting from the use of alphabet AA, the other from the use of alphabet CE. Measurements were obtained from reference sets 2, 4, and 6 of two different data sets:
<xref ref-type="fig" rid="fig1">Figures 1a</xref>
,
<xref ref-type="fig" rid="fig1">1c</xref>
,
<xref ref-type="fig" rid="fig1">1e</xref>
correspond to the control data set and
<xref ref-type="fig" rid="fig1">Figures 1b</xref>
,
<xref ref-type="fig" rid="fig1">1d</xref>
,
<xref ref-type="fig" rid="fig1">1f</xref>
to the ASRV data set. Comparing the left and right panels reveals that the presence of high ASRV leads to lower RF distances at the optimal word length under each alphabet. Additionally, we see that higher, and therefore suboptimal, word lengths benefit from the use of alphabet CE: RF distances from CE sequences do not degrade as quickly with increasing values of
<italic>k</italic>
as they do for AA sequences.</p>
<p>
<xref ref-type="fig" rid="fig2">Figures 2a</xref>
,
<xref ref-type="fig" rid="fig2">2b</xref>
visualize the average RF distance of several important groups found by our analysis of the data in
<xref rid="tbl1" ref-type="table">Tables 1</xref>
and
<xref rid="tbl2" ref-type="table">2</xref>
(the graph for
<xref rid="tbl7" ref-type="table">Table A1</xref>
is very similar to
<xref ref-type="fig" rid="fig2">Figure 2a</xref>
and is therefore omitted here). We show the average RF distance on each of the seven reference sets for six selected methods. The rank of each method (column “No.” in the corresponding table), and hence its parametrization, is given in parentheses (when a method ranks consistently across the two tables). The methods are the ML distance estimate based on correct alignments,
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
(rank 1); the best performing pattern-based method,
<italic>d</italic>
<sup>
<italic>PB</italic>
</sup>
(rank 2); the best performing word-based method and the best performing alignment-free method not based on words; the best performing composition distance,
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
(rank 20); the W-metric,
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
(rank 22). Note that the two methods ranking 6th and 20th span an interval that encompasses most methods. Hence, these two methods serve to summarize and visualize the performance of all methods thus “contained.” Comparing
<xref ref-type="fig" rid="fig2">Figures 2a</xref>
with
<xref ref-type="fig" rid="fig2">2b</xref>
we see the extent to which most alignment-free methods (apart from
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
) show increased phylogenetic accuracy, corresponding to a reduced RF distance in the presence of high among-site rate variation (especially for medium to high phylogenetic distances). Notice also how the curve for
<italic>d</italic>
<sup>
<italic>PB</italic>
</sup>
closely follows that of
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
(
<xref ref-type="fig" rid="fig2">Figure 2b</xref>
).</p>
<fig id="fig2" orientation="portrait" position="float">
<label>Figure 2</label>
<caption>
<p>Average RF distance for six methods. Average RF distance (
<italic>y</italic>
-axis) for six selected methods on all seven reference sets (
<italic>x</italic>
-axis) of two synthetic data sets (a: control; b: ASRV). For each data set, we show (1) the ML distance estimate based on correct alignments, (2) the best pattern-based variant, (3 and 4) the best word-based method and the best method not based on words, (5) the best composition distance, and (6) the W-metric; the numbers in the inserted legends refer to the far left-hand column of
<xref rid="tbl1" ref-type="table">Tables 1</xref>
(
<xref ref-type="fig" rid="fig2">Figure 2a</xref>
) and
<xref rid="tbl2" ref-type="table">2</xref>
(
<xref ref-type="fig" rid="fig2">Figure 2b</xref>
) respectively.</p>
</caption>
<graphic xlink:href="56-2-206-f2"></graphic>
</fig>
</sec>
<sec>
<title>Analysis Using the Putative Orthologs Data Set</title>
<p>Here, we look at the phylogenetic accuracy of alignment-free methods on a smaller data set of empirical sequences; its creation is described in detail in Methods. There are four putative orthologs reference sets, labeled “few-short” (F-S), “few-long” (F-L), “many-short” (M-S), and “many-long” (M-L), where few/many indicates the number of taxa, and short/long indicates the DPB distance. As with the synthetic data sets, we tested alignment-free methods on alphabets AA and CE and varied the parameter
<italic>k</italic>
of word-based methods. Neighbor-joining or consensus trees were compared to reference trees using two measures of tree topology as explained below.</p>
<p>
<xref rid="tbl3" ref-type="table">Table 3</xref>
shows the accuracy of alignment-free methods as measured by the normalized false-negative count (FN) for all four putative orthologs reference sets. For ease of presentation, the actual numerical values obtained by FN are multiplied by a factor of 10 and rounded to three decimal places. We included all methods apart from
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
;
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
would show up as the worst method, as in the previous section. For word-based methods, we analyzed how their accuracy depends on parameter
<italic>k</italic>
and included the best performing word length for each alphabet as judged by their rank over all four reference sets when comparing all parametrizations of all methods. The ranks were calculated from the average accuracy on each set: this avoids bias due to different set sizes.</p>
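<p>The rank-sum ordering described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and sharing the mean rank among tied values is an assumption. The three example rows are copied from Table 3; because the published rank sums were computed over all parametrizations of all methods, the totals of this three-method toy example are not expected to match the rank-sum column of the table.</p>

```python
def average_ranks(values):
    """Rank values in ascending order (1 = lowest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(scores):
    """Sum per-set ranks; scores maps a method label to one average distance per set."""
    methods = list(scores)
    n_sets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for s in range(n_sets):
        column = [scores[m][s] for m in methods]
        for m, r in zip(methods, average_ranks(column)):
            totals[m] += r
    return totals


# Average FN distances (x 10) for sets F-S, F-L, M-S, M-L, copied from Table 3.
# The labels are ad hoc and only used in this sketch.
EXAMPLE_SCORES = {
    "PB-SIM/CE": [0.607, 0.272, 0.735, 0.866],
    "S/AA/k=3": [0.473, 0.167, 0.937, 1.252],
    "C/AA/k=4": [0.847, 0.978, 1.413, 2.374],
}

if __name__ == "__main__":
    for method, total in sorted(rank_sums(EXAMPLE_SCORES).items(), key=lambda kv: kv[1]):
        print(method, total)
```

<p>Ranking each reference set separately before summing gives every set the same influence regardless of its size, which is the bias the text refers to.</p>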
<table-wrap id="tbl3" orientation="portrait" position="float">
<label>Table 3.</label>
<caption>
<p>FN distance (× 10) for putative orthologs data set. Average FN distance (multiplied by 10) for each reference set of the putative orthologs data set. For word-based methods, we show the best performing word length
<italic>k</italic>
for each alphabet
<italic>A</italic>
. Methods are ordered according to their rank sums ∑
<sub>R</sub>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="4" rowspan="1">Reference set</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="4" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in1.jpg"></inline-graphic>
</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">F-S</th>
<th align="center" colspan="1" rowspan="1">F-L</th>
<th align="center" colspan="1" rowspan="1">M-S</th>
<th align="center" colspan="1" rowspan="1">M-L</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">15.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.607</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">0.735</td>
<td align="center" colspan="1" rowspan="1">0.866</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">16.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.536</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">0.837</td>
<td align="center" colspan="1" rowspan="1">0.984</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">18.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.473</td>
<td align="center" colspan="1" rowspan="1">0.167</td>
<td align="center" colspan="1" rowspan="1">0.937</td>
<td align="center" colspan="1" rowspan="1">1.252</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">22.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.580</td>
<td align="center" colspan="1" rowspan="1">0.304</td>
<td align="center" colspan="1" rowspan="1">0.840</td>
<td align="center" colspan="1" rowspan="1">1.042</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">23.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.533</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">0.754</td>
<td align="center" colspan="1" rowspan="1">1.337</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">27.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.650</td>
<td align="center" colspan="1" rowspan="1">0.385</td>
<td align="center" colspan="1" rowspan="1">0.712</td>
<td align="center" colspan="1" rowspan="1">1.053</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7.5</td>
<td align="center" colspan="1" rowspan="1">37.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.713</td>
<td align="center" colspan="1" rowspan="1">0.353</td>
<td align="center" colspan="1" rowspan="1">0.880</td>
<td align="center" colspan="1" rowspan="1">1.182</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7.5</td>
<td align="center" colspan="1" rowspan="1">37.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">0.657</td>
<td align="center" colspan="1" rowspan="1">0.256</td>
<td align="center" colspan="1" rowspan="1">1.022</td>
<td align="center" colspan="1" rowspan="1">1.337</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="center" colspan="1" rowspan="1">38.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.533</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">1.338</td>
<td align="center" colspan="1" rowspan="1">1.393</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="center" colspan="1" rowspan="1">44.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.763</td>
<td align="center" colspan="1" rowspan="1">0.423</td>
<td align="center" colspan="1" rowspan="1">0.897</td>
<td align="center" colspan="1" rowspan="1">1.074</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="center" colspan="1" rowspan="1">45.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.747</td>
<td align="center" colspan="1" rowspan="1">0.337</td>
<td align="center" colspan="1" rowspan="1">0.869</td>
<td align="center" colspan="1" rowspan="1">1.402</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12.5</td>
<td align="center" colspan="1" rowspan="1">47.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.697</td>
<td align="center" colspan="1" rowspan="1">0.449</td>
<td align="center" colspan="1" rowspan="1">0.998</td>
<td align="center" colspan="1" rowspan="1">1.259</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12.5</td>
<td align="center" colspan="1" rowspan="1">47.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.833</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">1.170</td>
<td align="center" colspan="1" rowspan="1">1.344</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">49.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">0.800</td>
<td align="center" colspan="1" rowspan="1">0.353</td>
<td align="center" colspan="1" rowspan="1">0.991</td>
<td align="center" colspan="1" rowspan="1">1.328</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">49.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.673</td>
<td align="center" colspan="1" rowspan="1">0.337</td>
<td align="center" colspan="1" rowspan="1">1.004</td>
<td align="center" colspan="1" rowspan="1">1.465</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="center" colspan="1" rowspan="1">53.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.840</td>
<td align="center" colspan="1" rowspan="1">0.224</td>
<td align="center" colspan="1" rowspan="1">1.139</td>
<td align="center" colspan="1" rowspan="1">1.619</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="center" colspan="1" rowspan="1">55.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.713</td>
<td align="center" colspan="1" rowspan="1">0.385</td>
<td align="center" colspan="1" rowspan="1">1.454</td>
<td align="center" colspan="1" rowspan="1">1.305</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="center" colspan="1" rowspan="1">64.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.973</td>
<td align="center" colspan="1" rowspan="1">0.321</td>
<td align="center" colspan="1" rowspan="1">1.453</td>
<td align="center" colspan="1" rowspan="1">1.437</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19.5</td>
<td align="center" colspan="1" rowspan="1">75.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.847</td>
<td align="center" colspan="1" rowspan="1">0.978</td>
<td align="center" colspan="1" rowspan="1">1.413</td>
<td align="center" colspan="1" rowspan="1">2.374</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19.5</td>
<td align="center" colspan="1" rowspan="1">75.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.807</td>
<td align="center" colspan="1" rowspan="1">1.346</td>
<td align="center" colspan="1" rowspan="1">1.832</td>
<td align="center" colspan="1" rowspan="1">2.183</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Similarly,
<xref rid="tbl8" ref-type="table">Table A2</xref>
shows the accuracy as measured by DPB. This measure considers one phylogenetic branch in each set of sequences; we present the total number of unrecovered branches for each reference set and indicate the maximal possible number by showing the size of each reference set. We used the same word lengths as in
<xref rid="tbl3" ref-type="table">Table 3</xref>
; optimizing parameter
<italic>k</italic>
for DPB yields mostly identical values. The notable exception is
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
where the optimal word length for CE is
<italic>k</italic>
= 5. This would result in
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
with CE obtaining rank 8 as opposed to 16.5 for
<italic>k</italic>
= 4. Method
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
with AA is ranked slightly higher overall when
<italic>k</italic>
= 3 instead of
<italic>k</italic>
= 4; however, this difference is inconsequential when considering only the best word lengths as in
<xref rid="tbl8" ref-type="table">Table A2</xref>
.</p>
<p>The best performing word lengths from
<xref rid="tbl3" ref-type="table">Tables 3</xref>
and
<xref rid="tbl8" ref-type="table">A2</xref>
are either identical to those determined in our previous analysis or vary by at most one. All word lengths that are optimal over any of the three synthetic and one empirical data sets are limited to values ranging from 3 to 6, with 3 obtained only on AA encoded sequences and 6 only on CE. This agreement is perhaps surprising, given the use of different data sets, tree topology measures, and word-based methods. Although it remains impossible to know the best parameter setting for a particular word-based method on every data set, our finding suggests that in practice,
<italic>k</italic>
can be set to 3–6, or even 4–5, with acceptable results over a wide range of data sets.</p>
<p>The rank order of alignment-free methods in
<xref rid="tbl3" ref-type="table">Tables 3</xref>
and
<xref rid="tbl8" ref-type="table">A2</xref>
agrees to a large extent with what we found based on RF distances for synthetic data. Variants of the pattern-based approach constitute the best performing alignment-free methods (when using alphabet CE), whereas differently parameterized composition distances perform worst. Between these two extremes lie a few more groups, recognizable by differences in their rank sums. Note that in contrast to the analysis of synthetic data, we do not attempt to attach statistical significance to these differences. Also apparent from
<xref rid="tbl3" ref-type="table">Tables 3</xref>
and
<xref rid="tbl8" ref-type="table">A2</xref>
is that the best performing variant of method
<italic>B-bin</italic>
does not improve on previously established, distance-based methods. Overall, we find that the general conclusions drawn from synthetic data about the performance of alignment-free methods relative to each other also hold for empirical data. Furthermore, this data set incorporates reference sets with up to 20 sequences, compared to 8 sequences for synthetic data. Thus, our results are not bound to data sets with a particular number of sequences.</p>
<p>In-Depth Analysis of Alphabets Using Empirical Data in the Appendix shows results that are consistent with the following hypothesis. Encoding sequences with alphabet CE improves the reconstruction accuracy of long branches over the use of original sequences. At the same time, alphabet CE negatively affects the reconstruction accuracy of short branches. To see this, consider how the impact on reconstruction accuracy is picked up by the two measures. The FN count treats each branch equally, whereas the DPB count is an extreme form of a weighted variant of FN. One branch receives weight 1 (the deep phylogenetic branch of interest; see Methods), whereas all other branches receive weight 0 and thus contribute nothing. Thus, for a given reference set, improvements under measure DPB reflect a better ability to correctly reconstruct branches that separate various phyla and divisions. The data show that under measure FN, alphabet AA is better than CE in three out of four cases, whereas the situation is reversed under measure DPB: alphabet CE yields lower rank sums than AA in three out of four cases. Exceptions to the overall behavior are found for reference sets with few taxa. The lower number of taxa, and hence branches, means that any influence of the alphabet on phylogenetic accuracy for certain data, as reflected in a particular measure, will show up more strongly. On reference set F-S, measure DPB shows lower rank sums for AA and higher rank sums for CE sequences relative to the overall levels. This agrees with the hypothesis that for small branch lengths, alphabet AA is a better choice than CE. On reference set F-L, measure FN yields lower rank sums for alphabet CE than for AA relative to the overall levels. Thus, the improvement in reconstruction accuracy provided by encoding sequences with CE is evident even when we consider all branches, as this reference set is dominated by long branches.</p>
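<p>The relationship between the two measures can be made concrete with a small sketch. It assumes that a tree is represented by its set of bipartitions, each written as a frozenset of taxon names in a canonical form, and that the normalized FN count divides the number of unrecovered reference branches by the number of reference branches; both are assumptions of this illustration, as are the function names and the toy taxa A to F. Under these assumptions, DPB is the special case in which a single branch has weight 1 and all others have weight 0.</p>

```python
def weighted_fn(reference, inferred, weights=None):
    """Weighted fraction of reference branches that are missing from the inferred tree.

    reference, inferred: sets of bipartitions (frozensets of taxon names).
    weights: dict mapping a bipartition to a non-negative weight;
             None gives every reference branch weight 1 (plain normalized FN).
    """
    if weights is None:
        weights = {b: 1.0 for b in reference}
    total = sum(weights.get(b, 0.0) for b in reference)
    if total == 0:
        raise ValueError("at least one reference branch needs a positive weight")
    missed = sum(weights.get(b, 0.0) for b in reference if b not in inferred)
    return missed / total


def dpb_miss(reference, inferred, deep_branch):
    """DPB-style count: 1 if the single branch of interest is unrecovered, else 0."""
    return int(weighted_fn(reference, inferred, {deep_branch: 1.0}))


# Toy example on six taxa: the inferred tree misplaces taxon C.
reference = {frozenset("AB"), frozenset("ABC"), frozenset("EF")}
inferred = {frozenset("AB"), frozenset("ABD"), frozenset("EF")}

if __name__ == "__main__":
    print(weighted_fn(reference, inferred))                # one of three branches missed
    print(dpb_miss(reference, inferred, frozenset("ABC")))  # the deep branch is missed
    print(dpb_miss(reference, inferred, frozenset("AB")))   # this branch is recovered
```

<p>Summing the DPB-style count over all sequence sets in a reference set gives a total number of unrecovered branches, which is the kind of quantity reported per reference set for DPB.</p>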
</sec>
<sec>
<title>Speeding Up Pattern-Based Distance Calculation</title>
<p>In this section, we present time measurements of pattern-based distance calculation on the synthetic data sets. The measurements were conducted on a 64-bit 2.4-GHz x86-compatible Intel processor. Furthermore, we show a speed-up of an order of magnitude obtained by replacing
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
by variant
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
.</p>
<p>Pattern-based distance calculation consists of two main steps: pattern discovery and the actual distance calculation from these patterns. The duration of the distance calculation step is largely dependent on the amount of pattern data that is generated in the pattern discovery step, as well as on the number of residue pairings described by these data. Durations of both steps need to be added, yielding the total computation time; here, they are considered separately for benchmarking purposes.</p>
<p>The duration of the pattern discovery step is determined by two major factors (for any fixed set of TEIRESIAS parameters): the amount of input data and the choice of alphabet. Sequence similarity also influences running time, although to a lesser degree over the range of phylogenetic distances (reference sets 1 and 7 of the synthetic data sets) and the presence or absence of ASRV considered here. Each of these reference sets consists of 100 sequence sets, and hence 100 distance computations; we report the average running time of a single computation in seconds. Under otherwise identical conditions, reducing the sequence length from 1000 to 300 amino acids reduces computation time for alphabet CE from 101.3 s to 8.76 s (set 1) and from 72.5 s to 6.48 s (set 7). Similarly, for alphabet AA, computation time is reduced from 25.3 s to 5.47 s and from 7.80 s to 1.09 s. These numbers also show the effect of alphabet choice on running time: using CE instead of AA increases time by as much as a factor of 9.3. Furthermore, running time and phylogenetic distance are inversely correlated. The short-sequences data exhibit the largest sensitivity to phylogenetic distance: for AA-encoded sequences, reducing phylogenetic distance increases time by a factor of 5.0. In the presence of high ASRV, computation time increases somewhat relative to the control data, to 113.9 s and 87.4 s for CE and to 26.4 s and 11.5 s for AA.</p>
<p>
<xref rid="tbl4" ref-type="table">Table 4</xref>
shows the duration of the distance calculation step: it is apparent that, as before, using alphabet CE instead of AA increases running time by an order of magnitude. Unlike before, however, the presence of ASRV and a change in phylogenetic distance lead to changes in running time (under both alphabets) that are not easily summarized. The absolute time (in seconds) for method
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
varies from 864 to 1172 for CE sequences of length 1000 and can be as low as 3.24 for AA sequences of length 300. When we calculate distances using variant
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
, we obtain speed-ups by factors of 8.3 to 16.8. This is an order of magnitude faster and brings the absolute time down to between 63.4 s and 103.6 s for CE sequences of length 1000, and to as little as 0.39 s for AA sequences of length 300. It seems quite likely that a further speed-up can be achieved through a reimplementation of
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
: variant
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
spends most of its computation time in the optimized C implementation of Protdist, whereas
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
is written entirely in Python and thus leaves room for an additional performance gain.</p>
<table-wrap id="tbl4" orientation="portrait" position="float">
<label>Table 4.</label>
<caption>
<p>Duration of distance calculation. Duration of the distance calculation step for two variants of the pattern-based method (
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
,
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
). We present the time (measured in seconds) averaged over 100 sets of sequences in any given reference set (sets with the lowest/ highest phylogenetic distances from the synthetic data sets are used) encoded using two alphabets
<italic>A</italic>
. The hardware consisted of a 64-bit 2.4-GHz x86-compatible Intel processor.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="2" rowspan="1">Control</th>
<th align="center" colspan="2" rowspan="1">ASRV</th>
<th align="center" colspan="2" rowspan="1">Short-sequences</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="2" rowspan="1">
<hr></hr>
</th>
<th colspan="2" rowspan="1">
<hr></hr>
</th>
<th colspan="2" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">Set 1</th>
<th align="center" colspan="1" rowspan="1">Set 7</th>
<th align="center" colspan="1" rowspan="1">Set 1</th>
<th align="center" colspan="1" rowspan="1">Set 7</th>
<th align="center" colspan="1" rowspan="1">Set 1</th>
<th align="center" colspan="1" rowspan="1">Set 7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">1084</td>
<td align="center" colspan="1" rowspan="1">1045</td>
<td align="center" colspan="1" rowspan="1">864</td>
<td align="center" colspan="1" rowspan="1">1172</td>
<td align="char" char="." colspan="1" rowspan="1">103.7</td>
<td align="char" char="." colspan="1" rowspan="1">87.3</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">81.1</td>
<td align="center" colspan="1" rowspan="1">97.3</td>
<td align="center" colspan="1" rowspan="1">63.4</td>
<td align="center" colspan="1" rowspan="1">103.6</td>
<td align="char" char="." colspan="1" rowspan="1">7.10</td>
<td align="char" char="." colspan="1" rowspan="1">7.46</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">76.2</td>
<td align="center" colspan="1" rowspan="1">36.0</td>
<td align="center" colspan="1" rowspan="1">97.2</td>
<td align="center" colspan="1" rowspan="1">68.9</td>
<td align="char" char="." colspan="1" rowspan="1">11.77</td>
<td align="char" char="." colspan="1" rowspan="1">3.24</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">5.33</td>
<td align="center" colspan="1" rowspan="1">3.62</td>
<td align="center" colspan="1" rowspan="1">5.77</td>
<td align="center" colspan="1" rowspan="1">5.26</td>
<td align="char" char="." colspan="1" rowspan="1">0.79</td>
<td align="char" char="." colspan="1" rowspan="1">0.39</td>
</tr>
</tbody>
</table>
</table-wrap>
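<p>The speed-up factors quoted in this section can be recomputed directly from Table 4. The following sketch copies the average times from the table and divides each time for the ML variant by the corresponding time for the similarity-based variant; the dictionary layout and function name are ad hoc choices for this illustration.</p>

```python
# Average times in seconds copied from Table 4, in the column order
# Control set 1, Control set 7, ASRV set 1, ASRV set 7,
# Short-sequences set 1, Short-sequences set 7.
TIMES = {
    ("PB-ML", "CE"): [1084, 1045, 864, 1172, 103.7, 87.3],
    ("PB-SIM", "CE"): [81.1, 97.3, 63.4, 103.6, 7.10, 7.46],
    ("PB-ML", "AA"): [76.2, 36.0, 97.2, 68.9, 11.77, 3.24],
    ("PB-SIM", "AA"): [5.33, 3.62, 5.77, 5.26, 0.79, 0.39],
}


def speedups(times, alphabet):
    """Ratio of PB-ML time to PB-SIM time for each column of one alphabet."""
    slow = times[("PB-ML", alphabet)]
    fast = times[("PB-SIM", alphabet)]
    return [s / f for s, f in zip(slow, fast)]


all_ratios = speedups(TIMES, "CE") + speedups(TIMES, "AA")

if __name__ == "__main__":
    # Prints: speed-up range: 8.3 to 16.8
    print(f"speed-up range: {min(all_ratios):.1f} to {max(all_ratios):.1f}")
```

<p>The smallest ratio comes from AA sequences on set 7 of the short-sequences data (3.24 s versus 0.39 s) and the largest from AA sequences on set 1 of the ASRV data (97.2 s versus 5.77 s).</p>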
</sec>
<sec>
<title>Word-Based Bayesian Phylogenetic Inference: Analysis of Convergence and Burn-In</title>
<p>MrBayes estimates PPs of bipartitions by sampling from the phylogenetic tree distribution using Markov chain Monte Carlo (MCMC). To get an accurate estimate of PPs, one needs to sample from the chains after they have reached stationarity. Thus, the first
<italic>N</italic>
samples are discarded; they constitute the so-called burn-in phase. There are two aspects to this problem: determining convergence of chains and determining the extent of the burn-in phase. We note that in practice, it is easier to rule out convergence than to confirm it (
<xref rid="b7" ref-type="bibr">Cowles and Carlin, 1996</xref>
). For solving the second aspect, a variety of techniques have been developed to determine how large
<italic>N</italic>
should be, given the data. We follow
<xref rid="b3" ref-type="bibr">Beiko et al. (2006)</xref>
and use their novel δ statistic, as well as their formalization of a more traditional comparison of likelihoods by eye, to deal with both problems.</p>
<p>First, we calculate the extent of the burn-in phase. We then use the samples beyond that point (with an added safety margin) to assess convergence. The end of the burn-in phase is determined as follows. We sampled every 100th generation, running two analyses in parallel (default in MrBayes V3.11) for 500,000 generations. For each analysis, the mean log-likelihood of the last 1000 samples served as a threshold, and we located the first sample whose log-likelihood exceeded it. The sample immediately preceding this one marked the end of the burn-in phase.
<xref rid="tbl5" ref-type="table">Table 5</xref>
presents summary statistics for the best performing word length for both alphabets (
<italic>k</italic>
= 3, AA, and
<italic>k</italic>
= 4, CE), detailing which sample first exceeded the threshold. The control data required longer burn-ins than the other data, with three quarters of the burn-ins completed at or before generation 17,600. Taken over all synthetic data sets, most burn-ins (96.3%) completed at or before generation 20,000. To be conservative, we used five times this value, 100,000 generations, as a global end of the burn-in phase.</p>
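<p>The thresholding rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used in the study: the function names burn_in_samples and samples_to_generations are hypothetical, the log-likelihood trace is made up, and reading the trace from MrBayes output files is left out.</p>
<preformat>
```python
from statistics import mean


def burn_in_samples(log_likelihoods, tail=1000):
    """Count the samples that belong to the burn-in phase of one run.

    The threshold is the mean log-likelihood of the last `tail` samples.
    All samples before the first one that exceeds it count as burn-in.
    """
    threshold = mean(log_likelihoods[-tail:])
    for index, value in enumerate(log_likelihoods):
        if value > threshold:
            return index  # samples 0 .. index-1 are discarded
    return len(log_likelihoods)  # flat trace: nothing exceeds the mean


def samples_to_generations(n_samples, sample_every=100):
    """Convert a sample count to generations (every 100th was sampled)."""
    return n_samples * sample_every


# Toy trace (made-up values): a steep climb followed by a plateau.
trace = [-500, -400, -300, -200, -100, -101, -99, -100, -100, -100]
burn_in = burn_in_samples(trace, tail=4)
print(burn_in, samples_to_generations(burn_in))
```
</preformat>
<p>Applied to the sample counts reported in Table 5, samples_to_generations(176) gives 17,600 generations, which matches the upper quartile quoted for the control data.</p>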
<table-wrap id="tbl5" orientation="portrait" position="float">
<label>Table 5.</label>
<caption>
<p>Extent of the burn-in phase. Summary of the extent of the burn-in phase (measured in samples; e.g., 100 samples correspond to 10,000 generations). We show results for the overall best performing word length
<italic>k</italic>
under each alphabet
<italic>A</italic>
for the Bayesian phylogenetic inference from
<italic>K</italic>
-mers with a binary encoding (
<italic>B-bin</italic>
).</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="1" rowspan="1">Synthetic data set</th>
<th align="left" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">Upper quartile</th>
<th align="center" colspan="1" rowspan="1">Median</th>
<th align="center" colspan="1" rowspan="1">Lower quartile</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">Control</td>
<td align="left" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">176</td>
<td align="center" colspan="1" rowspan="1">149</td>
<td align="center" colspan="1" rowspan="1">126</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="left" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">152</td>
<td align="center" colspan="1" rowspan="1">129</td>
<td align="center" colspan="1" rowspan="1">108</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">ASRV</td>
<td align="left" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">158</td>
<td align="center" colspan="1" rowspan="1">132</td>
<td align="center" colspan="1" rowspan="1">110</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="left" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">141</td>
<td align="center" colspan="1" rowspan="1">117</td>
<td align="center" colspan="1" rowspan="1">96</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">Short sequences</td>
<td align="left" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">119</td>
<td align="center" colspan="1" rowspan="1">99</td>
<td align="center" colspan="1" rowspan="1">82</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="left" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">106</td>
<td align="center" colspan="1" rowspan="1">88</td>
<td align="center" colspan="1" rowspan="1">72</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To assess whether the chains had converged, we used the δ statistic. It is the accumulated difference between bipartitions of two chains or two fragments of a single chain, where each bipartition is weighted by its PP as estimated in that chain or fragment. We calculated the mean δ value of adjacent fragments from a given chain and the mean δ value of nonadjacent fragments. Contrasting these means yields a ratio: if it is close to 1.0 (
<xref rid="b3" ref-type="bibr">Beiko et al., 2006</xref>
), we may assume that we are sampling from a stationary distribution, because in-order and out-of-order values describe a very similar distribution of bipartition probabilities. We divided each chain into eight fragments of 50,000 generations each (starting at generation 100,100).
<xref rid="tbl6" ref-type="table">Table 6</xref>
shows summary statistics of the δ ratios for the best performing word length for both alphabets (
<italic>k</italic>
= 3, AA; and
<italic>k</italic>
= 4, CE). For each data set, the majority of δ ratios are reasonably close to 1.0, which is consistent with convergence. Furthermore, for a small subset of the data, we ran chains for 5,000,000 generations and used a burn-in phase of 1,000,000 generations. The distribution of δ ratios from eight fragments of 500,000 generations each was very similar (data not shown), giving no grounds to reject convergence.</p>
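<p>The comparison of adjacent and nonadjacent fragments can be illustrated with a small Python sketch. It rests on loud assumptions: the function delta below is a simplified stand-in (the summed absolute difference in PP over all bipartitions) and not the published definition of the δ statistic, the ratio is taken as adjacent over nonadjacent, and all names and PP values are hypothetical.</p>
<preformat>
```python
from itertools import combinations
from statistics import mean


def delta(pp_a, pp_b):
    """Accumulated absolute PP difference over all bipartitions.

    ASSUMPTION: a simplified stand-in for the delta statistic, not the
    published definition. A bipartition missing from a fragment has PP 0.
    """
    splits = set(pp_a) | set(pp_b)
    return sum(abs(pp_a.get(s, 0.0) - pp_b.get(s, 0.0)) for s in splits)


def delta_ratio(fragments):
    """Mean delta of adjacent fragments over mean delta of nonadjacent ones."""
    adjacent, nonadjacent = [], []
    for i, j in combinations(range(len(fragments)), 2):
        bucket = adjacent if j - i == 1 else nonadjacent
        bucket.append(delta(fragments[i], fragments[j]))
    return mean(adjacent) / mean(nonadjacent)


def fragment_starts(first_generation, n_fragments, length):
    """First generation of each consecutive fragment after the burn-in."""
    return [first_generation + k * length for k in range(n_fragments)]


# Toy fragments (made-up PPs) from a chain that is still drifting:
# the PP of one bipartition rises steadily from fragment to fragment.
drifting = [{"AB|CD": 0.2}, {"AB|CD": 0.4}, {"AB|CD": 0.6}, {"AB|CD": 0.8}]
ratio = delta_ratio(drifting)

# Fragment layout used in the text: eight fragments of 50,000 generations.
starts = fragment_starts(100_100, 8, 50_000)
```
</preformat>
<p>For the drifting toy chain the ratio is roughly 0.43, well below 1.0, because neighbouring fragments resemble each other more than distant ones do. The sketch needs at least three fragments and fails with a division by zero if all nonadjacent fragments are identical.</p>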
<table-wrap id="tbl6" orientation="portrait" position="float">
<label>Table 6.</label>
<caption>
<p>Convergence measured by δ ratios. Summary of assessment of convergence for method
<italic>B-bin</italic>
as measured by δ ratios of adjacent versus nonadjacent fragments. We show results for the overall best performing word length
<italic>k</italic>
under each alphabet
<italic>A</italic>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="1" rowspan="1">Synthetic data set</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">Upper quartile</th>
<th align="center" colspan="1" rowspan="1">Median</th>
<th align="center" colspan="1" rowspan="1">Lower quartile</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">Control</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1.023</td>
<td align="center" colspan="1" rowspan="1">1.002</td>
<td align="center" colspan="1" rowspan="1">0.978</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.027</td>
<td align="center" colspan="1" rowspan="1">1.001</td>
<td align="center" colspan="1" rowspan="1">0.976</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">ASRV</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1.029</td>
<td align="center" colspan="1" rowspan="1">1.000</td>
<td align="center" colspan="1" rowspan="1">0.971</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.028</td>
<td align="center" colspan="1" rowspan="1">0.998</td>
<td align="center" colspan="1" rowspan="1">0.972</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">Short sequences</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1.021</td>
<td align="center" colspan="1" rowspan="1">1.000</td>
<td align="center" colspan="1" rowspan="1">0.980</td>
</tr>
<tr>
<td colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.020</td>
<td align="center" colspan="1" rowspan="1">0.999</td>
<td align="center" colspan="1" rowspan="1">0.979</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec>
<title>Conclusions</title>
<p>We conducted a large-scale comparison (in a phylogenetic context) of 10 alignment-free methods, among them one new approach that does not calculate distances and a faster variant of the pattern-based approach. The synthetic data sets in this study represent a refinement to the data set used previously (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
); we increased the number of taxa and tested two additional conditions. Furthermore, we analyzed the methods on a high-quality, well-characterized empirical data set.</p>
<p>Most alignment-free methods exhibit reduced Robinson-Foulds distances, i.e., higher phylogenetic accuracy, in the presence of high among-site rate variation (ASRV), particularly for sequence sets with medium to large phylogenetic distances. This influence of a biologically important parameter had not been recognized previously. In contrast, high ASRV reduces the phylogenetic accuracy of the maximum-likelihood (ML) estimate of distances based on the correct alignment, whether correctly or incorrectly parameterized. Our finding also implies that alignment-free methods may perform better in practice than previously thought and that other relevant parameters in phylogenetics may quite possibly exert a similar influence.</p>
<p>Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. This increased discriminative power in our statistical assessment (with respect to
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
) resulted from the use of eight-taxon trees, which in turn led to fewer tied ranks on individual tree comparisons. For the same reason, the baseline method
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
was shown to be significantly better than any alignment-free method tested here, whereas previously (
<xref rid="b17" ref-type="bibr">Höhl et al., 2006</xref>
),
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
were statistically indistinguishable.</p>
<p>The high phylogenetic accuracy of the pattern-based approach comes at a high computational cost compared to other alignment-free methods. We presented time measurements for the two main steps in this approach and showed that the newly introduced variant
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
reduced running time in step two by an order of magnitude, as compared to
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
. This speed-up is accompanied by a rather small loss of phylogenetic accuracy. Thus, the trade-off seems acceptable for practical use, although the resource demand is still considerably higher than that of other alignment-free methods.</p>
<p>We also introduced a method to conduct a Bayesian inference from
<italic>K</italic>
-mers, thus allowing alignment-free, word-based tree reconstruction without having to calculate distances. In our test setup, it did not improve on classical alignment-free (and distance-based) methods. This result seems surprising, and we offer two possible explanations: (a) It could be the case that the phylogenetic accuracy of Bayesian inference from
<italic>K</italic>
-mers is limited to what we measured simply because there is only limited phylogenetic information in the distribution of
<italic>K</italic>
-mers among a set of sequences. To overcome this limitation, one would need to take into account different data sources. The pattern-based approach would then be an example where relying on other data (e.g., changes of amino acids in patterns) leads to increased phylogenetic accuracy. (b) It could be that utilizing the additional information inherent in the
<italic>K</italic>
-mer
<italic>count</italic>
yields increased phylogenetic accuracy. Multiple states, be they unordered or ordered, would then be appropriate in the Bayesian inference. Even if it turns out that the phylogenetic accuracy cannot be improved, Bayesian inference from
<italic>K</italic>
-mers may still offer advantages over other approaches building on
<italic>K</italic>
-mers. For example, it is possible to make use of the posterior probabilities obtained for bipartitions, construct a credible set that contains the 95% most likely trees, and obtain an ML estimate of branch lengths in the same step. None of these properties was exploited in the testing framework described in this paper.</p>
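<p>The difference between presence/absence and count information raised in explanation (b) can be made concrete with a short Python sketch. It is a generic illustration only: the function names and toy sequences are hypothetical, the exact encoding used for the Bayesian method in this study may differ in detail, and writing the matrix in a format that MrBayes can read is not shown.</p>
<preformat>
```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def encode(seqs, k, binary=True):
    """Build one character row per sequence over all observed k-mers.

    binary=True  : 1 if the k-mer occurs at all, else 0 (two states)
    binary=False : the raw occurrence count (multiple states)
    """
    counts = {name: kmer_counts(seq, k) for name, seq in seqs.items()}
    words = sorted(set().union(*counts.values()))
    rows = {}
    for name, count in counts.items():
        row = [count[word] for word in words]
        rows[name] = [min(value, 1) for value in row] if binary else row
    return words, rows


# Toy input (made-up sequences), word length k = 2.
seqs = {"s1": "AAAC", "s2": "AACG"}
words, presence = encode(seqs, k=2)
_, multistate = encode(seqs, k=2, binary=False)
print(words, presence, multistate)
```
</preformat>
<p>In this toy case the word AA occurs twice in s1, so the count row for s1 starts with 2 where the binary row starts with 1; the binary encoding discards exactly this kind of information.</p>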
<p>One conclusion from the experiments in this study is that the optimal word length
<italic>k</italic>
of word-based methods is approximately stable across various data sets, tree topology measures, and methods. We saw that for AA sequences, the optimal values for
<italic>k</italic>
are 3–5, whereas for CE sequences, the optimal values for
<italic>k</italic>
are 4–6. Finding word lengths that are optimal under a range of phylogenetic distances is a realistic setup: large trees in particular will feature both long and short branches to varying degrees. In that setting, a word length that performs well only on small or only on large distances is of little use.</p>
<p>Finally, we provided a detailed analysis of the trade-off between alphabet AA and CE. Encoding sequences with chemical equivalence classes increases the reconstruction accuracy of long branches, while reducing it for short branches. Not all methods seemed to benefit from the use of alphabet CE, but in our experiments, the pattern-based approach did so more often than not.</p>
<sec>
<title>Prospects for Alignment-Free Methods</title>
<p>We know that the multiple sequence alignment (MSA) problem is NP-hard, and although reasonably good heuristic solutions exist, they are still computationally expensive. In an age of phylogenomics and community genomics, data sets are ever increasing in size, making the computation of MSA and ML (both distance estimation and tree inference) often unaffordable. Alignment-free methods open up an avenue to reduce the required time complexity. In fact, they are already in use; e.g., to speed up alignment construction in MUSCLE (
<xref rid="b9" ref-type="bibr">Edgar, 2004b</xref>
).</p>
<p>Many alignment-free methods show an increased accuracy in the presence of ASRV, unlike the alignment-based ML estimate. There are potentially other biological factors like this one, and the present study represents the first step towards identifying them. One undeniable advantage is the intrinsic applicability of alignment-free methods to data sets with large-scale rearrangements.</p>
<p>Finally, the research on, and the development of, alignment-free methods is still in its infancy, holding a considerable potential for improvement, whereas MSA is rather mature.</p>
</sec>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>Thanks go to Rob G. Beiko and Jonathan M. Keith for help with their δ statistic and for providing associated scripts; thanks again to RGB for the putative orthologs data and many helpful discussions and to JMK for discussions on Bayesian analysis. Thank you to Denis Baurain for discussions on Lempel-Ziv complexity and alphabets; thank you to Tamir Tuller for kindly providing C++ implementations of
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
. ARC grant CE0348221 funded part of the research.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="b1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beiko</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>C. X.</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>A word-oriented approach to alignment validation</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>2230</fpage>
<lpage>2239</lpage>
<pub-id pub-id-type="pmid">15728118</pub-id>
</element-citation>
</ref>
<ref id="b2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beiko</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Harlow</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>Highways of gene sharing in prokaryotes</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>14332</fpage>
<lpage>14337</lpage>
<pub-id pub-id-type="pmid">16176988</pub-id>
</element-citation>
</ref>
<ref id="b3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beiko</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Keith</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Harlow</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>Searching for convergence in phylogenetic Markov chain Monte Carlo</article-title>
<source>Syst. Biol.</source>
<year>2006</year>
<volume>55</volume>
<fpage>553</fpage>
<lpage>565</lpage>
<pub-id pub-id-type="pmid">16857650</pub-id>
</element-citation>
</ref>
<ref id="b4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blaisdell</surname>
<given-names>B. E.</given-names>
</name>
</person-group>
<article-title>A measure of the similarity of sets of sequences not requiring sequence alignment</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1986</year>
<volume>83</volume>
<fpage>5155</fpage>
<lpage>5159</lpage>
<pub-id pub-id-type="pmid">3460087</pub-id>
</element-citation>
</ref>
<ref id="b5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castresana</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis</article-title>
<source>Mol. Biol. Evol.</source>
<year>2000</year>
<volume>17</volume>
<fpage>540</fpage>
<lpage>552</lpage>
<pub-id pub-id-type="pmid">10742046</pub-id>
</element-citation>
</ref>
<ref id="b6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chu</surname>
<given-names>K. H.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Z.-G.</given-names>
</name>
<name>
<surname>Anh</surname>
<given-names>V.</given-names>
</name>
</person-group>
<article-title>Origin and phylogeny of chloroplasts revealed by a simple correlation analysis of complete genomes</article-title>
<source>Mol. Biol. Evol.</source>
<year>2004</year>
<volume>21</volume>
<fpage>200</fpage>
<lpage>206</lpage>
<pub-id pub-id-type="pmid">14595102</pub-id>
</element-citation>
</ref>
<ref id="b7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cowles</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Carlin</surname>
<given-names>B. P.</given-names>
</name>
</person-group>
<article-title>Markov chain Monte Carlo convergence diagnostics: A comparative review</article-title>
<source>J. Am. Stat. Assoc.</source>
<year>1996</year>
<volume>91</volume>
<fpage>883</fpage>
<lpage>904</lpage>
</element-citation>
</ref>
<ref id="b8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>R. C.</given-names>
</name>
</person-group>
<article-title>Local homology recognition and distance measures in linear time using compressed amino acid alphabets</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>380</fpage>
<lpage>385</lpage>
</element-citation>
</ref>
<ref id="b9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>R. C.</given-names>
</name>
</person-group>
<article-title>MUSCLE: Multiple sequence alignment with high accuracy and high throughput</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>1792</fpage>
<lpage>1797</lpage>
<pub-id pub-id-type="pmid">15034147</pub-id>
</element-citation>
</ref>
<ref id="b10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Phylogenies from restriction sites: A maximum-likelihood approach</article-title>
<source>Evolution</source>
<year>1992</year>
<volume>46</volume>
<fpage>159</fpage>
<lpage>173</lpage>
<pub-id pub-id-type="pmid">28564959</pub-id>
</element-citation>
</ref>
<ref id="b11">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname>
<given-names>J.</given-names>
</name>
</person-group>
<source>PHYLIP (phylogeny inference package), version 3.65</source>
<year>2005</year>
<publisher-loc>Seattle</publisher-loc>
<publisher-name>Department of Genome Sciences, University of Washington</publisher-name>
<comment>Distributed by the author</comment>
</element-citation>
</ref>
<ref id="b12">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Gelman</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Carlin</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Stern</surname>
<given-names>H. S.</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>D. B.</given-names>
</name>
</person-group>
<source>Bayesian data analysis</source>
<year>2004</year>
<edition>2nd edition</edition>
<publisher-loc>Boca Raton, Florida</publisher-loc>
<publisher-name>Chapman &amp; Hall/CRC</publisher-name>
</element-citation>
</ref>
<ref id="b13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hall</surname>
<given-names>B. G.</given-names>
</name>
</person-group>
<article-title>Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences</article-title>
<source>Mol. Biol. Evol.</source>
<year>2005</year>
<volume>22</volume>
<fpage>792</fpage>
<lpage>802</lpage>
<pub-id pub-id-type="pmid">15590907</pub-id>
</element-citation>
</ref>
<ref id="b14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Prokaryote phylogeny without sequence alignment: From avoidance signature to composition distance</article-title>
<source>J. Bioinformat. Comput. Biol.</source>
<year>2004</year>
<volume>2</volume>
<fpage>1</fpage>
<lpage>19</lpage>
</element-citation>
</ref>
<ref id="b15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harlow</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Gogarten</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>A hybrid clustering approach to recognition of protein families in 114 microbial genomes</article-title>
<source>BMC Bioinformat.</source>
<year>2004</year>
<volume>5</volume>
<fpage>45</fpage>
</element-citation>
</ref>
<ref id="b16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henikoff</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>J. G.</given-names>
</name>
</person-group>
<article-title>Amino acid substitution matrices from protein blocks</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1992</year>
<volume>89</volume>
<fpage>10915</fpage>
<lpage>10919</lpage>
<pub-id pub-id-type="pmid">1438297</pub-id>
</element-citation>
</ref>
<ref id="b17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Höhl</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rigoutsos</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Ragan</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>Pattern-based phylogenetic distance estimation and tree reconstruction</article-title>
<source>Evol. Bioinf. Online</source>
<year>2006</year>
<volume>2</volume>
<fpage>357</fpage>
<lpage>373</lpage>
<comment>An earlier version is available from
<ext-link ext-link-type="uri" xlink:href="arXiv:q-bio.QM/0605002">arXiv:q-bio.QM/0605002</ext-link>
</comment>
</element-citation>
</ref>
<ref id="b18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huelsenbeck</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Ronquist</surname>
<given-names>F.</given-names>
</name>
</person-group>
<article-title>MRBAYES: Bayesian inference of phylogenetic trees</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>754</fpage>
<lpage>755</lpage>
<pub-id pub-id-type="pmid">11524383</pub-id>
</element-citation>
</ref>
<ref id="b19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>D. T.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>W. R.</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>J. M.</given-names>
</name>
</person-group>
<article-title>The rapid generation of mutation data matrices from protein sequences</article-title>
<source>Comput. Appl. Biosci.</source>
<year>1992</year>
<volume>8</volume>
<fpage>275</fpage>
<lpage>282</lpage>
<pub-id pub-id-type="pmid">1633570</pub-id>
</element-citation>
</ref>
<ref id="b20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lempel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ziv</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>On the complexity of finite sequences</article-title>
<source>IEEE Trans. Inform. Theory</source>
<year>1976</year>
<volume>IT-22</volume>
<fpage>75</fpage>
<lpage>81</lpage>
</element-citation>
</ref>
<ref id="b21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>P. O.</given-names>
</name>
</person-group>
<article-title>A likelihood approach to estimating phylogeny from discrete morphological character data</article-title>
<source>Syst. Biol.</source>
<year>2001</year>
<volume>50</volume>
<fpage>913</fpage>
<lpage>925</lpage>
<pub-id pub-id-type="pmid">12116640</pub-id>
</element-citation>
</ref>
<ref id="b22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Badger</surname>
<given-names>J. H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Kwong</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kearney</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>An information-based sequence distance and its application to whole mitochondrial genome phylogeny</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>149</fpage>
<lpage>154</lpage>
<pub-id pub-id-type="pmid">11238070</pub-id>
</element-citation>
</ref>
<ref id="b23">
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Mantaci</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Restivo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rosone</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sciortino</surname>
<given-names>M.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Coppo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lodi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Pinna</surname>
<given-names>G. M.</given-names>
</name>
</person-group>
<article-title>A new combinatorial approach to sequence comparison</article-title>
<year>2005</year>
<conf-name>Proceedings of the 9th Italian Conference on Theoretical Computer Science (ICTCS 2005)</conf-name>
<publisher-loc>Berlin</publisher-loc>
<publisher-name>Springer Verlag</publisher-name>
<fpage>348</fpage>
<lpage>359</lpage>
<comment>volume 3701 of LNCS</comment>
</element-citation>
</ref>
<ref id="b24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nee</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>May</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Harvey</surname>
<given-names>P. H.</given-names>
</name>
</person-group>
<article-title>The reconstructed evolutionary process</article-title>
<source>Phil. Trans. R. Soc. B</source>
<year>1994</year>
<volume>344</volume>
<fpage>305</fpage>
<lpage>311</lpage>
<pub-id pub-id-type="pmid">7938201</pub-id>
</element-citation>
</ref>
<ref id="b25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ogden</surname>
<given-names>T. H.</given-names>
</name>
<name>
<surname>Rosenberg</surname>
<given-names>M. S.</given-names>
</name>
</person-group>
<article-title>Multiple sequence alignment accuracy and phylogenetic inference</article-title>
<source>Syst. Biol.</source>
<year>2006</year>
<volume>55</volume>
<fpage>314</fpage>
<lpage>328</lpage>
<pub-id pub-id-type="pmid">16611602</pub-id>
</element-citation>
</ref>
<ref id="b26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Otu</surname>
<given-names>H. H.</given-names>
</name>
<name>
<surname>Sayood</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>A new sequence distance measure for phylogenetic tree reconstruction</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>2122</fpage>
<lpage>2130</lpage>
<pub-id pub-id-type="pmid">14594718</pub-id>
</element-citation>
</ref>
<ref id="b27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>B.-I.</given-names>
</name>
</person-group>
<article-title>Whole proteome prokaryote phylogeny without sequence alignment: A
<italic>K</italic>
-string composition approach</article-title>
<source>J. Mol. Evol.</source>
<year>2004</year>
<volume>58</volume>
<fpage>1</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">14743310</pub-id>
</element-citation>
</ref>
<ref id="b28">
<element-citation publication-type="web">
<person-group person-group-type="author">
<name>
<surname>Rambaut</surname>
<given-names>A.</given-names>
</name>
</person-group>
<source>PhyloGen: Phylogenetic tree simulator package</source>
<year>2002</year>
<comment>Available from
<ext-link ext-link-type="uri" xlink:href="http://evolve.zoo.ox.ac.uk/software/PhyloGen/main.html">http://evolve.zoo.ox.ac.uk/software/PhyloGen/main.html</ext-link>
</comment>
</element-citation>
</ref>
<ref id="b29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rambaut</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Grassly</surname>
<given-names>N. C.</given-names>
</name>
</person-group>
<article-title>Sequence-Generator: An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees</article-title>
<source>Comput. Appl. Biosci.</source>
<year>1997</year>
<volume>13</volume>
<fpage>235</fpage>
<lpage>238</lpage>
<pub-id pub-id-type="pmid">9183526</pub-id>
</element-citation>
</ref>
<ref id="b30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rigoutsos</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Floratos</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm</article-title>
<source>Bioinformatics</source>
<year>1998</year>
<volume>14</volume>
<fpage>55</fpage>
<lpage>67</lpage>
<comment>published erratum appears in Bioinformatics,
<bold>14</bold>
:229</comment>
<pub-id pub-id-type="pmid">9520502</pub-id>
</element-citation>
</ref>
<ref id="b31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robinson</surname>
<given-names>D. F.</given-names>
</name>
<name>
<surname>Foulds</surname>
<given-names>L. R.</given-names>
</name>
</person-group>
<article-title>Comparison of phylogenetic trees</article-title>
<source>Math. Biosci.</source>
<year>1981</year>
<volume>53</volume>
<fpage>131</fpage>
<lpage>147</lpage>
</element-citation>
</ref>
<ref id="b32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ronquist</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Huelsenbeck</surname>
<given-names>J. P.</given-names>
</name>
</person-group>
<article-title>MrBayes 3: Bayesian phylogenetic inference under mixed models</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>1572</fpage>
<lpage>1574</lpage>
<pub-id pub-id-type="pmid">12912839</pub-id>
</element-citation>
</ref>
<ref id="b33">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saitou</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Nei</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>The neighbor-joining method: A new method for reconstructing phylogenetic trees</article-title>
<source>Mol. Biol. Evol.</source>
<year>1987</year>
<volume>4</volume>
<fpage>406</fpage>
<lpage>425</lpage>
<pub-id pub-id-type="pmid">3447015</pub-id>
</element-citation>
</ref>
<ref id="b34">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stuart</surname>
<given-names>G. W.</given-names>
</name>
<name>
<surname>Berry</surname>
<given-names>M. W.</given-names>
</name>
</person-group>
<article-title>A comprehensive whole genome bacterial phylogeny using correlated peptide motifs defined in a high dimensional vector space</article-title>
<source>J. Bioinformat. Comput. Biol.</source>
<year>2003</year>
<volume>1</volume>
<fpage>475</fpage>
<lpage>493</lpage>
</element-citation>
</ref>
<ref id="b35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stuart</surname>
<given-names>G. W.</given-names>
</name>
<name>
<surname>Berry</surname>
<given-names>M. W.</given-names>
</name>
</person-group>
<article-title>An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage</article-title>
<source>BMC Bioinformat.</source>
<year>2004</year>
<volume>5</volume>
<fpage>204</fpage>
</element-citation>
</ref>
<ref id="b36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stuart</surname>
<given-names>G. W.</given-names>
</name>
<name>
<surname>Moffett</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Integrated gene and species phylogenies from unaligned whole genome protein sequences</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<fpage>100</fpage>
<lpage>108</lpage>
<pub-id pub-id-type="pmid">11836217</pub-id>
</element-citation>
</ref>
<ref id="b37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stuart</surname>
<given-names>G. W.</given-names>
</name>
<name>
<surname>Moffett</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Leader</surname>
<given-names>J. J.</given-names>
</name>
</person-group>
<article-title>A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes</article-title>
<source>Mol. Biol. Evol.</source>
<year>2002</year>
<volume>19</volume>
<fpage>554</fpage>
<lpage>562</lpage>
<pub-id pub-id-type="pmid">11919297</pub-id>
</element-citation>
</ref>
<ref id="b38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Taylor</surname>
<given-names>W. R.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>D. T.</given-names>
</name>
</person-group>
<article-title>Deriving an amino acid distance matrix</article-title>
<source>J. Theor. Biol.</source>
<year>1993</year>
<volume>164</volume>
<fpage>65</fpage>
<lpage>83</lpage>
<pub-id pub-id-type="pmid">8264244</pub-id>
</element-citation>
</ref>
<ref id="b39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ulitsky</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Burstein</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Tuller</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Chor</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>The average common substring approach to phylogenomic reconstruction</article-title>
<source>J. Comput. Biol.</source>
<year>2006</year>
<volume>13</volume>
<fpage>336</fpage>
<lpage>350</lpage>
<pub-id pub-id-type="pmid">16597244</pub-id>
</element-citation>
</ref>
<ref id="b40">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Helden</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Metrics for comparing regulatory sequences on the basis of pattern counts</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>399</fpage>
<lpage>406</lpage>
<pub-id pub-id-type="pmid">14764560</pub-id>
</element-citation>
</ref>
<ref id="b41">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vinga</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Alignment-free sequence comparison—A review</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>513</fpage>
<lpage>523</lpage>
<pub-id pub-id-type="pmid">12611807</pub-id>
</element-citation>
</ref>
<ref id="b42">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vinga</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gouveia-Oliveira</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>J. S.</given-names>
</name>
</person-group>
<article-title>Comparative evaluation of word composition distances for the recognition of SCOP relationships</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>206</fpage>
<lpage>215</lpage>
<pub-id pub-id-type="pmid">14734312</pub-id>
</element-citation>
</ref>
<ref id="b43">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>On the complexity of multiple sequence alignment</article-title>
<source>J. Comput. Biol.</source>
<year>1994</year>
<volume>1</volume>
<fpage>337</fpage>
<lpage>348</lpage>
<pub-id pub-id-type="pmid">8790475</pub-id>
</element-citation>
</ref>
<ref id="b44">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>T.-J.</given-names>
</name>
<name>
<surname>Burke</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Davison</surname>
<given-names>D. B.</given-names>
</name>
</person-group>
<article-title>A measure of DNA sequence dissimilarity based on the Mahalanobis distance between frequencies of words</article-title>
<source>Biometrics</source>
<year>1997</year>
<volume>53</volume>
<fpage>1431</fpage>
<lpage>1439</lpage>
<pub-id pub-id-type="pmid">9423258</pub-id>
</element-citation>
</ref>
<ref id="b45">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>A. C.-C.</given-names>
</name>
<name>
<surname>Goldberger</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>C.-K.</given-names>
</name>
</person-group>
<article-title>Genome classification using an information-based similarity index: Application to the SARS coronavirus</article-title>
<source>J. Comput. Biol.</source>
<year>2005</year>
<volume>12</volume>
<fpage>1103</fpage>
<lpage>1116</lpage>
<pub-id pub-id-type="pmid">16241900</pub-id>
</element-citation>
</ref>
<ref id="b46">
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Z.-G.</given-names>
</name>
<name>
<surname>Anh</surname>
<given-names>V.</given-names>
</name>
</person-group>
<article-title>Phylogenetic tree of prokaryotes based on complete genomes using fractal and correlation analyses</article-title>
<year>2004</year>
<conf-name>Proceedings of the 2nd Conference on Asia-Pacific Bioinformatics (APBC 2004)</conf-name>
<publisher-loc>Dunedin, New Zealand</publisher-loc>
<fpage>321</fpage>
<lpage>326</lpage>
</element-citation>
</ref>
<ref id="b47">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Zar</surname>
<given-names>J. H.</given-names>
</name>
</person-group>
<source>Biostatistical analysis</source>
<year>1999</year>
<edition>4th edition</edition>
<publisher-loc>Upper Saddle River, New Jersey</publisher-loc>
<publisher-name>Prentice Hall</publisher-name>
</element-citation>
</ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix</title>
<sec>
<title>In-Depth Analysis of Tree Reconstruction Accuracy Using Synthetic Data</title>
<p>Let us compare the tree reconstruction accuracy of all methods in detail.
<xref rid="tbl1" ref-type="table">Tables 1</xref>
,
<xref rid="tbl2" ref-type="table">2</xref>
, and
<xref rid="tbl7" ref-type="table">A1</xref>
show the RF distances for each of the seven reference sets; higher set numbers indicate greater phylogenetic distances. As expected, for each method the mean RF distances increase for successive reference sets (with two exceptions). However, the absolute values vary considerably between reference sets and between methods. They range from 0.024 for trees inferred from
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
on set 1 of the control data, to 0.906 for
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
with AA on set 7 of the short-sequences data. Also as expected, RF distances for trees inferred from short sequences are often worse (i.e., higher) than those inferred from the longer sequences in the control data, especially for the well-performing first five methods (cf.
<xref rid="tbl7" ref-type="table">Tables A1</xref>
and
<xref rid="tbl1" ref-type="table">1</xref>
).</p>
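<p>As a point of reference for the RF distances discussed above, the following minimal sketch computes a normalized Robinson-Foulds distance between two unrooted trees. It is an illustration only, not the implementation used in this study: the nested-tuple tree encoding, the helper names (splits, normalized_rf), the placeholder taxon names, and the normalization by 2(n - 3), which assumes fully resolved trees on the same n taxa, are all assumptions made here.</p>
<preformat>
```python
def splits(tree, taxa):
    """Collect the non-trivial bipartitions (splits) of a nested-tuple tree."""
    ref = min(taxa)  # reference taxon used to pick a canonical side
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        # keep a split only if both sides contain at least two taxa
        if len(below) >= 2 and len(taxa) - len(below) >= 2:
            # store the side without the reference taxon so that
            # complementary descriptions of one split coincide
            found.add(taxa - below if ref in below else below)
        return below

    leaves(tree)
    return found


def normalized_rf(tree_a, tree_b, taxa):
    """Symmetric split difference divided by its maximum, 2(n - 3)."""
    taxa = frozenset(taxa)
    differing = splits(tree_a, taxa) ^ splits(tree_b, taxa)
    return len(differing) / (2 * (len(taxa) - 3))


# Hypothetical five-taxon example (taxon names are placeholders).
taxa = ["A", "B", "C", "D", "E"]
tree_1 = (("A", "B"), "C", ("D", "E"))
tree_2 = (("A", "C"), "B", ("D", "E"))

print(normalized_rf(tree_1, tree_1, taxa))  # identical trees: 0.0
print(normalized_rf(tree_1, tree_2, taxa))  # one of two splits differs: 0.5
```
</preformat>
<p>Under this normalization and for fully resolved trees, a value of 0.2 means that one in five non-trivial splits of either tree is absent from the other.</p>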
<p>If we are willing to accept a maximum RF distance of, say, 0.2 (corresponding to a tree reconstruction accuracy of 80%), we find that each method is restricted to data sets whose sequences lie within a limited, method-specific phylogenetic distance of one another. On the control data set, the maximum of 0.2 means that
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
can be used on all seven sets. Methods
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
can analyze sets 1 through 4, whereas almost all other alignment-free methods can only be used for the first two sets (
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
with CE is limited to set 1 and
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
would not be usable at all).</p>
<p>
<xref rid="tbl2" ref-type="table">Table 2</xref>
reveals that the presence of high among-site rate variation leads to an improved overall phylogenetic accuracy for virtually all alignment-free methods. In particular, RF distances for sets 4 through 7 mostly decrease, whereas RF distances for sets 1 through 3 may increase. Note, however, that
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
performs worse on all reference sets of this data set than on the corresponding reference sets of the control data. The RF distances are for an ML estimate
<italic>without</italic>
the inclusion of ASRV, because this estimate performs better overall (as judged by rank sums) than its correctly parameterized counterpart (with α = 0.5). For completeness, the RF distances of the correctly parameterized counterpart for sets 1 through 7 are 0.042, 0.074, 0.094, 0.112, 0.154, 0.196, and 0.230. Based on this observation, we did not attempt to measure performance of
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
parameterized with a gamma model.</p>
<p>Repeating our previous analysis with a maximum RF distance of 0.2 for the ASRV data set shows that use of
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
is now restricted to set 6 or 5 (depending on which parameterization we choose). Methods
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with AA are now usable up to set 5,
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
and
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
with CE remain usable up to set 4, and many other alignment-free methods (9 out of 17) can additionally handle set 3. This finding reflects the improved overall phylogenetic accuracy that presence of high among-site rate variation has on the alignment-free methods.</p>
</sec>
<sec>
<title>In-Depth Analysis of Alphabets Using Empirical Data</title>
<p>Following on from the section “Analysis Using the Putative Orthologs Data Set,” we analyze in the remainder of this appendix the influence of alphabets AA and CE as described in Methods. We obtained the total rank sum of the best-performing variants of all methods for each alphabet and under each measure. We first tested whether we could pool the rank sums across the four reference sets. The χ
<sup>2</sup>
test for heterogeneity yields χ
<sup>2</sup>
= 8.315 (
<italic>P</italic>
= 0.040,
<italic>df</italic>
= 3); this result is a borderline case that depends on the significance level: at α = 0.05, we reject the null hypothesis of homogeneity and conclude that we cannot pool the heterogeneous data. However, at the more stringent α = 0.01 level, we cannot reject homogeneity and are allowed to pool the individual tests. For AA sequences, the pooled rank sums increase (i.e., worsen) from 395.0 under measure FN to 452.5 under measure DPB. For CE sequences, they decrease (i.e., improve) from 445.0 (FN) to 387.5 (DPB). The test on the pooled rank sums yields χ
<sup>2</sup>
= 7.601 (corrected for continuity,
<italic>P</italic>
= 0.006,
<italic>df</italic>
= 1). Thus we conclude that the difference in pooled rank sums is statistically significant. Generally, as measured by FN, using the original sequences (alphabet AA) is beneficial for most alignment-free methods including
<italic>B-bin</italic>
(but not
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
; cf.
<xref rid="tbl3" ref-type="table">Tables 3</xref>
and
<xref rid="tbl8" ref-type="table">A2</xref>
). Considering DPB only, encoding sequences using alphabet CE improves ranks summed over all methods. More precisely, 6 out of 10 methods including
<italic>B-bin</italic>
are ranked higher using CE than AA. Note that the difference between pooled AA and CE rank sums is smaller under measure FN than under DPB. Although not all methods profit from CE under measure DPB, those that do profit more strongly than the methods that profit from AA under measure FN. In other words, under measure FN and for each method, the use of AA leads to less improvement over CE (as measured by pooled rank sums) than the use of CE yields over AA under measure DPB. Also note that the average of the pooled CE rank sums is lower (i.e., better) than the average for AA. Finally, we remark that optimizing the word lengths for DPB separately would yield pooled rank sums that show a bigger difference between AA and CE: CE would perform better at 380.5 and, conversely, AA would perform worse at 459.5.</p>
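<p>The pooled test statistic can be checked with a short calculation. The sketch below assumes that the test is a standard 2 x 2 χ² test with Yates continuity correction applied to the four pooled rank sums (alphabet by measure); the function names are hypothetical, and the P value uses the closed form for one degree of freedom.</p>
<preformat>
```python
import math


def yates_chi2(a, b, c, d):
    """Chi-square of the 2x2 table [[a, b], [c, d]] with Yates continuity correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Pooled rank sums quoted in the text:
# rows are alphabets (AA, CE), columns are measures (FN, DPB).
chi2 = yates_chi2(395.0, 452.5, 445.0, 387.5)
p = p_value_df1(chi2)
print(round(chi2, 3), round(p, 3))  # 7.601 0.006
```
</preformat>
<p>Under these assumptions the calculation reproduces the reported values of 7.601 and 0.006.</p>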
<p>The overall finding (alphabet AA is better than CE under measure FN, whereas under measure DPB the reverse is true) holds for the two reference sets with many (i.e., 12 to 20) taxa (M-S and M-L) in a similar, statistically supported fashion. On reference set M-S, the rank sums for AA change from 86.0 (FN) to 115.0 (DPB), whereas the rank sums for CE change from 124.0 to 95.0. On reference set M-L, the numbers are 88.0 and 117.0 (AA) versus 122.0 and 93.0 (CE). The individual test results are χ
<sup>2</sup>
= 7.480 (
<italic>P</italic>
= 0.006) and χ
<sup>2</sup>
= 7.471 (
<italic>P</italic>
= 0.006) for M-S and M-L, respectively.</p>
<p>A different picture emerges for reference sets with few, i.e., 4 to 8, taxa (F-S and F-L): both sets show no statistically significant difference in distribution of rank sums for each alphabet between the two measures. The test outcomes are χ
<sup>2</sup>
= 0.039 (
<italic>P</italic>
= 0.844) and χ
<sup>2</sup>
= 0.022 (
<italic>P</italic>
= 0.881) for F-S and F-L, respectively. Instead, we find one alphabet superior under both measures, and this alphabet changes with the reference set. On set F-S, alphabet AA with rank sums of 95.0 and 92.0 outperforms CE with rank sums of 115.0 and 118.0. On set F-L, the reverse is true: alphabet CE yields lower rank sums (84.0 and 81.5), i.e., better results, than AA (126.0 and 128.5).</p>
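<p>The heterogeneity test across the four reference sets can be sketched from the rank sums quoted in this section. The sketch assumes the textbook heterogeneity χ² (sum of the individual statistics minus the statistic of the pooled table, all without continuity correction, with one degree of freedom fewer than the number of tables), as described in standard texts such as Zar (1999); the function names and the dictionary layout are hypothetical.</p>
<preformat>
```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square of the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df3(chi2):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    tail = math.erfc(math.sqrt(chi2 / 2))
    return tail + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)


# Rank sums per reference set: (AA under FN, AA under DPB, CE under FN, CE under DPB).
tables = {
    "F-S": (95.0, 92.0, 115.0, 118.0),
    "F-L": (126.0, 128.5, 84.0, 81.5),
    "M-S": (86.0, 115.0, 124.0, 95.0),
    "M-L": (88.0, 117.0, 122.0, 93.0),
}

pooled = tuple(sum(cells) for cells in zip(*tables.values()))
total = sum(chi2_2x2(*cells) for cells in tables.values())
heterogeneity = total - chi2_2x2(*pooled)
df = len(tables) - 1

print(pooled)                                  # (395.0, 452.5, 445.0, 387.5)
print(round(heterogeneity, 3), df)             # 8.315 3
print(round(p_value_df3(heterogeneity), 3))    # 0.04
```
</preformat>
<p>Under these assumptions the pooled cells equal the pooled rank sums given in the text, and the heterogeneity statistic and its P value agree with the reported 8.315 and 0.040.</p>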
<table-wrap id="tbl7" orientation="portrait" position="anchor">
<label>Table A1.</label>
<caption>
<p>Short-sequences data set. Average RF distance for each reference set of the synthetic short-sequences data set (sequence length of 300 amino acids, no ASRV). Order of methods and values for
<italic>k</italic>
are determined as in
<xref rid="tbl1" ref-type="table">Table 1</xref>
. The Friedman test statistic is
<italic>F</italic>
<sub>
<italic>R</italic>
</sub>
= 3693.4 (
<italic>P</italic>
< 10
<sup>−10</sup>
). Significant differences are found at or beyond the α = 0.05 level between the following pairs (numbers refer to column “No.”): method 1 versus methods 2–22; methods 2 and 3 versus methods 4–22; methods 4 and 5 versus methods 6–22; methods 6–19 versus methods 20–22; and methods 20 and 21 versus method 22.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of short-sequences data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in1.jpg"></inline-graphic>
</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3624.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.060</td>
<td align="center" colspan="1" rowspan="1">0.102</td>
<td align="center" colspan="1" rowspan="1">0.138</td>
<td align="center" colspan="1" rowspan="1">0.178</td>
<td align="center" colspan="1" rowspan="1">0.244</td>
<td align="center" colspan="1" rowspan="1">0.304</td>
<td align="center" colspan="1" rowspan="1">0.350</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4765.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.076</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.172</td>
<td align="center" colspan="1" rowspan="1">0.218</td>
<td align="center" colspan="1" rowspan="1">0.360</td>
<td align="center" colspan="1" rowspan="1">0.492</td>
<td align="center" colspan="1" rowspan="1">0.632</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">4836.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.064</td>
<td align="center" colspan="1" rowspan="1">0.098</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">0.218</td>
<td align="center" colspan="1" rowspan="1">0.356</td>
<td align="center" colspan="1" rowspan="1">0.534</td>
<td align="center" colspan="1" rowspan="1">0.658</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">5827.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.080</td>
<td align="center" colspan="1" rowspan="1">0.106</td>
<td align="center" colspan="1" rowspan="1">0.198</td>
<td align="center" colspan="1" rowspan="1">0.266</td>
<td align="center" colspan="1" rowspan="1">0.498</td>
<td align="center" colspan="1" rowspan="1">0.662</td>
<td align="center" colspan="1" rowspan="1">0.750</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">5984.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.062</td>
<td align="center" colspan="1" rowspan="1">0.100</td>
<td align="center" colspan="1" rowspan="1">0.204</td>
<td align="center" colspan="1" rowspan="1">0.272</td>
<td align="center" colspan="1" rowspan="1">0.504</td>
<td align="center" colspan="1" rowspan="1">0.712</td>
<td align="center" colspan="1" rowspan="1">0.764</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">8171.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.110</td>
<td align="center" colspan="1" rowspan="1">0.180</td>
<td align="center" colspan="1" rowspan="1">0.308</td>
<td align="center" colspan="1" rowspan="1">0.456</td>
<td align="center" colspan="1" rowspan="1">0.684</td>
<td align="center" colspan="1" rowspan="1">0.794</td>
<td align="center" colspan="1" rowspan="1">0.838</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">8206.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.112</td>
<td align="center" colspan="1" rowspan="1">0.180</td>
<td align="center" colspan="1" rowspan="1">0.312</td>
<td align="center" colspan="1" rowspan="1">0.464</td>
<td align="center" colspan="1" rowspan="1">0.670</td>
<td align="center" colspan="1" rowspan="1">0.806</td>
<td align="center" colspan="1" rowspan="1">0.830</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">8251.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.096</td>
<td align="center" colspan="1" rowspan="1">0.170</td>
<td align="center" colspan="1" rowspan="1">0.338</td>
<td align="center" colspan="1" rowspan="1">0.462</td>
<td align="center" colspan="1" rowspan="1">0.700</td>
<td align="center" colspan="1" rowspan="1">0.798</td>
<td align="center" colspan="1" rowspan="1">0.850</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="center" colspan="1" rowspan="1">8258.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.074</td>
<td align="center" colspan="1" rowspan="1">0.146</td>
<td align="center" colspan="1" rowspan="1">0.322</td>
<td align="center" colspan="1" rowspan="1">0.468</td>
<td align="center" colspan="1" rowspan="1">0.714</td>
<td align="center" colspan="1" rowspan="1">0.802</td>
<td align="center" colspan="1" rowspan="1">0.892</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="center" colspan="1" rowspan="1">8359.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.108</td>
<td align="center" colspan="1" rowspan="1">0.186</td>
<td align="center" colspan="1" rowspan="1">0.328</td>
<td align="center" colspan="1" rowspan="1">0.468</td>
<td align="center" colspan="1" rowspan="1">0.706</td>
<td align="center" colspan="1" rowspan="1">0.792</td>
<td align="center" colspan="1" rowspan="1">0.842</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="center" colspan="1" rowspan="1">8442.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.156</td>
<td align="center" colspan="1" rowspan="1">0.322</td>
<td align="center" colspan="1" rowspan="1">0.490</td>
<td align="center" colspan="1" rowspan="1">0.730</td>
<td align="center" colspan="1" rowspan="1">0.816</td>
<td align="center" colspan="1" rowspan="1">0.890</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12</td>
<td align="center" colspan="1" rowspan="1">8456.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.106</td>
<td align="center" colspan="1" rowspan="1">0.170</td>
<td align="center" colspan="1" rowspan="1">0.370</td>
<td align="center" colspan="1" rowspan="1">0.486</td>
<td align="center" colspan="1" rowspan="1">0.706</td>
<td align="center" colspan="1" rowspan="1">0.784</td>
<td align="center" colspan="1" rowspan="1">0.834</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="center" colspan="1" rowspan="1">8475.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.078</td>
<td align="center" colspan="1" rowspan="1">0.146</td>
<td align="center" colspan="1" rowspan="1">0.330</td>
<td align="center" colspan="1" rowspan="1">0.492</td>
<td align="center" colspan="1" rowspan="1">0.728</td>
<td align="center" colspan="1" rowspan="1">0.820</td>
<td align="center" colspan="1" rowspan="1">0.892</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">8479.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.088</td>
<td align="center" colspan="1" rowspan="1">0.158</td>
<td align="center" colspan="1" rowspan="1">0.338</td>
<td align="center" colspan="1" rowspan="1">0.526</td>
<td align="center" colspan="1" rowspan="1">0.702</td>
<td align="center" colspan="1" rowspan="1">0.802</td>
<td align="center" colspan="1" rowspan="1">0.854</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">8558.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.092</td>
<td align="center" colspan="1" rowspan="1">0.162</td>
<td align="center" colspan="1" rowspan="1">0.332</td>
<td align="center" colspan="1" rowspan="1">0.514</td>
<td align="center" colspan="1" rowspan="1">0.744</td>
<td align="center" colspan="1" rowspan="1">0.822</td>
<td align="center" colspan="1" rowspan="1">0.838</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="center" colspan="1" rowspan="1">8628.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.128</td>
<td align="center" colspan="1" rowspan="1">0.222</td>
<td align="center" colspan="1" rowspan="1">0.362</td>
<td align="center" colspan="1" rowspan="1">0.492</td>
<td align="center" colspan="1" rowspan="1">0.696</td>
<td align="center" colspan="1" rowspan="1">0.794</td>
<td align="center" colspan="1" rowspan="1">0.808</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="center" colspan="1" rowspan="1">8697.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.068</td>
<td align="center" colspan="1" rowspan="1">0.154</td>
<td align="center" colspan="1" rowspan="1">0.334</td>
<td align="center" colspan="1" rowspan="1">0.544</td>
<td align="center" colspan="1" rowspan="1">0.762</td>
<td align="center" colspan="1" rowspan="1">0.842</td>
<td align="center" colspan="1" rowspan="1">0.860</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="center" colspan="1" rowspan="1">8791.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.086</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">0.354</td>
<td align="center" colspan="1" rowspan="1">0.538</td>
<td align="center" colspan="1" rowspan="1">0.738</td>
<td align="center" colspan="1" rowspan="1">0.852</td>
<td align="center" colspan="1" rowspan="1">0.832</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="center" colspan="1" rowspan="1">9016.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.104</td>
<td align="center" colspan="1" rowspan="1">0.244</td>
<td align="center" colspan="1" rowspan="1">0.358</td>
<td align="center" colspan="1" rowspan="1">0.524</td>
<td align="center" colspan="1" rowspan="1">0.730</td>
<td align="center" colspan="1" rowspan="1">0.816</td>
<td align="center" colspan="1" rowspan="1">0.868</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="center" colspan="1" rowspan="1">10,198.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.116</td>
<td align="center" colspan="1" rowspan="1">0.252</td>
<td align="center" colspan="1" rowspan="1">0.444</td>
<td align="center" colspan="1" rowspan="1">0.614</td>
<td align="center" colspan="1" rowspan="1">0.816</td>
<td align="center" colspan="1" rowspan="1">0.890</td>
<td align="center" colspan="1" rowspan="1">0.906</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="center" colspan="1" rowspan="1">10,964.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.176</td>
<td align="center" colspan="1" rowspan="1">0.338</td>
<td align="center" colspan="1" rowspan="1">0.506</td>
<td align="center" colspan="1" rowspan="1">0.692</td>
<td align="center" colspan="1" rowspan="1">0.836</td>
<td align="center" colspan="1" rowspan="1">0.890</td>
<td align="center" colspan="1" rowspan="1">0.884</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="center" colspan="1" rowspan="1">12,108.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.482</td>
<td align="center" colspan="1" rowspan="1">0.546</td>
<td align="center" colspan="1" rowspan="1">0.668</td>
<td align="center" colspan="1" rowspan="1">0.734</td>
<td align="center" colspan="1" rowspan="1">0.800</td>
<td align="center" colspan="1" rowspan="1">0.872</td>
<td align="center" colspan="1" rowspan="1">0.886</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl8" orientation="portrait" position="anchor">
<label>Table A2.</label>
<caption>
<p>DPB count for putative orthologs data set. Count of unrecovered DPBs for each reference set of the putative orthologs data set; numbers in parentheses indicate the set size, which is the maximal possible value. Values for
<italic>k</italic>
are identical to those in
<xref rid="tbl3" ref-type="table">Table 3</xref>; the order of methods is determined as in
<xref rid="tbl3" ref-type="table">Table 3</xref>.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="4" rowspan="1">Reference set</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="4" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="1" rowspan="1">F-S</th>
<th align="center" colspan="1" rowspan="1">F-L</th>
<th align="center" colspan="1" rowspan="1">M-S</th>
<th align="center" colspan="1" rowspan="1">M-L</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in1.jpg"></inline-graphic>
</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">(50)</th>
<th align="center" colspan="1" rowspan="1">(52)</th>
<th align="center" colspan="1" rowspan="1">(80)</th>
<th align="center" colspan="1" rowspan="1">(38)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">20.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">24.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">25.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4.5</td>
<td align="center" colspan="1" rowspan="1">29.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4.5</td>
<td align="center" colspan="1" rowspan="1">29.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">31.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">32.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8.5</td>
<td align="center" colspan="1" rowspan="1">41.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">0</td>
<td align="center" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">4</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8.5</td>
<td align="center" colspan="1" rowspan="1">41.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">0</td>
<td align="center" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">4</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="center" colspan="1" rowspan="1">42.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="center" colspan="1" rowspan="1">42.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">0</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">3</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12.5</td>
<td align="center" colspan="1" rowspan="1">44.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>B-bin</italic>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">6</td>
<td align="center" colspan="1" rowspan="1">4</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12.5</td>
<td align="center" colspan="1" rowspan="1">44.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">3</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">45.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">45.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">14</td>
<td align="center" colspan="1" rowspan="1">3</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16.5</td>
<td align="center" colspan="1" rowspan="1">48.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">4</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16.5</td>
<td align="center" colspan="1" rowspan="1">48.0</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">8</td>
<td align="center" colspan="1" rowspan="1">2</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="center" colspan="1" rowspan="1">54.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">1</td>
<td align="center" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">4</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="center" colspan="1" rowspan="1">73.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">2</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">15</td>
<td align="center" colspan="1" rowspan="1">11</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="center" colspan="1" rowspan="1">78.5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">7</td>
<td align="center" colspan="1" rowspan="1">17</td>
<td align="center" colspan="1" rowspan="1">5</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl9" orientation="portrait" position="anchor">
<label>Table A3.</label>
<caption>
<p>Distances for control data set. Median of calculated distances for each reference set of the synthetic control data set (sequence length of 1000 amino acids, no ASRV). Order of methods and values for
<italic>k</italic>
are as in
<xref rid="tbl1" ref-type="table">Table 1</xref>. Note that method
<italic>B-bin</italic>
is not listed because it does not calculate distances.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of control data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.7444</td>
<td align="center" colspan="1" rowspan="1">1.0993</td>
<td align="center" colspan="1" rowspan="1">1.6257</td>
<td align="center" colspan="1" rowspan="1">2.0488</td>
<td align="center" colspan="1" rowspan="1">2.4290</td>
<td align="center" colspan="1" rowspan="1">2.9717</td>
<td align="center" colspan="1" rowspan="1">3.3942</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.9895</td>
<td align="center" colspan="1" rowspan="1">2.4403</td>
<td align="center" colspan="1" rowspan="1">2.8530</td>
<td align="center" colspan="1" rowspan="1">3.0180</td>
<td align="center" colspan="1" rowspan="1">3.0853</td>
<td align="center" colspan="1" rowspan="1">3.1345</td>
<td align="center" colspan="1" rowspan="1">3.1525</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">8.6963</td>
<td align="center" colspan="1" rowspan="1">9.3471</td>
<td align="center" colspan="1" rowspan="1">9.8041</td>
<td align="center" colspan="1" rowspan="1">9.9639</td>
<td align="center" colspan="1" rowspan="1">10.025</td>
<td align="center" colspan="1" rowspan="1">10.063</td>
<td align="center" colspan="1" rowspan="1">10.075</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.1372</td>
<td align="center" colspan="1" rowspan="1">1.5553</td>
<td align="center" colspan="1" rowspan="1">1.9732</td>
<td align="center" colspan="1" rowspan="1">2.1295</td>
<td align="center" colspan="1" rowspan="1">2.1927</td>
<td align="center" colspan="1" rowspan="1">2.2274</td>
<td align="center" colspan="1" rowspan="1">2.2394</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">6.8358</td>
<td align="center" colspan="1" rowspan="1">7.9676</td>
<td align="center" colspan="1" rowspan="1">8.7688</td>
<td align="center" colspan="1" rowspan="1">8.9946</td>
<td align="center" colspan="1" rowspan="1">9.0815</td>
<td align="center" colspan="1" rowspan="1">9.1209</td>
<td align="center" colspan="1" rowspan="1">9.1362</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.2536</td>
<td align="center" colspan="1" rowspan="1">1.3577</td>
<td align="center" colspan="1" rowspan="1">1.4002</td>
<td align="center" colspan="1" rowspan="1">1.4128</td>
<td align="center" colspan="1" rowspan="1">1.4176</td>
<td align="center" colspan="1" rowspan="1">1.4212</td>
<td align="center" colspan="1" rowspan="1">1.4226</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8550</td>
<td align="center" colspan="1" rowspan="1">0.9308</td>
<td align="center" colspan="1" rowspan="1">0.9668</td>
<td align="center" colspan="1" rowspan="1">0.9775</td>
<td align="center" colspan="1" rowspan="1">0.9825</td>
<td align="center" colspan="1" rowspan="1">0.9856</td>
<td align="center" colspan="1" rowspan="1">0.9871</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">5.6698</td>
<td align="center" colspan="1" rowspan="1">6.1015</td>
<td align="center" colspan="1" rowspan="1">6.3468</td>
<td align="center" colspan="1" rowspan="1">6.4461</td>
<td align="center" colspan="1" rowspan="1">6.4881</td>
<td align="center" colspan="1" rowspan="1">6.4606</td>
<td align="center" colspan="1" rowspan="1">6.5021</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.6343</td>
<td align="center" colspan="1" rowspan="1">0.8066</td>
<td align="center" colspan="1" rowspan="1">0.8821</td>
<td align="center" colspan="1" rowspan="1">0.9027</td>
<td align="center" colspan="1" rowspan="1">0.9098</td>
<td align="center" colspan="1" rowspan="1">0.9157</td>
<td align="center" colspan="1" rowspan="1">0.9175</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.8659</td>
<td align="center" colspan="1" rowspan="1">0.9516</td>
<td align="center" colspan="1" rowspan="1">0.9778</td>
<td align="center" colspan="1" rowspan="1">0.9839</td>
<td align="center" colspan="1" rowspan="1">0.9859</td>
<td align="center" colspan="1" rowspan="1">0.9870</td>
<td align="center" colspan="1" rowspan="1">0.9874</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">1.4374</td>
<td align="center" colspan="1" rowspan="1">1.7130</td>
<td align="center" colspan="1" rowspan="1">1.8759</td>
<td align="center" colspan="1" rowspan="1">1.9229</td>
<td align="center" colspan="1" rowspan="1">1.9437</td>
<td align="center" colspan="1" rowspan="1">1.9578</td>
<td align="center" colspan="1" rowspan="1">1.9650</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1850</td>
<td align="center" colspan="1" rowspan="1">1946</td>
<td align="center" colspan="1" rowspan="1">1978</td>
<td align="center" colspan="1" rowspan="1">1986</td>
<td align="center" colspan="1" rowspan="1">1990</td>
<td align="center" colspan="1" rowspan="1">1994</td>
<td align="center" colspan="1" rowspan="1">1992</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">1780</td>
<td align="center" colspan="1" rowspan="1">1902</td>
<td align="center" colspan="1" rowspan="1">1958</td>
<td align="center" colspan="1" rowspan="1">1976</td>
<td align="center" colspan="1" rowspan="1">1982</td>
<td align="center" colspan="1" rowspan="1">1986</td>
<td align="center" colspan="1" rowspan="1">1990</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.7079</td>
<td align="center" colspan="1" rowspan="1">2.0017</td>
<td align="center" colspan="1" rowspan="1">2.1366</td>
<td align="center" colspan="1" rowspan="1">2.1712</td>
<td align="center" colspan="1" rowspan="1">2.1889</td>
<td align="center" colspan="1" rowspan="1">2.1889</td>
<td align="center" colspan="1" rowspan="1">2.1979</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.2928</td>
<td align="center" colspan="1" rowspan="1">0.3099</td>
<td align="center" colspan="1" rowspan="1">0.3184</td>
<td align="center" colspan="1" rowspan="1">0.3212</td>
<td align="center" colspan="1" rowspan="1">0.3230</td>
<td align="center" colspan="1" rowspan="1">0.3207</td>
<td align="center" colspan="1" rowspan="1">0.3221</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8560</td>
<td align="center" colspan="1" rowspan="1">0.8853</td>
<td align="center" colspan="1" rowspan="1">0.8967</td>
<td align="center" colspan="1" rowspan="1">0.9002</td>
<td align="center" colspan="1" rowspan="1">0.9017</td>
<td align="center" colspan="1" rowspan="1">0.9024</td>
<td align="center" colspan="1" rowspan="1">0.9029</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">18</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8721</td>
<td align="center" colspan="1" rowspan="1">0.8957</td>
<td align="center" colspan="1" rowspan="1">0.9045</td>
<td align="center" colspan="1" rowspan="1">0.9073</td>
<td align="center" colspan="1" rowspan="1">0.9080</td>
<td align="center" colspan="1" rowspan="1">0.9086</td>
<td align="center" colspan="1" rowspan="1">0.9094</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.4696</td>
<td align="center" colspan="1" rowspan="1">0.4961</td>
<td align="center" colspan="1" rowspan="1">0.5060</td>
<td align="center" colspan="1" rowspan="1">0.5092</td>
<td align="center" colspan="1" rowspan="1">0.5108</td>
<td align="center" colspan="1" rowspan="1">0.5114</td>
<td align="center" colspan="1" rowspan="1">0.5119</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.4672</td>
<td align="center" colspan="1" rowspan="1">0.4924</td>
<td align="center" colspan="1" rowspan="1">0.5035</td>
<td align="center" colspan="1" rowspan="1">0.5073</td>
<td align="center" colspan="1" rowspan="1">0.5086</td>
<td align="center" colspan="1" rowspan="1">0.5101</td>
<td align="center" colspan="1" rowspan="1">0.5102</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.0043</td>
<td align="center" colspan="1" rowspan="1">0.0059</td>
<td align="center" colspan="1" rowspan="1">0.0071</td>
<td align="center" colspan="1" rowspan="1">0.0081</td>
<td align="center" colspan="1" rowspan="1">0.0088</td>
<td align="center" colspan="1" rowspan="1">0.0097</td>
<td align="center" colspan="1" rowspan="1">0.0099</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl10" orientation="portrait" position="anchor">
<label>Table A4.</label>
<caption>
<p>Distances for ASRV data set. Median of calculated distances for each reference set of the synthetic ASRV data set (sequence length of 1000 amino acids, high ASRV with α = 0.5). Order of methods and values for
<italic>k</italic>
are as in
<xref rid="tbl2" ref-type="table">Table 2</xref>.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of ASRV data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.4635</td>
<td align="center" colspan="1" rowspan="1">0.6077</td>
<td align="center" colspan="1" rowspan="1">0.7746</td>
<td align="center" colspan="1" rowspan="1">0.9114</td>
<td align="center" colspan="1" rowspan="1">1.0031</td>
<td align="center" colspan="1" rowspan="1">1.1273</td>
<td align="center" colspan="1" rowspan="1">1.2114</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.7188</td>
<td align="center" colspan="1" rowspan="1">0.8724</td>
<td align="center" colspan="1" rowspan="1">1.0516</td>
<td align="center" colspan="1" rowspan="1">1.1905</td>
<td align="center" colspan="1" rowspan="1">1.2653</td>
<td align="center" colspan="1" rowspan="1">1.3803</td>
<td align="center" colspan="1" rowspan="1">1.4582</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">5.1888</td>
<td align="center" colspan="1" rowspan="1">5.8917</td>
<td align="center" colspan="1" rowspan="1">6.5781</td>
<td align="center" colspan="1" rowspan="1">7.0346</td>
<td align="center" colspan="1" rowspan="1">7.2585</td>
<td align="center" colspan="1" rowspan="1">7.5753</td>
<td align="center" colspan="1" rowspan="1">7.7762</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.6066</td>
<td align="center" colspan="1" rowspan="1">1.8151</td>
<td align="center" colspan="1" rowspan="1">2.0460</td>
<td align="center" colspan="1" rowspan="1">2.2236</td>
<td align="center" colspan="1" rowspan="1">2.3105</td>
<td align="center" colspan="1" rowspan="1">2.4451</td>
<td align="center" colspan="1" rowspan="1">2.5255</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">7.9433</td>
<td align="center" colspan="1" rowspan="1">8.3644</td>
<td align="center" colspan="1" rowspan="1">8.7589</td>
<td align="center" colspan="1" rowspan="1">9.0329</td>
<td align="center" colspan="1" rowspan="1">9.1515</td>
<td align="center" colspan="1" rowspan="1">9.3314</td>
<td align="center" colspan="1" rowspan="1">9.4407</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.6591</td>
<td align="center" colspan="1" rowspan="1">0.7971</td>
<td align="center" colspan="1" rowspan="1">0.8737</td>
<td align="center" colspan="1" rowspan="1">0.9111</td>
<td align="center" colspan="1" rowspan="1">0.9278</td>
<td align="center" colspan="1" rowspan="1">0.9431</td>
<td align="center" colspan="1" rowspan="1">0.9511</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1612</td>
<td align="center" colspan="1" rowspan="1">1744</td>
<td align="center" colspan="1" rowspan="1">1834</td>
<td align="center" colspan="1" rowspan="1">1882</td>
<td align="center" colspan="1" rowspan="1">1904</td>
<td align="center" colspan="1" rowspan="1">1924</td>
<td align="center" colspan="1" rowspan="1">1938</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.2087</td>
<td align="center" colspan="1" rowspan="1">1.4550</td>
<td align="center" colspan="1" rowspan="1">1.6646</td>
<td align="center" colspan="1" rowspan="1">1.7946</td>
<td align="center" colspan="1" rowspan="1">1.8633</td>
<td align="center" colspan="1" rowspan="1">1.9301</td>
<td align="center" colspan="1" rowspan="1">1.9724</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8026</td>
<td align="center" colspan="1" rowspan="1">0.8423</td>
<td align="center" colspan="1" rowspan="1">0.8678</td>
<td align="center" colspan="1" rowspan="1">0.8799</td>
<td align="center" colspan="1" rowspan="1">0.8854</td>
<td align="center" colspan="1" rowspan="1">0.8910</td>
<td align="center" colspan="1" rowspan="1">0.8942</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.0164</td>
<td align="center" colspan="1" rowspan="1">1.1450</td>
<td align="center" colspan="1" rowspan="1">1.2363</td>
<td align="center" colspan="1" rowspan="1">1.2858</td>
<td align="center" colspan="1" rowspan="1">1.3101</td>
<td align="center" colspan="1" rowspan="1">1.3347</td>
<td align="center" colspan="1" rowspan="1">1.3484</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">12</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.3952</td>
<td align="center" colspan="1" rowspan="1">0.5859</td>
<td align="center" colspan="1" rowspan="1">0.7109</td>
<td align="center" colspan="1" rowspan="1">0.7782</td>
<td align="center" colspan="1" rowspan="1">0.8081</td>
<td align="center" colspan="1" rowspan="1">0.8368</td>
<td align="center" colspan="1" rowspan="1">0.8528</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.2535</td>
<td align="center" colspan="1" rowspan="1">0.2764</td>
<td align="center" colspan="1" rowspan="1">0.2913</td>
<td align="center" colspan="1" rowspan="1">0.2995</td>
<td align="center" colspan="1" rowspan="1">0.3034</td>
<td align="center" colspan="1" rowspan="1">0.3081</td>
<td align="center" colspan="1" rowspan="1">0.3119</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.7958</td>
<td align="center" colspan="1" rowspan="1">0.8382</td>
<td align="center" colspan="1" rowspan="1">0.8631</td>
<td align="center" colspan="1" rowspan="1">0.8762</td>
<td align="center" colspan="1" rowspan="1">0.8821</td>
<td align="center" colspan="1" rowspan="1">0.8876</td>
<td align="center" colspan="1" rowspan="1">0.8902</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">1.0672</td>
<td align="center" colspan="1" rowspan="1">1.3031</td>
<td align="center" colspan="1" rowspan="1">1.4939</td>
<td align="center" colspan="1" rowspan="1">1.6150</td>
<td align="center" colspan="1" rowspan="1">1.6748</td>
<td align="center" colspan="1" rowspan="1">1.7413</td>
<td align="center" colspan="1" rowspan="1">1.7762</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">1556</td>
<td align="center" colspan="1" rowspan="1">1708</td>
<td align="center" colspan="1" rowspan="1">1808</td>
<td align="center" colspan="1" rowspan="1">1862</td>
<td align="center" colspan="1" rowspan="1">1888</td>
<td align="center" colspan="1" rowspan="1">1914</td>
<td align="center" colspan="1" rowspan="1">1926</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.7131</td>
<td align="center" colspan="1" rowspan="1">0.8085</td>
<td align="center" colspan="1" rowspan="1">0.8713</td>
<td align="center" colspan="1" rowspan="1">0.9053</td>
<td align="center" colspan="1" rowspan="1">0.9202</td>
<td align="center" colspan="1" rowspan="1">0.9359</td>
<td align="center" colspan="1" rowspan="1">0.9454</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">4.9359</td>
<td align="center" colspan="1" rowspan="1">5.4552</td>
<td align="center" colspan="1" rowspan="1">5.7908</td>
<td align="center" colspan="1" rowspan="1">5.9795</td>
<td align="center" colspan="1" rowspan="1">6.0650</td>
<td align="center" colspan="1" rowspan="1">6.1590</td>
<td align="center" colspan="1" rowspan="1">6.2287</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.4121</td>
<td align="center" colspan="1" rowspan="1">0.4457</td>
<td align="center" colspan="1" rowspan="1">0.4665</td>
<td align="center" colspan="1" rowspan="1">0.4783</td>
<td align="center" colspan="1" rowspan="1">0.4850</td>
<td align="center" colspan="1" rowspan="1">0.4906</td>
<td align="center" colspan="1" rowspan="1">0.4933</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.4167</td>
<td align="center" colspan="1" rowspan="1">0.4493</td>
<td align="center" colspan="1" rowspan="1">0.4711</td>
<td align="center" colspan="1" rowspan="1">0.4818</td>
<td align="center" colspan="1" rowspan="1">0.4877</td>
<td align="center" colspan="1" rowspan="1">0.4938</td>
<td align="center" colspan="1" rowspan="1">0.4972</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.0032</td>
<td align="center" colspan="1" rowspan="1">0.0040</td>
<td align="center" colspan="1" rowspan="1">0.0049</td>
<td align="center" colspan="1" rowspan="1">0.0053</td>
<td align="center" colspan="1" rowspan="1">0.0058</td>
<td align="center" colspan="1" rowspan="1">0.0062</td>
<td align="center" colspan="1" rowspan="1">0.0064</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl11" orientation="portrait" position="anchor">
<label>Table A5.</label>
<caption>
<p>Distances for short-sequences data set. Median of calculated distances for each reference set of the synthetic control data set (sequence length of 300 amino acids, no ASRV). Order of methods and values for
<italic>k</italic>
are as in
<xref rid="tbl11" ref-type="table">Table A1</xref>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">Reference set of short-sequences data</th>
</tr>
<tr>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th align="center" colspan="7" rowspan="1">
<hr></hr>
</th>
</tr>
<tr>
<th align="left" colspan="1" rowspan="1">No.</th>
<th align="left" colspan="1" rowspan="1">Method</th>
<th align="center" colspan="1" rowspan="1">
<inline-graphic xlink:href="56-2-206-in2.jpg"></inline-graphic>
</th>
<th align="center" colspan="1" rowspan="1">
<italic>k</italic>
</th>
<th align="center" colspan="1" rowspan="1">1</th>
<th align="center" colspan="1" rowspan="1">2</th>
<th align="center" colspan="1" rowspan="1">3</th>
<th align="center" colspan="1" rowspan="1">4</th>
<th align="center" colspan="1" rowspan="1">5</th>
<th align="center" colspan="1" rowspan="1">6</th>
<th align="center" colspan="1" rowspan="1">7</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="1" rowspan="1">1</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.7512</td>
<td align="center" colspan="1" rowspan="1">1.0948</td>
<td align="center" colspan="1" rowspan="1">1.5985</td>
<td align="center" colspan="1" rowspan="1">2.0736</td>
<td align="center" colspan="1" rowspan="1">2.4174</td>
<td align="center" colspan="1" rowspan="1">2.9717</td>
<td align="center" colspan="1" rowspan="1">3.3904</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">2</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.6917</td>
<td align="center" colspan="1" rowspan="1">2.1449</td>
<td align="center" colspan="1" rowspan="1">2.6389</td>
<td align="center" colspan="1" rowspan="1">2.9273</td>
<td align="center" colspan="1" rowspan="1">3.0302</td>
<td align="center" colspan="1" rowspan="1">3.1279</td>
<td align="center" colspan="1" rowspan="1">3.1704</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">3</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">8.1736</td>
<td align="center" colspan="1" rowspan="1">8.9673</td>
<td align="center" colspan="1" rowspan="1">9.5701</td>
<td align="center" colspan="1" rowspan="1">9.8498</td>
<td align="center" colspan="1" rowspan="1">9.9464</td>
<td align="center" colspan="1" rowspan="1">10.048</td>
<td align="center" colspan="1" rowspan="1">10.076</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">4</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-ML</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.9011</td>
<td align="center" colspan="1" rowspan="1">1.2594</td>
<td align="center" colspan="1" rowspan="1">1.7497</td>
<td align="center" colspan="1" rowspan="1">2.0307</td>
<td align="center" colspan="1" rowspan="1">2.1513</td>
<td align="center" colspan="1" rowspan="1">2.2199</td>
<td align="center" colspan="1" rowspan="1">2.2504</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">5</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>PB-SIM</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">5.8913</td>
<td align="center" colspan="1" rowspan="1">7.2013</td>
<td align="center" colspan="1" rowspan="1">8.3587</td>
<td align="center" colspan="1" rowspan="1">8.8410</td>
<td align="center" colspan="1" rowspan="1">9.0104</td>
<td align="center" colspan="1" rowspan="1">9.1216</td>
<td align="center" colspan="1" rowspan="1">9.1536</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">6</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">0.6658</td>
<td align="center" colspan="1" rowspan="1">0.8564</td>
<td align="center" colspan="1" rowspan="1">0.9346</td>
<td align="center" colspan="1" rowspan="1">0.9541</td>
<td align="center" colspan="1" rowspan="1">0.9627</td>
<td align="center" colspan="1" rowspan="1">0.9695</td>
<td align="center" colspan="1" rowspan="1">0.9700</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">7</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8030</td>
<td align="center" colspan="1" rowspan="1">0.9004</td>
<td align="center" colspan="1" rowspan="1">0.9506</td>
<td align="center" colspan="1" rowspan="1">0.9678</td>
<td align="center" colspan="1" rowspan="1">0.9781</td>
<td align="center" colspan="1" rowspan="1">0.9871</td>
<td align="center" colspan="1" rowspan="1">0.9892</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">8</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">531</td>
<td align="center" colspan="1" rowspan="1">568</td>
<td align="center" colspan="1" rowspan="1">582</td>
<td align="center" colspan="1" rowspan="1">588</td>
<td align="center" colspan="1" rowspan="1">590</td>
<td align="center" colspan="1" rowspan="1">590</td>
<td align="center" colspan="1" rowspan="1">592</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">9</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>P</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.8566</td>
<td align="center" colspan="1" rowspan="1">0.9553</td>
<td align="center" colspan="1" rowspan="1">0.9853</td>
<td align="center" colspan="1" rowspan="1">0.9910</td>
<td align="center" colspan="1" rowspan="1">0.9949</td>
<td align="center" colspan="1" rowspan="1">0.9953</td>
<td align="center" colspan="1" rowspan="1">0.9954</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">10</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">1.5378</td>
<td align="center" colspan="1" rowspan="1">1.8705</td>
<td align="center" colspan="1" rowspan="1">2.0634</td>
<td align="center" colspan="1" rowspan="1">2.1180</td>
<td align="center" colspan="1" rowspan="1">2.1465</td>
<td align="center" colspan="1" rowspan="1">2.1758</td>
<td align="center" colspan="1" rowspan="1">2.1758</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">11</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>E</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">554</td>
<td align="center" colspan="1" rowspan="1">580</td>
<td align="center" colspan="1" rowspan="1">590</td>
<td align="center" colspan="1" rowspan="1">592</td>
<td align="center" colspan="1" rowspan="1">594</td>
<td align="center" colspan="1" rowspan="1">594</td>
<td align="center" colspan="1" rowspan="1">594</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">13</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>F</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">1.7678</td>
<td align="center" colspan="1" rowspan="1">2.0641</td>
<td align="center" colspan="1" rowspan="1">2.2064</td>
<td align="center" colspan="1" rowspan="1">2.2374</td>
<td align="center" colspan="1" rowspan="1">2.2695</td>
<td align="center" colspan="1" rowspan="1">2.2695</td>
<td align="center" colspan="1" rowspan="1">2.2695</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">14</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>ACS</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">1.2130</td>
<td align="center" colspan="1" rowspan="1">1.3537</td>
<td align="center" colspan="1" rowspan="1">1.4238</td>
<td align="center" colspan="1" rowspan="1">1.4475</td>
<td align="center" colspan="1" rowspan="1">1.4597</td>
<td align="center" colspan="1" rowspan="1">1.4713</td>
<td align="center" colspan="1" rowspan="1">1.4728</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">15</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8189</td>
<td align="center" colspan="1" rowspan="1">0.8548</td>
<td align="center" colspan="1" rowspan="1">0.8728</td>
<td align="center" colspan="1" rowspan="1">0.8774</td>
<td align="center" colspan="1" rowspan="1">0.8794</td>
<td align="center" colspan="1" rowspan="1">0.8819</td>
<td align="center" colspan="1" rowspan="1">0.8830</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">16</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">5</td>
<td align="center" colspan="1" rowspan="1">62.255</td>
<td align="center" colspan="1" rowspan="1">68.090</td>
<td align="center" colspan="1" rowspan="1">70.418</td>
<td align="center" colspan="1" rowspan="1">71.976</td>
<td align="center" colspan="1" rowspan="1">72.231</td>
<td align="center" colspan="1" rowspan="1">72.839</td>
<td align="center" colspan="1" rowspan="1">72.479</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">17</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>LZ</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1"></td>
<td align="center" colspan="1" rowspan="1">0.8366</td>
<td align="center" colspan="1" rowspan="1">0.8668</td>
<td align="center" colspan="1" rowspan="1">0.8795</td>
<td align="center" colspan="1" rowspan="1">0.8842</td>
<td align="center" colspan="1" rowspan="1">0.8861</td>
<td align="center" colspan="1" rowspan="1">0.8884</td>
<td align="center" colspan="1" rowspan="1">0.8880</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">19</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>S</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">3.2320</td>
<td align="center" colspan="1" rowspan="1">3.4452</td>
<td align="center" colspan="1" rowspan="1">3.5341</td>
<td align="center" colspan="1" rowspan="1">3.5909</td>
<td align="center" colspan="1" rowspan="1">3.5829</td>
<td align="center" colspan="1" rowspan="1">3.5996</td>
<td align="center" colspan="1" rowspan="1">3.5983</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">20</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">3</td>
<td align="center" colspan="1" rowspan="1">0.4674</td>
<td align="center" colspan="1" rowspan="1">0.4920</td>
<td align="center" colspan="1" rowspan="1">0.5044</td>
<td align="center" colspan="1" rowspan="1">0.5078</td>
<td align="center" colspan="1" rowspan="1">0.5095</td>
<td align="center" colspan="1" rowspan="1">0.5102</td>
<td align="center" colspan="1" rowspan="1">0.5105</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">21</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>C</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">CE</td>
<td align="center" colspan="1" rowspan="1">4</td>
<td align="center" colspan="1" rowspan="1">0.4623</td>
<td align="center" colspan="1" rowspan="1">0.4888</td>
<td align="center" colspan="1" rowspan="1">0.5027</td>
<td align="center" colspan="1" rowspan="1">0.5067</td>
<td align="center" colspan="1" rowspan="1">0.5091</td>
<td align="center" colspan="1" rowspan="1">0.5097</td>
<td align="center" colspan="1" rowspan="1">0.5102</td>
</tr>
<tr>
<td align="left" colspan="1" rowspan="1">22</td>
<td align="left" colspan="1" rowspan="1">
<italic>d</italic>
<sup>
<italic>W</italic>
</sup>
</td>
<td align="center" colspan="1" rowspan="1">AA</td>
<td align="center" colspan="1" rowspan="1">(1)</td>
<td align="center" colspan="1" rowspan="1">0.0154</td>
<td align="center" colspan="1" rowspan="1">0.0202</td>
<td align="center" colspan="1" rowspan="1">0.0245</td>
<td align="center" colspan="1" rowspan="1">0.0273</td>
<td align="center" colspan="1" rowspan="1">0.0296</td>
<td align="center" colspan="1" rowspan="1">0.0325</td>
<td align="center" colspan="1" rowspan="1">0.0334</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</app>
</app-group>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001147  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001147  | SxmlIndent | more
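
The commands above print the indented XML record. As a minimal sketch of what can be done with that output, the Python below reads one appendix table with the standard library's `xml.etree.ElementTree`. Several things here are assumptions rather than part of Dilib or of the record: the fragment is a simplified stand-in for the real markup (attributes dropped, only body rows 1 and 6 of Table A4 kept), the names `cell_text` and `read_rows` are illustrative, and the third column is labelled `alphabet` only as a guess at what the header graphic denotes. The full record also uses an `xlink:` prefix on `inline-graphic`; if that prefix is not declared in the extracted text, a namespace-aware parser will reject it.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the layout of the record's tables.
# The values are body rows 1 and 6 of Table A4; attributes are omitted.
FRAGMENT = """\
<table-wrap id="tbl10">
  <label>Table A4.</label>
  <table>
    <tbody>
      <tr>
        <td>1</td>
        <td><italic>d</italic><sup><italic>ML</italic></sup></td>
        <td>AA</td>
        <td></td>
        <td>0.4635</td><td>0.6077</td><td>0.7746</td><td>0.9114</td>
        <td>1.0031</td><td>1.1273</td><td>1.2114</td>
      </tr>
      <tr>
        <td>6</td>
        <td><italic>d</italic><sup><italic>P</italic></sup></td>
        <td>AA</td>
        <td>4</td>
        <td>0.6591</td><td>0.7971</td><td>0.8737</td><td>0.9111</td>
        <td>0.9278</td><td>0.9431</td><td>0.9511</td>
      </tr>
    </tbody>
  </table>
</table-wrap>
"""


def cell_text(cell):
    """Join all nested text (e.g. <italic>, <sup>) and drop whitespace."""
    return "".join("".join(cell.itertext()).split())


def read_rows(xml_text):
    """Return one dict per body row that has the expected 11 cells."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.iter("tr"):
        cells = [cell_text(td) for td in tr.findall("td")]
        # No., method, alphabet (assumed label), k, then 7 reference sets.
        if len(cells) != 11:
            continue
        rows.append({
            "no": int(cells[0]),
            "method": cells[1],
            "alphabet": cells[2],
            "k": cells[3],  # kept as text: may be empty or "(1)"
            "distances": [float(v) for v in cells[4:]],
        })
    return rows


if __name__ == "__main__":
    for row in read_rows(FRAGMENT):
        print(row["no"], row["method"], row["alphabet"],
              row["k"] or "-", row["distances"])
```

Keeping `k` as text rather than converting it to an integer is deliberate: in these tables the cell is sometimes empty and sometimes written as `(1)`.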

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021