CyberinfraV1, Pmc, Corpus, bibRecord, 000226

Fast algorithms and heuristics for phylogenomics under ILS and hybridization

Identifieur interne : 000226 ( Pmc/Corpus ); précédent : 000225; suivant : 000227

Fast algorithms and heuristics for phylogenomics under ILS and hybridization

Auteurs : Yun Yu ; Nikola Ristic ; Luay Nakhleh

Source :

BMC Bioinformatics [ 1471-2105 ] ; 2013.

RBID : PMC:3852049

Abstract

Background

Phylogenomic analyses involving whole-genome or multi-locus data often entail dealing with incongruent gene trees. In this paper, we consider two causes of such incongruence, namely, incomplete lineage sorting (ILS) and hybridization, and consider both parsimony and probabilistic criteria for dealing with them.

Results

Under the assumption of ILS, computing the probability of a gene tree given a species tree is a very hard problem. We present a heuristic for speeding up the computation, and demonstrate how it scales up computations to data sizes that are not feasible to analyze using current techniques, while achieving very good accuracy. Further, under the assumption of both ILS and hybridization, computing the probability of a gene tree and parsimoniously reconciling it with a phylogenetic network are both very hard problems. We present two exact algorithms for these two problems that speed up existing techniques significantly and enable analyses of much larger data sets than is currently feasible.

Conclusion

Our heuristics and algorithms enable phylogenomic analyses of larger (in terms of numbers of taxa) data sets than is currently feasible. Further, our methods account for ILS and hybridization, thus allowing analyses of reticulate evolutionary histories.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852049

DOI: 10.1186/1471-2105-14-S15-S6
PubMed: 24564257
PubMed Central: 3852049

Links to Exploration step

PMC:3852049

Le document en format XML

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Fast algorithms and heuristics for phylogenomics under ILS and hybridization</title>
<author><name sortKey="Yu, Yun" sort="Yu, Yun" uniqKey="Yu Y" first="Yun" last="Yu">Yun Yu</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ristic, Nikola" sort="Ristic, Nikola" uniqKey="Ristic N" first="Nikola" last="Ristic">Nikola Ristic</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Nakhleh, Luay" sort="Nakhleh, Luay" uniqKey="Nakhleh L" first="Luay" last="Nakhleh">Luay Nakhleh</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">24564257</idno>
<idno type="pmc">3852049</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852049</idno>
<idno type="RBID">PMC:3852049</idno>
<idno type="doi">10.1186/1471-2105-14-S15-S6</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000226</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Fast algorithms and heuristics for phylogenomics under ILS and hybridization</title>
<author><name sortKey="Yu, Yun" sort="Yu, Yun" uniqKey="Yu Y" first="Yun" last="Yu">Yun Yu</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ristic, Nikola" sort="Ristic, Nikola" uniqKey="Ristic N" first="Nikola" last="Ristic">Nikola Ristic</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Nakhleh, Luay" sort="Nakhleh, Luay" uniqKey="Nakhleh L" first="Luay" last="Nakhleh">Luay Nakhleh</name>
<affiliation><nlm:aff id="I1">Department of Computer Science, Rice University, Houston, TX, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Phylogenomic analyses involving whole-genome or multi-locus data often entail dealing with incongruent gene trees. In this paper, we consider two causes of such incongruence, namely, incomplete lineage sorting (ILS) and hybridization, and consider both parsimony and probabilistic criteria for dealing with them.</p>
</sec>
<sec><title>Results</title>
<p>Under the assumption of ILS, computing the probability of a gene tree given a species tree is a very hard problem. We present a heuristic for speeding up the computation, and demonstrate how it scales up computations to data sizes that are not feasible to analyze using current techniques, while achieving very good accuracy. Further, under the assumption of both ILS and hybridization, computing the probability of a gene tree and parsimoniously reconciling it with a phylogenetic network are both very hard problems. We present two exact algorithms for these two problems that speed up existing techniques significantly and enable analyses of much larger data sets than is currently feasible.</p>
</sec>
<sec><title>Conclusion</title>
<p>Our heuristics and algorithms enable phylogenomic analyses of larger (in terms of numbers of taxa) data sets than is currently feasible. Further, our methods account for ILS and hybridization, thus allowing analyses of reticulate evolutionary histories.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Maddison, Wp" uniqKey="Maddison W">WP Maddison</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Syring, J" uniqKey="Syring J">J Syring</name>
</author>
<author><name sortKey="Willyard, A" uniqKey="Willyard A">A Willyard</name>
</author>
<author><name sortKey="Cronn, R" uniqKey="Cronn R">R Cronn</name>
</author>
<author><name sortKey="Liston, A" uniqKey="Liston A">A Liston</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pollard, Da" uniqKey="Pollard D">DA Pollard</name>
</author>
<author><name sortKey="Iyer, Vn" uniqKey="Iyer V">VN Iyer</name>
</author>
<author><name sortKey="Moses, Am" uniqKey="Moses A">AM Moses</name>
</author>
<author><name sortKey="Eisen, Mb" uniqKey="Eisen M">MB Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Than, C" uniqKey="Than C">C Than</name>
</author>
<author><name sortKey="Sugino, R" uniqKey="Sugino R">R Sugino</name>
</author>
<author><name sortKey="Innan, H" uniqKey="Innan H">H Innan</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kuo, Ch" uniqKey="Kuo C">CH Kuo</name>
</author>
<author><name sortKey="Wares, Jp" uniqKey="Wares J">JP Wares</name>
</author>
<author><name sortKey="Kissinger, Jc" uniqKey="Kissinger J">JC Kissinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cranston, Ka" uniqKey="Cranston K">KA Cranston</name>
</author>
<author><name sortKey="Hurwitz, B" uniqKey="Hurwitz B">B Hurwitz</name>
</author>
<author><name sortKey="Ware, D" uniqKey="Ware D">D Ware</name>
</author>
<author><name sortKey="Stein, L" uniqKey="Stein L">L Stein</name>
</author>
<author><name sortKey="Wing, Ra" uniqKey="Wing R">RA Wing</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="White, M" uniqKey="White M">M White</name>
</author>
<author><name sortKey="Ane, C" uniqKey="Ane C">C Ane</name>
</author>
<author><name sortKey="Dewey, C" uniqKey="Dewey C">C Dewey</name>
</author>
<author><name sortKey="Larget, B" uniqKey="Larget B">B Larget</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hobolth, A" uniqKey="Hobolth A">A Hobolth</name>
</author>
<author><name sortKey="Dutheil, J" uniqKey="Dutheil J">J Dutheil</name>
</author>
<author><name sortKey="Hawks, J" uniqKey="Hawks J">J Hawks</name>
</author>
<author><name sortKey="Schierup, M" uniqKey="Schierup M">M Schierup</name>
</author>
<author><name sortKey="Mailund, T" uniqKey="Mailund T">T Mailund</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Takuno, S" uniqKey="Takuno S">S Takuno</name>
</author>
<author><name sortKey="Kado, T" uniqKey="Kado T">T Kado</name>
</author>
<author><name sortKey="Sugino, Rp" uniqKey="Sugino R">RP Sugino</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
<author><name sortKey="Innan, H" uniqKey="Innan H">H Innan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Degnan, J" uniqKey="Degnan J">J Degnan</name>
</author>
<author><name sortKey="Salter, L" uniqKey="Salter L">L Salter</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Staubach, F" uniqKey="Staubach F">F Staubach</name>
</author>
<author><name sortKey="Lorenc, A" uniqKey="Lorenc A">A Lorenc</name>
</author>
<author><name sortKey="Messer, P" uniqKey="Messer P">P Messer</name>
</author>
<author><name sortKey="Tang, K" uniqKey="Tang K">K Tang</name>
</author>
<author><name sortKey="Petrov, D" uniqKey="Petrov D">D Petrov</name>
</author>
<author><name sortKey="Tautz, D" uniqKey="Tautz D">D Tautz</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Moody, M" uniqKey="Moody M">M Moody</name>
</author>
<author><name sortKey="Rieseberg, L" uniqKey="Rieseberg L">L Rieseberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Than, C" uniqKey="Than C">C Than</name>
</author>
<author><name sortKey="Ruths, D" uniqKey="Ruths D">D Ruths</name>
</author>
<author><name sortKey="Innan, H" uniqKey="Innan H">H Innan</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Meng, C" uniqKey="Meng C">C Meng</name>
</author>
<author><name sortKey="Kubatko, Ls" uniqKey="Kubatko L">LS Kubatko</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kubatko, Ls" uniqKey="Kubatko L">LS Kubatko</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Joly, S" uniqKey="Joly S">S Joly</name>
</author>
<author><name sortKey="Mclenachan, Pa" uniqKey="Mclenachan P">PA McLenachan</name>
</author>
<author><name sortKey="Lockhart, Pj" uniqKey="Lockhart P">PJ Lockhart</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, Y" uniqKey="Yu Y">Y Yu</name>
</author>
<author><name sortKey="Than, C" uniqKey="Than C">C Than</name>
</author>
<author><name sortKey="Degnan, J" uniqKey="Degnan J">J Degnan</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jones, G" uniqKey="Jones G">G Jones</name>
</author>
<author><name sortKey="Sagitov, S" uniqKey="Sagitov S">S Sagitov</name>
</author>
<author><name sortKey="Oxelman, B" uniqKey="Oxelman B">B Oxelman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, Y" uniqKey="Yu Y">Y Yu</name>
</author>
<author><name sortKey="Barnett, R" uniqKey="Barnett R">R Barnett</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, Y" uniqKey="Yu Y">Y Yu</name>
</author>
<author><name sortKey="Degnan, J" uniqKey="Degnan J">J Degnan</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Than, C" uniqKey="Than C">C Than</name>
</author>
<author><name sortKey="Ruths, D" uniqKey="Ruths D">D Ruths</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tavare, S" uniqKey="Tavare S">S Tavaré</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Than, C" uniqKey="Than C">C Than</name>
</author>
<author><name sortKey="Nakhleh, L" uniqKey="Nakhleh L">L Nakhleh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Doyon, Jp" uniqKey="Doyon J">JP Doyon</name>
</author>
<author><name sortKey="Hamel, S" uniqKey="Hamel S">S Hamel</name>
</author>
<author><name sortKey="Chauve, C" uniqKey="Chauve C">C Chauve</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rambaut, A" uniqKey="Rambaut A">A Rambaut</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Robinson, D" uniqKey="Robinson D">D Robinson</name>
</author>
<author><name sortKey="Foulds, L" uniqKey="Foulds L">L Foulds</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hudson, Rr" uniqKey="Hudson R">RR Hudson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="abstract" xml:lang="en"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">24564257</article-id>
<article-id pub-id-type="pmc">3852049</article-id>
<article-id pub-id-type="publisher-id">1471-2105-14-S15-S6</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-14-S15-S6</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Proceedings</subject>
</subj-group>
</article-categories>
<title-group><article-title>Fast algorithms and heuristics for phylogenomics under ILS and hybridization</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" corresp="yes" id="A1"><name><surname>Yu</surname>
<given-names>Yun</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>yy9@rice.edu</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A2"><name><surname>Ristic</surname>
<given-names>Nikola</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>nr10@rice.edu</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A3"><name><surname>Nakhleh</surname>
<given-names>Luay</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>nakhleh@rice.edu</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Department of Computer Science, Rice University, Houston, TX, USA</aff>
<pub-date pub-type="collection"><year>2013</year>
</pub-date>
<pub-date pub-type="epub"><day>15</day>
<month>10</month>
<year>2013</year>
</pub-date>
<volume>14</volume>
<issue>Suppl 15</issue>
<supplement><named-content content-type="supplement-title">Proceedings of the Eleventh Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics</named-content>
<named-content content-type="supplement-editor">Macha Nikolski and Yves Van de Peer</named-content>
<named-content content-type="supplement-sponsor">Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. Articles have undergone the journal's standard peer revire process for supplements. The Supplement Editors declare that they have no competing interests.</named-content>
</supplement>
<fpage>S6</fpage>
<lpage>S6</lpage>
<permissions><copyright-statement>Copyright © 2013 Yu et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Yu et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/14/S15/S6"></self-uri>
<abstract><sec><title>Background</title>
<p>Phylogenomic analyses involving whole-genome or multi-locus data often entail dealing with incongruent gene trees. In this paper, we consider two causes of such incongruence, namely, incomplete lineage sorting (ILS) and hybridization, and consider both parsimony and probabilistic criteria for dealing with them.</p>
</sec>
<sec><title>Results</title>
<p>Under the assumption of ILS, computing the probability of a gene tree given a species tree is a very hard problem. We present a heuristic for speeding up the computation, and demonstrate how it scales up computations to data sizes that are not feasible to analyze using current techniques, while achieving very good accuracy. Further, under the assumption of both ILS and hybridization, computing the probability of a gene tree and parsimoniously reconciling it with a phylogenetic network are both very hard problems. We present two exact algorithms for these two problems that speed up existing techniques significantly and enable analyses of much larger data sets than is currently feasible.</p>
</sec>
<sec><title>Conclusion</title>
<p>Our heuristics and algorithms enable phylogenomic analyses of larger (in terms of numbers of taxa) data sets than is currently feasible. Further, our methods account for ILS and hybridization, thus allowing analyses of reticulate evolutionary histories.</p>
</sec>
</abstract>
<conference><conf-date>17-19 October 2013</conf-date>
<conf-name>Eleventh Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics</conf-name>
<conf-loc>Lyon, France</conf-loc>
</conference>
</article-meta>
</front>
<body><sec sec-type="intro"><title>Introduction</title>
<p>The phenomenon of gene tree incongruence arises in phylogenomic studies [<xref ref-type="bibr" rid="B1">1</xref>
]. This incongruence can be caused by many processes, including incomplete lineage sorting (ILS) and hybridization. Recent studies have shown large extents of gene tree incongruence in various groups of organisms due to ILS [<xref ref-type="bibr" rid="B2">2</xref>
-<xref ref-type="bibr" rid="B9">9</xref>
]. Degnan and Salter proposed the first method for computing the probability of a gene tree topology given a species tree using the concept of <italic>coalescent histories </italic>
[<xref ref-type="bibr" rid="B10">10</xref>
]. Later, Wu introduced the concept of <italic>ancestral configurations </italic>
(or AC) to speed up the computation for the same task [<xref ref-type="bibr" rid="B11">11</xref>
], and proposed maximum likelihood method for inference of species tree given a set of gene tree topologies. However, even with the speedup of [<xref ref-type="bibr" rid="B11">11</xref>
], inference in practice remained limited to data sets with small numbers of taxa, mainly due the very slow computation of gene tree probabilities.</p>
<p>Furthermore, simultaneous patterns of hybridization and incomplete lineage sorting have been reported in several studies [<xref ref-type="bibr" rid="B12">12</xref>
-<xref ref-type="bibr" rid="B14">14</xref>
]. Therefore, it is important to enable analyses under both ILS and hybridization. While methods for handling limited cases of reticulation were introduced [<xref ref-type="bibr" rid="B15">15</xref>
-<xref ref-type="bibr" rid="B20">20</xref>
], it was not until most recently that parsimony and probabilistic methods for the general cases were introduced [<xref ref-type="bibr" rid="B21">21</xref>
,<xref ref-type="bibr" rid="B22">22</xref>
]. However, the methods of [<xref ref-type="bibr" rid="B21">21</xref>
,<xref ref-type="bibr" rid="B22">22</xref>
] are computationally intensive and their application is limited to data sets with very few taxa.</p>
<p>The contributions of this paper are threefold. First, we devise a fast heuristic for estimating the probability of a gene tree under ILS. We use the heuristic to infer species trees from multi-locus data, and show that the method achieves significant speedup over the method of [<xref ref-type="bibr" rid="B11">11</xref>
]. Second, we introduce a novel concept of weighted ancestral configurations on phylogenetic networks and devise a much faster exact algorithm than that of [<xref ref-type="bibr" rid="B21">21</xref>
] for reconciling a gene tree with a phylogenetic network. Third, using the same concept of weighted ancestral configurations, we devise a much faster exact algorithm than that of [<xref ref-type="bibr" rid="B22">22</xref>
] for computing the probability of a gene tree under both ILS and hybridization.</p>
<p>We have implemented the methods in the PhyloNet software package [<xref ref-type="bibr" rid="B23">23</xref>
], which is freely available for download in open source at <ext-link ext-link-type="uri" xlink:href="http://bioinfo.cs.rice.edu/phylonet">http://bioinfo.cs.rice.edu/phylonet</ext-link>
. The methods will enable much larger phylogenomic analyses than are currently possible using existing methods.</p>
</sec>
<sec sec-type="methods"><title>Methods</title>
<p>Given a directed graph <italic>g</italic>
, we denote by <italic>V </italic>
(<italic>g</italic>
) and <italic>E</italic>
(<italic>g</italic>
) the node- and edge-sets of <italic>g</italic>
. Given a node <italic>v </italic>
in a graph <italic>g</italic>
, we denote by <italic>d<sup>-</sup>
</italic>
(<italic>v</italic>
), <italic>δ<sup>-</sup>
</italic>
(<italic>v</italic>
), <italic>d</italic>
<sup>+</sup>
(<italic>v</italic>
), <italic>δ</italic>
<sup>+</sup>
(<italic>v</italic>
) the in-degree, in-edges, out-degree, and out-edges, respectively, of node <italic>v</italic>
. In this paper, we consider a special type of rooted, directed, acyclic graphs (rDAG) <italic>g </italic>
= (<italic>V, E</italic>
), where <italic>V </italic>
is partitioned into four sets <italic>V<sub>T </sub>
</italic>
= {<italic>v </italic>
∈ <italic>V </italic>
: <italic>d<sup>-</sup>
</italic>
(<italic>v</italic>
) = 1 Λ <italic>d</italic>
<sup>+</sup>
(<italic>v</italic>
) ≥ 2}, <italic>V<sub>N </sub>
</italic>
= {<italic>v </italic>
∈ <italic>V </italic>
: <italic>d<sup>-</sup>
</italic>
(<italic>v</italic>
) = 2 Λ <italic>d</italic>
<sup>+</sup>
(<italic>v</italic>
) = 1} (reticulation nodes), <italic>V<sub>L </sub>
</italic>
= <italic>{v </italic>
∈ <italic>V </italic>
: <italic>d<sup>-</sup>
</italic>
(<italic>v</italic>
) = 1 Λ <italic>d</italic>
<sup>+</sup>
(<italic>v</italic>
) = 0}, and {<italic>r</italic>
}, where <italic>r</italic>
, the root, is a unique node with <italic>d</italic>
<sup>-</sup>
(<italic>r</italic>
) = 0 and <italic>d</italic>
<sup>+</sup>
(<italic>r</italic>
) ≥ 2. An <inline-formula><mml:math id="M1" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-rDAG is an rDAG whose leaves (the set <italic>V<sub>L</sub>
</italic>
) are bijectively labeled by set <inline-formula><mml:math id="M2" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
. Further, we consider a special type of rooted trees, where a rooted tree is an rDAG with <italic>V<sub>N </sub>
</italic>
= ∅. An <inline-formula><mml:math id="M3" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-tree is a rooted tree whose leaves are bijectively labeled by set <inline-formula><mml:math id="M4" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
.</p>
<sec><title>Networks and trees</title>
<p>Here, we use the following definition of phylogenetic network topologies [<xref ref-type="bibr" rid="B24">24</xref>
].</p>
<p><bold>Definition 1 </bold>
<italic>A </italic>
phylogenetic <inline-formula><mml:math id="M5" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-network, <italic>or phylogenetic network for short, N is a 3-tuple </italic>
(<italic>G, λ, γ</italic>
), <italic>where G </italic>
= (<italic>V, E</italic>
) <italic>is an </italic>
<inline-formula><mml:math id="M6" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-<italic>rDAG</italic>
, λ: <italic>E → </italic>
ℝ<sup>+ </sup>
<italic>is the edge lengths mapping, and γ </italic>
: <italic>E → </italic>
[0, 1] <italic>is the inheritance probability mapping, which satisfies </italic>
<inline-formula><mml:math id="M7" name="1471-2105-14-S15-S6-i1" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msup><mml:mrow><mml:mi>δ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">-</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mi>γ</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>for every v </italic>
∈ <italic>V - </italic>
{<italic>r</italic>
}.</p>
<p>A <italic>gene tree </italic>
on set <inline-formula><mml:math id="M8" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 of alleles (copies of a gene) is a <inline-formula><mml:math id="M9" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-tree. Since a gene tree is inferred from gene copies that are sampled from a set of species, there is a mapping from the leaves of the gene tree to the leaves of the phylogenetic network.</p>
<p><bold>Definition 2 </bold>
<italic>A species network/gene tree instance is a triplet </italic>
(<italic>N, T, µ</italic>
), <italic>where N is an </italic>
<inline-formula><mml:math id="M10" name="1471-2105-14-S15-S6-i24" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-<italic>network, T is a </italic>
<inline-formula><mml:math id="M11" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
-<italic>tree, and </italic>
<inline-formula><mml:math id="M12" name="1471-2105-14-S15-S6-i26" overflow="scroll"><mml:mrow><mml:mi>μ</mml:mi>
<mml:mo class="MathClass-rel">:</mml:mo>
<mml:mi mathvariant="script">Y</mml:mi>
<mml:mo class="MathClass-rel">→</mml:mo>
<mml:mi mathvariant="script">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>is a function that maps the gene tree leaf labels to the phylogenetic network leaf labels</italic>
.</p>
<p>Notice that the mapping <italic>µ </italic>
is not necessarily injective (as, biologically, not all species have exactly the same number of copies of a given gene) or surjective (since some species may have zero copies of a given gene). In order not to introduce too many symbols, we will refer to the nodes and edges of a phylogenetic network or a gene tree when we actually mean the nodes and edges of the topologies of these structures.</p>
<p>The way in which a gene evolves within the edges of a phylogenetic network can be described by a <italic>coalescent history </italic>
[<xref ref-type="bibr" rid="B22">22</xref>
]. Let <italic>g </italic>
be an rDAG or a rooted tree. For a node <italic>u </italic>
in <italic>g</italic>
, we denote by <italic>g<sub>u </sub>
</italic>
the set of all nodes that are reachable from node <italic>u</italic>
.</p>
<p><bold>Definition 3 </bold>
<italic>Given a species network/gene tree instance </italic>
(<italic>N, g, µ</italic>
), <italic>a coalescent history is a function h</italic>
: <italic>V </italic>
(<italic>g</italic>
) → <italic>E</italic>
(<italic>N </italic>
) <italic>that satisfies the following two conditions:</italic>
</p>
<p>• <italic>If d</italic>
<sup>+</sup>
(<italic>v</italic>
) = 0 <italic>and µ</italic>
(<italic>v</italic>
) = <italic>w, then h</italic>
(<italic>v</italic>
) = (<italic>u, w</italic>
) <italic>for the edge </italic>
(<italic>u, w</italic>
) ∈ <italic>E</italic>
(<italic>N </italic>
).</p>
<p>• <italic>If v </italic>
∈ <italic>g<sub>u </sub>
for node u </italic>
∈ <italic>V </italic>
(<italic>g</italic>
), <italic>and h</italic>
(<italic>u</italic>
) = (<italic>p, q</italic>
), <italic>then h</italic>
(<italic>v</italic>
) = (<italic>x, y</italic>
) <italic>where x </italic>
∈ <italic>N<sub>q</sub>
</italic>
.</p>
<p>Given a species network/gene tree instance (<italic>N, g, µ</italic>
), we denote by <italic>HN </italic>
(<italic>g</italic>
) the set of all coalescent histories of <italic>g </italic>
given <italic>N</italic>
.</p>
</sec>
<sec><title>Probability and extra lineages</title>
<p>Two central quantities to compute are <italic>P</italic>
(<italic>g|N</italic>
), the probability of observing a gene tree topology <italic>g </italic>
given the phylogenetic network <italic>N</italic>
, and <italic>XL</italic>
(<italic>N, g</italic>
), the minimum number of extra lineages that arise from the optimal reconciliation of <italic>g </italic>
with <italic>N</italic>
. We now define these two quantities. The probability of observing gene tree <italic>g </italic>
given phylogenetic network <italic>N </italic>
is</p>
<p><disp-formula id="bmcM1"><label>(1)</label>
<mml:math id="M13" name="1471-2105-14-S15-S6-i2" overflow="scroll"><mml:mrow><mml:mi>P</mml:mi>
<mml:mfenced close=")" open="("><mml:mrow><mml:mi>g</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo mathsize="small">∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>h</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>N</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>g</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mi>P</mml:mi>
<mml:mfenced close=")" open="("><mml:mrow><mml:mi>h</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where</p>
<p><disp-formula><mml:math id="M14" name="1471-2105-14-S15-S6-i3" overflow="scroll"><mml:mrow><mml:mi>P</mml:mi>
<mml:mfenced close=")" open="("><mml:mrow><mml:mi>h</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mfrac><mml:mrow><mml:mi>w</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:mi>d</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo mathsize="small">∏</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:mi>E</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mi>γ</mml:mi>
<mml:msub><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mi>b</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow><mml:msub><mml:mi>p</mml:mi>
<mml:mrow><mml:msub><mml:mi>u</mml:mi>
<mml:mi>b</mml:mi>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>b</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mi>.</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>In this equation, the entities <italic>u<sub>b</sub>
</italic>
(<italic>h</italic>
) and <italic>v<sub>b</sub>
</italic>
(<italic>h</italic>
) for an edge <italic>b </italic>
= (<italic>x, y</italic>
) in the phylogenetic network denote the numbers of gene copies that "enter" edge <italic>b </italic>
from below (the <inline-formula><mml:math id="M15" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 endpoint) and "exit" edge <italic>b </italic>
from above (the <italic>x </italic>
endpoint). These two entities define the number of coalescent events that occurred on edge <italic>b</italic>
, which equals <italic>r </italic>
= <italic>u<sub>b</sub>
</italic>
(<italic>h</italic>
) - <italic>v<sub>b</sub>
</italic>
(<italic>h</italic>
). The probability of <italic>r </italic>
coalescent events occurring, reducing <italic>u<sub>b</sub>
</italic>
(<italic>h</italic>
) lineages into <italic>v<sub>b</sub>
</italic>
(<italic>h</italic>
) lineages, on an edge whose length is <italic>λ<sub>b</sub>
</italic>
, is given by the quantity <inline-formula><mml:math id="M16" name="1471-2105-14-S15-S6-i4" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:msub><mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
[<xref ref-type="bibr" rid="B25">25</xref>
]. The quantities <italic>w<sub>b</sub>
</italic>
(<italic>h</italic>
)<italic>/d<sub>b</sub>
</italic>
(<italic>h</italic>
) is the proportion of all <italic>r </italic>
coalescent scenarios that are consistent with the gene tree (not every scenario of <italic>r </italic>
coalescent events will results in a topology that agrees with <italic>g</italic>
) [<xref ref-type="bibr" rid="B10">10</xref>
]. This quantity without the <italic>b </italic>
subscript corresponds to the root of <italic>N </italic>
(where there is no explicit edge incoming into it).</p>
<p>Coalescent histories can also be used to compute the minimum number of extra lineages required to reconcile gene tree <italic>g </italic>
with <italic>N</italic>
. Given a coalescent history <italic>h </italic>
: <italic>V </italic>
(<italic>g</italic>
) → <italic>E</italic>
(<italic>N </italic>
), the number of extra lineages arising from <italic>h </italic>
on an edge <italic>b </italic>
= (<italic>w, y</italic>
) is <italic>XL</italic>
(<italic>b, h</italic>
) = max{<italic>vb</italic>
(<italic>h</italic>
) - 1, 0}. The number of extra lineages arising from <italic>h </italic>
on the entire network <italic>N</italic>
, denoted by <italic>XL</italic>
(<italic>N, h</italic>
), is <inline-formula><mml:math id="M17" name="1471-2105-14-S15-S6-i5" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:mi>E</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>b</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
. For a gene tree <italic>g</italic>
, the minimum number of extra lineages arising from reconciling it within the edges of phylogenetic network <italic>N</italic>
, denoted by <italic>XL</italic>
(<italic>N, g</italic>
), is</p>
<p><disp-formula id="bmcM2"><label>(2)</label>
<mml:math id="M18" name="1471-2105-14-S15-S6-i6" overflow="scroll"><mml:mrow><mml:mi>X</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>N</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:munder class="msub"><mml:mrow><mml:mstyle class="text"><mml:mtext class="textsf" mathvariant="sans-serif">min</mml:mtext>
</mml:mstyle>
</mml:mrow>
<mml:mrow><mml:mi>h</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi>H</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>N</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>g</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mi>X</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>N</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mi>.</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Notice that for computing extra lineages, the <italic>λ </italic>
and <italic>γ </italic>
parameters are not used. This is a <italic>maximum parsimony </italic>
criterion for explaining the gene tree <italic>g </italic>
given the phylogenetic network <italic>N</italic>
. Different from Eq. (1), <italic>N </italic>
here represents the topology of the network only.</p>
<p>When the phylogenetic network <italic>N </italic>
is a tree (that is, <italic>V<sub>N </sub>
</italic>
= ∅ and <italic>γ</italic>
(<italic>b</italic>
) = 1, for all <italic>b </italic>
∈ <italic>E</italic>
(<italic>N </italic>
)), algorithms exist for computing <italic>P</italic>
(<italic>g|N</italic>
) [<xref ref-type="bibr" rid="B10">10</xref>
,<xref ref-type="bibr" rid="B11">11</xref>
] and <italic>XL</italic>
(<italic>N, g</italic>
) [<xref ref-type="bibr" rid="B1">1</xref>
,<xref ref-type="bibr" rid="B26">26</xref>
]. More recently, we proposed new methods for computing these two quantities when <italic>N </italic>
is a phylogenetic network (<italic>V<sub>N </sub>
≠ </italic>
∅), based on a technique that converts the network into a multi-labeled tree (MUL-tree) [<xref ref-type="bibr" rid="B21">21</xref>
,<xref ref-type="bibr" rid="B22">22</xref>
]. However, these techniques become prohibitive even for data sets with 15 to 20 taxa and even without hybridization being involved [<xref ref-type="bibr" rid="B11">11</xref>
]. The contribution of this paper, which we present in detail next, is a heuristic that significantly reduces time to approximate <italic>P </italic>
(<italic>g|N</italic>
), when N is a tree as well as two novel strategies to speed up the algorithms of [<xref ref-type="bibr" rid="B21">21</xref>
,<xref ref-type="bibr" rid="B22">22</xref>
] significantly.</p>
</sec>
<sec><title>Speeding up probability computation under ILS</title>
<p>From Eq. (1) we can see that the probability of observing gene tree <italic>g </italic>
given species tree <italic>N </italic>
equals to the sum of all coalescent histories. Number of coalescent histories increases rapidly with the increase of number of taxa, but not all of the histories contribute equally, or even significantly to the sum. We propose the heuristic approach in which the target probability is approximated by summing over a subset of coalescent histories which carry the most weight. In order to do so we first compute limiting coalescent history (<italic>LCH</italic>
), which is one distinct, easy to compute, coalescent history and will be used to bound the search space. If <italic>v </italic>
is an internal node of species tree <italic>N</italic>
, let <italic>b<sub>v </sub>
</italic>
represent the branch that connects node <italic>v </italic>
to its parent with length <inline-formula><mml:math id="M19" name="1471-2105-14-S15-S6-i7" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 Let <italic>numL<sub>v </sub>
</italic>
represent the number of lineages entering branch <italic>b<sub>v </sub>
</italic>
and <italic>minL<sub>v </sub>
</italic>
the minimal number of lineages exiting that same branch (if all coalescent events that are permitted by topologies of gene and species tree happen on this branch). Initially all lineages coalesce prior to the root and <italic>numL<sub>v </sub>
</italic>
for all internal nodes <italic>v </italic>
of the species tree is set accordingly. Coalescent history is represented by a vector <italic>C </italic>
= (<italic>c</italic>
<sub>1</sub>
, <italic>c</italic>
<sub>2</sub>
, ...c<italic><sub>n-2</sub>
</italic>
). Each element of <italic>C </italic>
corresponds to an internal node (clade) of the gene tree, and the appropriate value represents the node <italic>v </italic>
in species tree on whose branch <italic>b<sub>v </sub>
</italic>
the clade coalesces. <italic>LCH </italic>
is just one distinct coalescent history. We define a function called <bold>ComputeLCH </bold>
which takes a species tree <italic>N </italic>
and gene tree <italic>g</italic>
, and returns <italic>LCH </italic>
for these two trees.</p>
<p>Algorithm 1: ComputeLCH.</p>
<p><bold>Input</bold>
: Phylogenetic tree <italic>N </italic>
with branch lengths, gene tree <italic>g</italic>
</p>
<p><bold>Output</bold>
: <italic>LCH(N, g)</italic>
</p>
<p><bold>foreach </bold>
<italic>internal node v </italic>
∈ <italic>N, with parent w in post-order traversal </italic>
<bold>do</bold>
</p>
<p>  <bold>if </bold>
<italic>minL<sub>v </sub>
≠ </italic>
<italic>numL<sub>v </sub>
</italic>
<bold>then</bold>
</p>
<p>    Find <italic>x, minL<sub>v </sub>
</italic>
≤ <italic>x </italic>
≤ <italic>numL<sub>v</sub>
</italic>
, that maximizes <inline-formula><mml:math id="M20" name="1471-2105-14-S15-S6-i8" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>n</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>m</mml:mi>
<mml:msub><mml:mrow><mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>    Pick <italic>r </italic>
= <italic>numL<sub>v </sub>
- x </italic>
coalescent events on branch <italic>b<sub>v </sub>
</italic>
that agree with gene and species tree topologies (ties are broken randomly); Update <italic>LCH </italic>
accordingly; <italic>numL<sub>w </sub>
← numL<sub>w </sub>
- r</italic>
;</p>
<p><bold>return </bold>
<italic>LCH</italic>
;</p>
<p>Once we have <italic>LCH</italic>
, we will approximate the probability of <italic>g </italic>
by summing over all coalescent histories that are "under" <italic>LCH</italic>
. Coalescent history <italic>H </italic>
= (<italic>h</italic>
1, <italic>h</italic>
2, ...<italic>h<sub>n-2</sub>
</italic>
) is "under" <italic>LCH </italic>
= (<italic>l</italic>
<sub>1</sub>
, <italic>l</italic>
<sub>2</sub>
, ..., <italic>l<sub>n-2</sub>
</italic>
) if for each <italic>i </italic>
∈ {1, 2, ..., <italic>n - </italic>
2}, <italic>h<sub>i </sub>
</italic>
is ether equal to <italic>l<sub>i </sub>
</italic>
or is a descendant of <italic>l<sub>i</sub>
</italic>
. Similar approach in duplication/loss model was suggested in [<xref ref-type="bibr" rid="B27">27</xref>
].</p>
<p>We do not address the parsimony approach, since it is trivial when we work with trees. It consists of computing least common ancestor (<italic>LCA</italic>
) mapping of all pairs of nodes within a gene tree on a species tree.</p>
</sec>
<sec><title>Ancestral configurations on networks</title>
<p>Central to our methods is the concept of weighted <italic>ancestral configuration </italic>
(AC, or simply configuration). When its unweighted version was first introduced, it was defined on species trees for computing the probability of gene tree topologies [<xref ref-type="bibr" rid="B11">11</xref>
]. However, the concept is extended significantly here to apply to networks.</p>
<p>Given a species network <italic>N </italic>
with <italic>q </italic>
= <italic>|V<sub>N </sub>
| </italic>
reticulation nodes numbered 1, 2, ..., <italic>q </italic>
and a gene tree <italic>g </italic>
on set <inline-formula><mml:math id="M21" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 of alleles, an ancestral configuration can be associated with a node <italic>v </italic>
of <italic>N</italic>
, denoted by <italic>AC<sub>v</sub>
</italic>
, or an edge <italic>e </italic>
of <italic>N</italic>
, denoted by <italic>AC<sub>e</sub>
</italic>
, and is an element of the set <inline-formula><mml:math id="M22" name="1471-2105-14-S15-S6-i27" overflow="scroll"><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow><mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-bin">×</mml:mo>
<mml:msup><mml:mrow><mml:mi>ℤ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>q</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-bin">×</mml:mo>
<mml:mi>ℝ</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 where the first element is the set of all subsets of alleles in <inline-formula><mml:math id="M23" name="1471-2105-14-S15-S6-i25" overflow="scroll"><mml:mrow><mml:mtext> </mml:mtext>
<mml:mi mathvariant="script">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
, the second is the set of all vectors of integers of size <italic>q</italic>
, and the third element is the set of real numbers. When the context is clear, we omit the subscript. For an AC (<italic>B, a, w</italic>
), the interpretation is as follows:</p>
<p>• <italic>B </italic>
⊆ <italic>A</italic>
: a set of lineages that exist at the point (node or edge) with which the AC is associated.</p>
<p>• <italic>a</italic>
[<italic>i</italic>
], for 1 ≤ <italic>i </italic>
≤ <italic>q</italic>
: an index for the AC split that occurred at reticulation node <italic>i </italic>
and gave rise to <italic>B</italic>
.</p>
<p>• <italic>w</italic>
: a weight of the AC; we discuss below how to set/use this entry.</p>
<p>Given two ACs, <italic>AC</italic>
<sub>1 </sub>
= (<italic>B</italic>
<sub>1</sub>
, <italic>a</italic>
<sub>1</sub>
, <italic>w</italic>
<sub>1</sub>
) and <italic>AC</italic>
<sub>2 </sub>
= (<italic>B</italic>
<sub>2</sub>
, <italic>a</italic>
<sub>2</sub>
, <italic>w</italic>
<sub>2</sub>
), we say that <italic>AC</italic>
<sub>1 </sub>
and <italic>AC</italic>
<sub>2 </sub>
are <italic>compatible </italic>
if for each <italic>i</italic>
, 1 ≤ <italic>i </italic>
≤ <italic>q</italic>
, either <italic>a</italic>
<sub>1</sub>
[<italic>i</italic>
] = <italic>a</italic>
<sub>2</sub>
[<italic>i</italic>
] or <italic>a</italic>
<sub>1</sub>
[<italic>i</italic>
] <italic>· a</italic>
<sub>2</sub>
[<italic>i</italic>
] = 0; otherwise, the two ACs are <italic>incompatible</italic>
. Further, if <italic>B</italic>
<sub>1 </sub>
= <italic>B</italic>
<sub>2 </sub>
and <italic>a</italic>
<sub>1 </sub>
= <italic>a</italic>
<sub>2</sub>
, we say that the two ACs are <italic>identical</italic>
.</p>
<p>Ancestral configurations are computed in a bottom-up fashion by algorithms below. Two major operations that occur as the algorithms proceed bottom-up are:</p>
<p>• Splitting an AC whenever a reticulation node is encountered. Let (<italic>B, a, w</italic>
) be an AC on the edge incident out of reticulation node <italic>k</italic>
. Further, assume that for each reticulation node <italic>i </italic>
(1 ≤ <italic>i </italic>
≤ <italic>q</italic>
), we have a counter <italic>o<sub>i</sub>
</italic>
, that is initialized to 0 at the start of an algorithm. Splitting (<italic>B, a, w</italic>
) at node <italic>k </italic>
results in two ACs <italic>AC</italic>
<sub>1 </sub>
= (<italic>B</italic>
<sub>1</sub>
, <italic>a</italic>
<sub>1</sub>
, <italic>w</italic>
<sub>1</sub>
) and <italic>AC</italic>
<sub>2</sub>
(<italic>B</italic>
<sub>2</sub>
, <italic>a<sub>2</sub>
, w</italic>
<sub>2</sub>
), each associated with one of the two reticulation edges, such that <italic>B</italic>
<sub>1 </sub>
∪ <italic>B</italic>
<sub>2 </sub>
= <italic>B</italic>
, <italic>B</italic>
<sub>1 </sub>
∩ <italic>B</italic>
<sub>2 </sub>
= ∅, <italic>a</italic>
<sub>1</sub>
[<italic>k</italic>
] = <italic>a</italic>
<sub>2</sub>
[<italic>k</italic>
] = <italic>o<sub>k </sub>
</italic>
+ 1, and <italic>o<sub>k </sub>
</italic>
is incremented by 1. For the weights, <italic>w</italic>
<sub>1 </sub>
= <italic>w </italic>
and <italic>w</italic>
<sub>2 </sub>
= 0 if the algorithm used is <bold>CountXL </bold>
below, and <italic>w</italic>
<sub>1 </sub>
= <italic>w </italic>
and <italic>w</italic>
<sub>2 </sub>
= 1 if the algorithm used is <bold>CalProb </bold>
below.</p>
<p>• Merging two ACs whenever an internal tree node is encountered. Let (<italic>B</italic>
<sub>1</sub>
, <italic>a</italic>
<sub>1</sub>
, <italic>w</italic>
<sub>1</sub>
) and (<italic>B</italic>
<sub>2</sub>
, <italic>a</italic>
<sub>2</sub>
, <italic>w</italic>
<sub>2</sub>
) be two compatible ACs associated with the edges incident from a tree node <italic>u</italic>
. Then, these two ACs are merged into one AC (<italic>B, a, w</italic>
) at node <italic>u </italic>
where <italic>B </italic>
= <italic>B</italic>
<sub>1 </sub>
∪ <italic>B</italic>
<sub>2 </sub>
and <italic>a</italic>
[<italic>i</italic>
] = max{<italic>a</italic>
<sub>1</sub>
[<italic>i</italic>
], <italic>a</italic>
<sub>2</sub>
[<italic>i</italic>
]} for all 1 ≤ <italic>i </italic>
≤ <italic>q</italic>
. For the weights, <italic>w </italic>
= <italic>w</italic>
<sub>1 </sub>
+ <italic>w</italic>
<sub>2 </sub>
if the algorithm used is <bold>CountXL </bold>
below, and <italic>w </italic>
= <italic>w</italic>
<sub>1 </sub>
<italic>· w</italic>
<sub>2 </sub>
if the algorithm used is <bold>CalProb </bold>
below.</p>
<p>For <italic>AC </italic>
= (<italic>B, a, w</italic>
) we denote by <italic>n</italic>
(<italic>AC</italic>
) the quantity <italic>|B|</italic>
. We denote by <inline-formula><mml:math id="M24" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 the set of ACs associated with a node or edge. When <inline-formula><mml:math id="M25" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 is associated with an edge, it denotes the set of ACs that result after all coalescence events took place on the edge.</p>
<p>Assume <italic>m </italic>
and <italic>n </italic>
are two gene lineages that meet at some node in a gene tree <italic>g</italic>
. When reconciling <italic>g </italic>
within the edges of a species network <italic>N </italic>
, after the two entered the same edge of <italic>N </italic>
, they might or might not have coalesced before leaving that edge, the probability of which depends on the length (in terms of time) and width (in terms of population size) of that edge. Therefore, one configuration entering a edge of <italic>N </italic>
might give rise to several different configurations leaving that edge with different probabilities. We denote by <italic>Coal</italic>
(<italic>B, g</italic>
), for a set <italic>B </italic>
of lineages and gene tree <italic>g</italic>
, the set of all sets of lineages that <italic>B </italic>
could coalesce into with respect to the topology of <italic>g</italic>
. Ancestral configurations provide a compact representation of coalescent histories, thus allowing for efficient computing: while redundant parts that appear in different coalescent histories must be computed explicitly every time they are encountered, particularly over the different allele mappings employed in the approaches of [<xref ref-type="bibr" rid="B21">21</xref>
,<xref ref-type="bibr" rid="B22">22</xref>
], using ancestral configurations ameliorates this by computing the values only once for each ancestral configuration. Further, when these computations are coupled with network space search, local perturbations to candidate networks necessitate new computations to only a small number of ancestral configurations. We now show how to use configurations to compute <italic>P </italic>
(<italic>g|N</italic>
) and <italic>XL</italic>
(<italic>N, g</italic>
) efficiently.</p>
</sec>
<sec><title>Counting the number of extra lineages under ILS and hybridization</title>
<p>For a configuration <italic>AC</italic>
, we denote by <italic>xl</italic>
(<italic>AC</italic>
) the minimum number of extra lineages arising from coalescing the extant gene lineages in <italic>AC </italic>
to the present gene lineages in <italic>AC</italic>
. In this method, weight <italic>w </italic>
in (<italic>B, a, w</italic>
) ∈ <inline-formula><mml:math id="M26" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 corresponds to <italic>xl</italic>
(<italic>AC</italic>
), where <inline-formula><mml:math id="M27" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 is either <inline-formula><mml:math id="M28" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>v </italic>
is a node or <inline-formula><mml:math id="M29" name="1471-2105-14-S15-S6-i30" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>b </italic>
is a edge.</p>
<p><bold>Observation 1 </bold>
<italic>Let AC </italic>
= (<italic>B, a, w</italic>
) <italic>be a configuration entering a edge b and AC</italic>
<sup>+ </sup>
= (<italic>B</italic>
<sup>+</sup>
, <italic>a</italic>
<sup>+</sup>
, <italic>w</italic>
<sup>+</sup>
) <italic>be a configuration that AC coalesced into before leaving b. Then w</italic>
<sup>+ </sup>
= <italic>w</italic>
<sup>+ </sup>
max{<italic>n</italic>
(<italic>AC</italic>
<sup>+</sup>
) - 1, 0}, <italic>where n</italic>
(<italic>AC</italic>
<sup>+</sup>
) <italic>is the number of lineages on edge b</italic>
.</p>
<p>We define a function called <bold>CreateCACsForXL </bold>
which takes a gene tree <italic>g</italic>
, an edge <italic>b </italic>
= (<italic>u, v</italic>
) of the network <italic>N </italic>
and a set of ACs <inline-formula><mml:math id="M30" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 that enter edge <italic>b</italic>
, and returns a set of ACs <inline-formula><mml:math id="M31" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 that exit edge <italic>b</italic>
.</p>
<p>Note that although one configuration can coalesce into several different configurations along an edge, under parsimony we only need to keep the one that has the minimum total number of extra lineages. Therefore</p>
<p>Algorithm 2: CreateCACsForXL.</p>
<p><bold>Input</bold>
: Gene tree <italic>g</italic>
, edge <italic>b </italic>
= (<italic>u, v</italic>
), set of ACs <inline-formula><mml:math id="M32" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p><bold>Output</bold>
: A set of ACs <inline-formula><mml:math id="M33" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p><bold>foreach </bold>
(<italic>B, a, w</italic>
) ∈ <inline-formula><mml:math id="M34" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>  <inline-formula><mml:math id="M35" name="1471-2105-14-S15-S6-i9" overflow="scroll"><mml:mrow><mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:mstyle class="text"><mml:mtext class="textsf" mathvariant="sans-serif">argmi</mml:mtext>
</mml:mstyle>
<mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textsf" mathvariant="sans-serif">n</mml:mtext>
</mml:mstyle>
</mml:mrow>
<mml:mrow><mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mo class="MathClass-punc">;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p>  Compute <italic>w</italic>
<sup>+ </sup>
using Rule 1;</p>
<p>  <inline-formula><mml:math id="M36" name="1471-2105-14-S15-S6-i32" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-bin">∪</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mi>a</mml:mi>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:msup><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p><bold>return </bold>
<inline-formula><mml:math id="M37" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p><inline-formula><mml:math id="M38" name="1471-2105-14-S15-S6-i33" overflow="scroll"><mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
 and there is 1-1 correspondence between configurations in <inline-formula><mml:math id="M39" name="1471-2105-14-S15-S6-i34" overflow="scroll"><mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
 and configurations in <inline-formula><mml:math id="M40" name="1471-2105-14-S15-S6-i35" overflow="scroll"><mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
.</p>
<p>For a phylogenetic network <italic>N </italic>
and a gene tree <italic>g</italic>
, the algorithm for computing the minimum number of extra lineages required to reconcile <italic>g </italic>
within <italic>N </italic>
is shown in Alg. 3. Basically, we traverse the nodes of the network in post-order. For every node <italic>v </italic>
we visit, we construct the set of ACs <inline-formula><mml:math id="M41" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 for node <italic>v </italic>
based on its type. Recall that there are four types of nodes in a phylogenetic network, which are leaves, reticulation nodes, internal tree nodes, and the root. Finally when we arrive at the root of <italic>N</italic>
, we are able to obtain <italic>XL</italic>
(<italic>N, g</italic>
).</p>
<p>Algorithm 3: CountXL.</p>
<p><bold>Input</bold>
: Phylogenetic network <italic>N </italic>
with <italic>q </italic>
reticulation nodes, gene tree <italic>g</italic>
</p>
<p><bold>Output</bold>
: <italic>XL</italic>
(<italic>N, g</italic>
)</p>
<p><bold>while </bold>
<italic>traversing the nodes of N in post-order </italic>
<bold>do</bold>
</p>
<p><bold>if </bold>
<italic>node v is a leaf, who has parent u </italic>
<bold>then</bold>
</p>
<p>  <inline-formula><mml:math id="M42" name="1471-2105-14-S15-S6-i36" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">{</mml:mo>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mi>a</mml:mi>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo class="MathClass-close">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>B </italic>
is the set of leaves in <italic>g </italic>
that are sampled from the species associated with <italic>v </italic>
and <italic>a </italic>
is a</p>
<p>  vector of <italic>q </italic>
0's;</p>
<p>  <inline-formula><mml:math id="M43" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForXL(<italic>g</italic>
, (<italic>u, v</italic>
), <inline-formula><mml:math id="M44" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
);</p>
<p><bold>else if </bold>
<italic>node v is a reticulation node, who has child w, and two parents u</italic>
<sub>1 </sub>
<italic>and u</italic>
<sub>2 </sub>
<bold>then</bold>
</p>
<p>  <inline-formula><mml:math id="M45" name="1471-2105-14-S15-S6-i37" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>  <bold>foreach </bold>
<italic>AC </italic>
∈ <inline-formula><mml:math id="M46" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>    Split <italic>AC </italic>
in every possible way into pairs of ACs, and for each pair, add one AC to <inline-formula><mml:math id="M47" name="1471-2105-14-S15-S6-i10" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 and the other AC to <inline-formula><mml:math id="M48" name="1471-2105-14-S15-S6-i11" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p><bold>else if </bold>
<italic>node v is an internal tree node, who has two children w</italic>
<sub>1 </sub>
<italic>and w</italic>
<sub>2 </sub>
<bold>then</bold>
</p>
<p>  <bold>foreach </bold>
<italic>pair </italic>
(<italic>AC</italic>
<sub>1</sub>
, <italic>AC</italic>
<sub>2 </sub>
) <italic>of compatible ACs in </italic>
<inline-formula><mml:math id="M49" name="1471-2105-14-S15-S6-i38" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-bin">×</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>    Merge <italic>AC</italic>
<sub>1 </sub>
and <italic>AC</italic>
<sub>2 </sub>
and add the resulting AC to <inline-formula><mml:math id="M50" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>  <bold>if </bold>
<italic>node v is an internal tree node, who has a parent u </italic>
<bold>then</bold>
</p>
<p>    <inline-formula><mml:math id="M51" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForXL(<italic>g</italic>
, (<italic>u, v</italic>
), <inline-formula><mml:math id="M52" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
);</p>
<p>  <bold>else</bold>
</p>
<p>    <bold>return </bold>
<inline-formula><mml:math id="M53" name="1471-2105-14-S15-S6-i12" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mrow><mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mi>x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
</sec>
<sec><title>Calculating gene tree probability under ILS and hybridization</title>
<p>For a configuration <italic>AC</italic>
, we denote by <italic>p</italic>
(<italic>AC</italic>
) the cumulative probability of the extant gene lineages in <italic>AC </italic>
coalescing into the present gene lineages in <italic>AC </italic>
from time 0. In this method, weight <italic>w </italic>
in <italic>AC </italic>
= (<italic>B, a, w</italic>
) ∈ <inline-formula><mml:math id="M54" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 corresponds to <italic>p</italic>
(<italic>AC</italic>
), where <inline-formula><mml:math id="M55" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
 is either <inline-formula><mml:math id="M56" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>v </italic>
is a node or <inline-formula><mml:math id="M57" name="1471-2105-14-S15-S6-i30" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>b </italic>
is an edge.</p>
<p><bold>Lemma 1 </bold>
<italic>Let B be a set of gene lineages entering branch b of network N with branch length λ<sub>b</sub>
. Then the probability of observing a set of gene lineages B</italic>
<sup>+ </sup>
<italic>leaving branch b is</italic>
</p>
<p><disp-formula id="bmcM3"><label>(3)</label>
<mml:math id="M58" name="1471-2105-14-S15-S6-i13" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="tmspace" width="2.77695pt"></mml:mspace>
<mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="tmspace" width="2.77695pt"></mml:mspace>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p><italic>where </italic>
<inline-formula><mml:math id="M59" name="1471-2105-14-S15-S6-i39" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>is the probability that |B| gene lineages coalesce into |B</italic>
<sup>+</sup>
<italic>| gene lineages within time λ<sub>b</sub>
, wb</italic>
(<italic>B, B</italic>
<sup>+</sup>
) <italic>is the number of ways that coalescent events can occur along edge b to coalesce B into B</italic>
<sup>+ </sup>
<italic>with respect to the gene tree topology, and db</italic>
(<italic>B, B</italic>
<sup>+</sup>
) <italic>is the number of all possible orderings of |B|-|B</italic>
<sup>+</sup>
<italic>| coalescent events</italic>
.</p>
<p><bold>Observation 2 </bold>
<italic>Let AC </italic>
= (<italic>B, a, w</italic>
) <italic>be a configuration entering an edge b and AC</italic>
<sup>+ </sup>
= (<italic>B</italic>
<sup>+</sup>
, <italic>a</italic>
<sup>+</sup>
, <italic>w</italic>
<sup>+</sup>
) <italic>be a configuration that AC coalesced into when leaving b. Then w</italic>
<sup>+ </sup>
= <italic>w · p<sub>t</sub>
</italic>
(<italic>B, B</italic>
<sup>+</sup>
, <italic>b</italic>
).</p>
<p>We define a function called <bold>CreateCACsForProb </bold>
which takes a gene tree <italic>g</italic>
, an edge <italic>b </italic>
= (<italic>u, v</italic>
) of the network <italic>N </italic>
and a set of ACs <inline-formula><mml:math id="M60" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 that enter edge <italic>b</italic>
, and returns a set of all possible ACs <inline-formula><mml:math id="M61" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 that exit edge <italic>b</italic>
. The algorithm for calculating the probability of observing a gene tree <italic>g </italic>
given a species network <italic>N</italic>
</p>
<p>Algorithm 4: CreateCACsForProb.</p>
<p><bold>Input</bold>
: Gene tree <italic>g</italic>
, an edge <italic>b </italic>
= (<italic>u, v</italic>
), a set of ACs <inline-formula><mml:math id="M62" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p><bold>Output</bold>
: A set of ACs <inline-formula><mml:math id="M63" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p><bold>foreach </bold>
(<italic>B, a, w</italic>
) ∈ <inline-formula><mml:math id="M64" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>  <bold>foreach </bold>
<italic>B</italic>
<sup>+ </sup>
∈ <italic>Coal</italic>
(<italic>B, g</italic>
) <bold>do</bold>
</p>
<p>    Compute <italic>w</italic>
<sup>+ </sup>
using Rule 2;</p>
<p>    <bold>if </bold>
∃(<italic>B</italic>
', <italic>a</italic>
', <italic>w</italic>
') ∈ <inline-formula><mml:math id="M65" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>where B</italic>
' = <italic>B</italic>
<sup>+ </sup>
<italic>and a</italic>
' = <italic>a </italic>
<bold>then</bold>
</p>
<p>      <italic>w</italic>
' ← <italic>w</italic>
' + <italic>w</italic>
<sup>+</sup>
;</p>
<p>    <bold>else</bold>
</p>
<p>      <inline-formula><mml:math id="M66" name="1471-2105-14-S15-S6-i32" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-bin">∪</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msup><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mi>a</mml:mi>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:msup><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p><bold>return </bold>
<inline-formula><mml:math id="M67" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>is shown in Alg. 5. The basic idea is similar to the parsimony method we described in the previous section. It is important to note that the running time of the algorithms can be exponential for some data sets, as the complexity of both problems is open and conjectured to be NP-hard.</p>
</sec>
<sec><title>Reducing the number of configurations</title>
<p>At every reticulation node <italic>v </italic>
in the species network, every configuration <italic>AC </italic>
in <inline-formula><mml:math id="M68" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 is split in all 2<sup><italic>n</italic>
(<italic>AC</italic>
) </sup>
possible ways. This may result in multiple configurations which contain the same set of gene lineages but are all distinct because of different vector values in some <inline-formula><mml:math id="M69" name="1471-2105-14-S15-S6-i28" overflow="scroll"><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
. It is clear that the running time (and memory usage) of both these two algorithms depends on the number of configurations. Therefore, in order to reduce the number of configurations so as to speed up the computation, we make use of <italic>articulation</italic>
</p>
<p>Algorithm 5: CalProb.</p>
<p><bold>Input</bold>
: Phylogenetic network <italic>N </italic>
including topology, edge lengths and inheritance probabilities, gene tree <italic>g</italic>
</p>
<p><bold>Output</bold>
: <italic>P </italic>
(<italic>g|N</italic>
)</p>
<p><bold>while </bold>
<italic>traversing the nodes of N in post-order </italic>
<bold>do</bold>
</p>
<p>  <bold>if </bold>
<italic>node v is a leaf, whose parent is u </italic>
<bold>then</bold>
</p>
<p>    <inline-formula><mml:math id="M70" name="1471-2105-14-S15-S6-i40" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">{</mml:mo>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mi>a</mml:mi>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo class="MathClass-close">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
 where <italic>B </italic>
is the set of leaves in <italic>g </italic>
sampled from the species associated with <italic>v </italic>
and <italic>a </italic>
is a vector of <italic>q </italic>
0's;</p>
<p>    <inline-formula><mml:math id="M71" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForProb(<italic>g</italic>
, (<italic>u, v</italic>
), <inline-formula><mml:math id="M72" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
);</p>
<p>  <bold>else if </bold>
<italic>node v is a reticulation node, who has child w, and two parents u</italic>
<sub>1 </sub>
<italic>and u</italic>
<sub>2 </sub>
<bold>then</bold>
</p>
<p>    <inline-formula><mml:math id="M73" name="1471-2105-14-S15-S6-i37" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>    <italic>S</italic>
<sub>1 </sub>
← ∅;</p>
<p>    <italic>S</italic>
<sub>2 </sub>
← ∅;</p>
<p>    <bold>foreach </bold>
<italic>AC </italic>
∈ <inline-formula><mml:math id="M74" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>      Split <italic>AC </italic>
in every possible way into pairs of ACs, and for each pair, add one AC to <italic>S</italic>
<sub>1 </sub>
and the other AC to <italic>S</italic>
<sub>2</sub>
;</p>
<p>    <bold>foreach </bold>
(<italic>B, a, w</italic>
) ∈ <italic>S</italic>
<sub>1 </sub>
<bold>do</bold>
</p>
<p>      <inline-formula><mml:math id="M75" name="1471-2105-14-S15-S6-i14" overflow="scroll"><mml:mrow><mml:mi>w</mml:mi>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo class="MathClass-bin">⋅</mml:mo>
<mml:msubsup><mml:mrow><mml:mi>γ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo class="MathClass-punc">;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p>    <inline-formula><mml:math id="M76" name="1471-2105-14-S15-S6-i15" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForProb(<italic>g</italic>
, (<italic>u</italic>
<sub>1 </sub>
, <italic>v</italic>
), <italic>S</italic>
<sub>1</sub>
);</p>
<p>    <bold>foreach </bold>
(<italic>B, a, w</italic>
) ∈ <italic>S</italic>
<sub>2 </sub>
<bold>do</bold>
</p>
<p>      <inline-formula><mml:math id="M77" name="1471-2105-14-S15-S6-i16" overflow="scroll"><mml:mrow><mml:mi>w</mml:mi>
<mml:mo class="MathClass-rel">←</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo class="MathClass-bin">⋅</mml:mo>
<mml:msubsup><mml:mrow><mml:mi>γ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow><mml:mo class="MathClass-rel">|</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo class="MathClass-rel">|</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo class="MathClass-punc">;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p>    <inline-formula><mml:math id="M78" name="1471-2105-14-S15-S6-i17" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForProb(<italic>g</italic>
, (<italic>u</italic>
<sub>2 </sub>
, <italic>v</italic>
), <italic>S</italic>
<sub>2 </sub>
);</p>
<p>  <bold>else if </bold>
<italic>node v is an internal tree node, who has two children w</italic>
<sub>1 </sub>
<italic>and w</italic>
<sub>2 </sub>
<bold>then</bold>
</p>
<p>    <bold>foreach </bold>
<italic>pair </italic>
(<italic>AC</italic>
<sub>1</sub>
, <italic>AC</italic>
<sub>2</sub>
) <italic>of compatible ACs in </italic>
<inline-formula><mml:math id="M79" name="1471-2105-14-S15-S6-i18" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-bin">×</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>v</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msub><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
</p>
<p>      Merge <italic>AC</italic>
<sub>1 </sub>
and <italic>AC</italic>
<sub>2 </sub>
and add the resulting AC to <inline-formula><mml:math id="M80" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>    <bold>if </bold>
<italic>node v is an internal tree node, who has a parent u </italic>
<bold>then</bold>
</p>
<p>      <inline-formula><mml:math id="M81" name="1471-2105-14-S15-S6-i31" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>u</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 ← CreateCACsForProb(<italic>g</italic>
, (<italic>u, v</italic>
), <inline-formula><mml:math id="M82" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
);</p>
<p>    <bold>else</bold>
</p>
<p>      Let <italic>B<sub>R </sub>
</italic>
be the root lineage of the gene tree <italic>g</italic>
;</p>
<p>      <bold>return </bold>
<inline-formula><mml:math id="M83" name="1471-2105-14-S15-S6-i19" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
<mml:mo class="MathClass-bin">⋅</mml:mo>
<mml:msub><mml:mrow><mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace class="thinspace" width="0.3em"></mml:mspace>
<mml:msub><mml:mrow><mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-bin">+</mml:mo>
<mml:mi>∞</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
;</p>
<p>nodes in the graph (an articulation node is a node whose removal disconnects the phylogenetic network). Obviously, the reticulation nodes inside the sub-network rooted at an articulation node are independent of the reticulation nodes outside the sub-network. So at articulation node <italic>v </italic>
we can reset the vectors in all ACs in <inline-formula><mml:math id="M84" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 to 0's so that all configurations at <italic>v </italic>
containing the same set of gene lineages become identical. More precisely, when traversing the species network, after constructing <inline-formula><mml:math id="M85" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 for some internal tree node <italic>v </italic>
as we have described in Alg. 3 and Alg. 5, if <italic>v </italic>
is an articulation node, we reset the vector to 0's in every AC in <inline-formula><mml:math id="M86" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
. Then for counting the minimum number of extra lineages, we update <inline-formula><mml:math id="M87" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 to be <inline-formula><mml:math id="M88" name="1471-2105-14-S15-S6-i20" overflow="scroll"><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
 such that only the configuration containing the minimum weight is left, using the statement: <inline-formula><mml:math id="M89" name="1471-2105-14-S15-S6-i21" overflow="scroll"><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">{</mml:mo>
<mml:mrow><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext class="textsf" mathvariant="sans-serif">argmin</mml:mtext>
</mml:mstyle>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
. And for computing the probability of the topology of a gene tree, we keep only one copy of every distinct configuration. More precisely, we update <inline-formula><mml:math id="M90" name="1471-2105-14-S15-S6-i29" overflow="scroll"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
 to be <inline-formula><mml:math id="M91" name="1471-2105-14-S15-S6-i22" overflow="scroll"><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
 using <inline-formula><mml:math id="M92" name="1471-2105-14-S15-S6-i23" overflow="scroll"><mml:mrow><mml:msubsup><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mrow><mml:mo class="MathClass-open">{</mml:mo>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msup><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">:</mml:mo>
<mml:msup><mml:mrow><mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>′</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msub><mml:mrow><mml:mo mathsize="big"> ∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow><mml:mi>B</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-rel">∈</mml:mo>
<mml:msub><mml:mrow><mml:mi mathvariant="script">A</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
. Note that <italic>a </italic>
is a zero vector.</p>
</sec>
</sec>
<sec><title>Results and discussion</title>
<sec><title>Species tree inference under ILS</title>
<p>It is important to note that although we used coalescent histories to describe the heuristic, in practice we estimate the probability of the gene tree given a species tree using the concept of ancestral configurations.</p>
<p>To test the speed of proposed heuristic we generated 4 datasets, each consisting of 50 random species trees with same number of taxa and same height using PhyloGen [<xref ref-type="bibr" rid="B28">28</xref>
]. We used trees with 10, 20, 30 and 300 taxa of heights 5, 10, 15 and 20 respectively. From each species tree we simulated 25 gene trees, and estimated the probability of the gene trees given that species tree. On average, to compute (or estimate) the probability of a single gene tree, for the exact method it took 0.0019, 0.297 and 10.05 and for heuristic 0.0019, 0.0068 and 0.01 seconds for the trees with 10, 20 and 30 taxa, respectively. For the exact method we see that as we increase the number of taxa the time starts increasing dramatically. For the dataset consisting of trees with 300 taxa using heuristic took on average 0.2594 seconds to compute the same probability.</p>
<p>To further test the proposed heuristic we modified and ran the maximum likelihood species tree inference program STELLS [<xref ref-type="bibr" rid="B11">11</xref>
] on synthetic data generated by PhyloGen. Modification of STELLS consisted of changing the function that computes the probability of the gene tree given a species tree to use the proposed heuristic. Tests were performed on 3 different datasets each consisting of 10 randomly generated species trees. For the first dataset all trees contained 15 taxa and were of height 7 coalescent units. On this data set we ran both unmodified STELLS (which uses all coalescent histories to compute the probability of the gene tree given a species tree) and modified STELLS which employs the proposed heuristic. For the second dataset all trees contained 30 taxa and were of height 15 coalescent units. For the third dataset all trees contained 60 taxa and were of height 30 coalescent units. From each of the species trees we have generated 120 sets consisting of 30 sets with 25 gene trees, 30 sets with 50 gene trees, 30 sets with 100 gene trees and 30 sets with 200 gene trees. We have imposed the 72 hour time constrain on execution time, that is, killed the jobs that did not complete in 72 hours. We compared the inferred species tree with the true species tree (the tree from which we simulated the gene trees) in terms of averaged normalized Robinson-Foulds (RF) distance [<xref ref-type="bibr" rid="B29">29</xref>
], Figure <xref ref-type="fig" rid="F1">1(a)</xref>
, as well as the average inference runtime, Figure <xref ref-type="fig" rid="F1">1</xref>
 (b-d).</p>
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p><bold>Normalized RF distance between inferred and true species trees (a), and average running times (b), (c) and (d) for heuristic with 15, 30 and 60 taxa and exact method for 15 taxa</bold>
.</p>
</caption>
<graphic xlink:href="1471-2105-14-S15-S6-1"></graphic>
</fig>
<p>For the inference using heuristic on datasets with 15 and 30 taxa all of the runs finished within the given time frame of 72 hours. On the dataset with 60 taxa 5 out of 300 runs with 25 gene trees, 6 out of 300 runs with 100 gene trees and 31 out of 300 runs with 200 gene trees did not finish in the given time frame. For the exact method on 15 taxa data set, 1 out of 300 runs with 100 gene trees and 16 out of 300 runs with 200 gene trees did not finish in the given time frame.</p>
<p>From Figure <xref ref-type="fig" rid="F1">1(a)</xref>
 we can see that if we have more gene trees the inferred species tree will be more accurate, which is a good property. When the heuristic is compared with exact method on 15 taxa dataset, the accuracy of the heuristic decreases slightly, but the running time improves significantly. Average speedup of the heuristic over the exact method on this dataset was 112.</p>
<p>One important thing to keep in mind while looking at the runtime results is that we are working with ML framework. In order to infer species tree we need to examine large number of potential species trees, for each of them we need to estimate branch lengths and finally compute the likelihood function. This means that for any potential species tree we will have to recompute the probabilities of all given gene trees multiple times. So when we increase the number of taxa, we increase the number of potential species tree we have to evaluate, and we increase the number of branches within the tree. This is the reason why the computation on datasets with large number of taxa will still take long time, but not due to the slow computation of probability.</p>
</sec>
<sec><title>Gene tree parsimonious reconciliation and probability under ILS and hybridization</title>
<p>To study the performance of the two methods compared to the MUL-tree based methods, we ran all four on synthetic data generated as follows. We first generated 100 random 24-taxon species trees using PhyloGen [<xref ref-type="bibr" rid="B28">28</xref>
], and from these we generated random species networks with 1, 2, 4, 6 and 8 reticulation nodes. When expanding a species network with <italic>n </italic>
reticulation nodes to a species network with <italic>n </italic>
+ 1 reticulation nodes, we randomly selected two existing edges in the species network and connected their midpoints from the higher one to the lower one and then the lower one becomes a new reticulation node. Then, we simulated 10, 20, 50, 100, 200, 500 and 1000 gene trees respectively within the edges of each species network using the ms program [<xref ref-type="bibr" rid="B30">30</xref>
]. Since the MUL-tree methods are computationally very intensive, we employed the following strategy: for the parsimony methods, we bounded the time at 24 hours (that is, killed jobs that did not complete within 24 hours). For the probabilistic methods, we bounded the time at 8 hours. The reason for this choice is that we found that in most cases if the probabilistic method did not finish within 8 hours, then it often did not finish within 24 hours (which is <italic>not </italic>
the case of the parsimony methods). Therefore, to save running time that would be "wasted" without adding to the results, we decided on the 8-hour bound for the probabilistic methods.</p>
<p>For computing the minimum number of extra lineages, the results of the running time of both methods are shown in Figure <xref ref-type="fig" rid="F2">2</xref>
. Overall, both methods spent more time on data sets where the species networks contain more reticulation nodes. It is not surprising given the fact that adding more reticulation nodes increases the complexity of the networks in general. We can see that the speedup of the AC-based method over the MUL-tree based method also increased when the number of reticulation nodes in the species networks increased. In the best cases, the method achieves an improvement of about 5 orders of magnitude. In this figure, we only plot the results of the computations that could finish in 24 hours across all different number of loci sampled. In fact, the AC based method finished every computation in less than 3 minutes, even for the largest data set which contained species networks with 8 reticulations and 1000 gene trees. For the MUL-tree based method, out of 100 repetitions the numbers of repetitions that were able to finish in 24 hours across all different loci are 100, 100, 99, 96 and 88 for data sets containing species networks with 1, 2, 4, 6 and 8 reticulation nodes.</p>
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p><bold>The running times (ln of number seconds) of the MUL-tree based (<italic>t</italic>
(<italic>MUL</italic>
)), and AC-based (<italic>t</italic>
(<italic>AC</italic>
)) methods for computing parsimonious reconciliations, as well as the speedup <italic>log</italic>
<sub>10</sub>
(<italic>t</italic>
(<italic>MUL</italic>
)<italic>/t</italic>
(<italic>AC</italic>
))</bold>
.</p>
</caption>
<graphic xlink:href="1471-2105-14-S15-S6-2"></graphic>
</fig>
<p>For computing the probability of the gene tree topologies given a species network, we were not able to run the MUL-tree based method because we found it could not finish the computation in 24 hours even for the smallest data set (one gene tree and a species network with one reticulation node). In contrast, the AC-based method only needed 0.4 seconds on the same data set which implies a speedup of at least 5 orders of magnitude. Part of the results of the AC based algorithm are shown in Figure <xref ref-type="fig" rid="F3">3</xref>
.</p>
<fig id="F3" position="float"><label>Figure 3</label>
<caption><p><bold>The running time (ln of number of seconds) of the AC-based algorithm for computing the probability of gene tree topologies given a species network</bold>
. The columns from left to right correspond to data sets containing species networks with 1, 4 and 8 reticulation nodes, respectively.</p>
</caption>
<graphic xlink:href="1471-2105-14-S15-S6-3"></graphic>
</fig>
<p>For both parsimony and probabilistic methods, it is not surprising to see that for a fixed number of taxa the running time increases significantly when the number of reticulation nodes in the species networks increased. However, even for the same number of reticulation nodes, we can see that the running time still differs significantly from case to case. We find that for the data sets of the same size (e.g., number of taxa and reticulation nodes) there are several factors that can affect the number of configurations generated during the computation which directly dominates the running time of the algorithm. More specifically, the running time of the algorithms increases when there are more leaves under reticulation nodes and when the reticulation nodes are more dependent on each other. With respect to the topology of the gene tree and the species network, the more coalescent events that are allowed under reticulation nodes the faster the parsimony method is, and the opposite for the probabilistic method. For most cases, the AC-based methods are significantly much faster than the MUL-tree based ones. For parsimony, the gain in terms of efficiency comes from avoiding allele mappings that are guaranteed to result in suboptimal reconciliations or correspond to configurations being removed at articulation nodes. For probabilistic reconciliation, the gain comes from two factors: (1) avoiding redundant computations by reducing the number of configurations at articulation nodes which could not be avoided in MUL-tree based method, and (2) using ACs to compute the probability instead of enumerating the coalescent histories.</p>
</sec>
</sec>
<sec><title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
</body>
<back><sec><title>Acknowledgements</title>
<p>The authors would like to thank R. Matthew Barnett for providing the code for generating random species networks.</p>
</sec>
<sec><title>Declarations</title>
<p>Publication of this article was funded in part by NSF grants DBI-1062463 and CCF-1302179, grant R01LM009494 from the National Library of Medicine, an Alfred P. Sloan Research Fellowship, and a Guggenheim Fellowship to L.N. This work was also supported in part by the Data Analysis and Visualization Cyberinfrastructure funded by NSF under grant OCI-0959097. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NSF, National Library of Medicine, the National Institutes of Health, the Alfred P. Sloan Foundation, or the John Simon Guggenheim Memorial Foundation.</p>
<p>This article has been published as part of <italic>BMC Bioinformatics </italic>
Volume 14 Supplement 15, 2013: Proceedings from the Eleventh Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at <ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S15">http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S15</ext-link>
.</p>
</sec>
<ref-list><ref id="B1"><mixed-citation publication-type="journal"><name><surname>Maddison</surname>
<given-names>WP</given-names>
</name>
<article-title>Gene trees in species trees</article-title>
<source>Syst Biol</source>
<year>1997</year>
<volume>46</volume>
<fpage>523</fpage>
<lpage>536</lpage>
<pub-id pub-id-type="doi">10.1093/sysbio/46.3.523</pub-id>
</mixed-citation>
</ref>
<ref id="B2"><mixed-citation publication-type="journal"><name><surname>Syring</surname>
<given-names>J</given-names>
</name>
<name><surname>Willyard</surname>
<given-names>A</given-names>
</name>
<name><surname>Cronn</surname>
<given-names>R</given-names>
</name>
<name><surname>Liston</surname>
<given-names>A</given-names>
</name>
<article-title>Evolutionary relationships among <italic>Pinus </italic>
(Pinaceae) subsections inferred from multiple low-copy nuclear loci</article-title>
<source>American Journal of Botany</source>
<year>2005</year>
<volume>92</volume>
<fpage>2086</fpage>
<lpage>2100</lpage>
<pub-id pub-id-type="doi">10.3732/ajb.92.12.2086</pub-id>
<pub-id pub-id-type="pmid">21646125</pub-id>
</mixed-citation>
</ref>
<ref id="B3"><mixed-citation publication-type="journal"><name><surname>Pollard</surname>
<given-names>DA</given-names>
</name>
<name><surname>Iyer</surname>
<given-names>VN</given-names>
</name>
<name><surname>Moses</surname>
<given-names>AM</given-names>
</name>
<name><surname>Eisen</surname>
<given-names>MB</given-names>
</name>
<article-title>Widespread discordance of gene trees with species tree in <italic>Drosophila</italic>
: evidence for incomplete lineage sorting</article-title>
<source>PLoS Genet</source>
<year>2006</year>
<volume>2</volume>
<fpage>1634</fpage>
<lpage>1647</lpage>
</mixed-citation>
</ref>
<ref id="B4"><mixed-citation publication-type="journal"><name><surname>Than</surname>
<given-names>C</given-names>
</name>
<name><surname>Sugino</surname>
<given-names>R</given-names>
</name>
<name><surname>Innan</surname>
<given-names>H</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>Efficient Inference of Bacterial Strain Trees From Genomescale Multi-locus Data</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>i123</fpage>
<lpage>i131</lpage>
<comment>[Proceedings of the 16th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB '08)]</comment>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn149</pub-id>
<pub-id pub-id-type="pmid">18586704</pub-id>
</mixed-citation>
</ref>
<ref id="B5"><mixed-citation publication-type="journal"><name><surname>Kuo</surname>
<given-names>CH</given-names>
</name>
<name><surname>Wares</surname>
<given-names>JP</given-names>
</name>
<name><surname>Kissinger</surname>
<given-names>JC</given-names>
</name>
<article-title>The Apicomplexan whole-genome phylogeny: An analysis of incongurence among gene trees</article-title>
<source>Mol Biol Evol</source>
<year>2008</year>
<volume>25</volume>
<issue>12</issue>
<fpage>2689</fpage>
<lpage>2698</lpage>
<pub-id pub-id-type="doi">10.1093/molbev/msn213</pub-id>
<pub-id pub-id-type="pmid">18820254</pub-id>
</mixed-citation>
</ref>
<ref id="B6"><mixed-citation publication-type="journal"><name><surname>Cranston</surname>
<given-names>KA</given-names>
</name>
<name><surname>Hurwitz</surname>
<given-names>B</given-names>
</name>
<name><surname>Ware</surname>
<given-names>D</given-names>
</name>
<name><surname>Stein</surname>
<given-names>L</given-names>
</name>
<name><surname>Wing</surname>
<given-names>RA</given-names>
</name>
<article-title>Species trees from highly incongruent gene trees in rice</article-title>
<source>Syst Biol</source>
<year>2009</year>
<volume>58</volume>
<fpage>489</fpage>
<lpage>500</lpage>
<pub-id pub-id-type="doi">10.1093/sysbio/syp054</pub-id>
<pub-id pub-id-type="pmid">20525603</pub-id>
</mixed-citation>
</ref>
<ref id="B7"><mixed-citation publication-type="journal"><name><surname>White</surname>
<given-names>M</given-names>
</name>
<name><surname>Ane</surname>
<given-names>C</given-names>
</name>
<name><surname>Dewey</surname>
<given-names>C</given-names>
</name>
<name><surname>Larget</surname>
<given-names>B</given-names>
</name>
<collab>BAPayseur</collab>
<article-title>Fine-scale phylogenetic discordance across the house mouse genome</article-title>
<source>PLoS Genetics</source>
<year>2009</year>
<volume>5</volume>
<fpage>e1000729</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pgen.1000729</pub-id>
<pub-id pub-id-type="pmid">19936022</pub-id>
</mixed-citation>
</ref>
<ref id="B8"><mixed-citation publication-type="journal"><name><surname>Hobolth</surname>
<given-names>A</given-names>
</name>
<name><surname>Dutheil</surname>
<given-names>J</given-names>
</name>
<name><surname>Hawks</surname>
<given-names>J</given-names>
</name>
<name><surname>Schierup</surname>
<given-names>M</given-names>
</name>
<name><surname>Mailund</surname>
<given-names>T</given-names>
</name>
<article-title>Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection</article-title>
<source>Genome Research</source>
<year>2011</year>
<volume>21</volume>
<issue>3</issue>
<fpage>349</fpage>
<lpage>356</lpage>
<pub-id pub-id-type="doi">10.1101/gr.114751.110</pub-id>
<pub-id pub-id-type="pmid">21270173</pub-id>
</mixed-citation>
</ref>
<ref id="B9"><mixed-citation publication-type="journal"><name><surname>Takuno</surname>
<given-names>S</given-names>
</name>
<name><surname>Kado</surname>
<given-names>T</given-names>
</name>
<name><surname>Sugino</surname>
<given-names>RP</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<name><surname>Innan</surname>
<given-names>H</given-names>
</name>
<article-title>Population Genomics in Bacteria: A Case Study of Staphylococcus aureus</article-title>
<source>Molecular Biology and Evolution</source>
<year>2012</year>
<volume>29</volume>
<issue>2</issue>
<fpage>797</fpage>
<lpage>809</lpage>
<pub-id pub-id-type="doi">10.1093/molbev/msr249</pub-id>
<pub-id pub-id-type="pmid">22009061</pub-id>
</mixed-citation>
</ref>
<ref id="B10"><mixed-citation publication-type="journal"><name><surname>Degnan</surname>
<given-names>J</given-names>
</name>
<name><surname>Salter</surname>
<given-names>L</given-names>
</name>
<article-title>Gene tree distributions under the coalescent process</article-title>
<source>Evolution</source>
<year>2005</year>
<volume>59</volume>
<fpage>24</fpage>
<lpage>37</lpage>
<pub-id pub-id-type="pmid">15792224</pub-id>
</mixed-citation>
</ref>
<ref id="B11"><mixed-citation publication-type="journal"><name><surname>Wu</surname>
<given-names>Y</given-names>
</name>
<article-title>Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood</article-title>
<source>Evolution</source>
<year>2012</year>
<volume>66</volume>
<fpage>763</fpage>
<lpage>775</lpage>
<pub-id pub-id-type="doi">10.1111/j.1558-5646.2011.01476.x</pub-id>
<pub-id pub-id-type="pmid">22380439</pub-id>
</mixed-citation>
</ref>
<ref id="B12"><mixed-citation publication-type="journal"><name><surname>Staubach</surname>
<given-names>F</given-names>
</name>
<name><surname>Lorenc</surname>
<given-names>A</given-names>
</name>
<name><surname>Messer</surname>
<given-names>P</given-names>
</name>
<name><surname>Tang</surname>
<given-names>K</given-names>
</name>
<name><surname>Petrov</surname>
<given-names>D</given-names>
</name>
<name><surname>Tautz</surname>
<given-names>D</given-names>
</name>
<article-title>Genome patterns of selection and introgression of haplotypes in natural populations of the house mouse (Mus musculus)</article-title>
<source>PLoS Genetics</source>
<year>2012</year>
<volume>8</volume>
<issue>8</issue>
<fpage>e1002891</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pgen.1002891</pub-id>
<pub-id pub-id-type="pmid">22956910</pub-id>
</mixed-citation>
</ref>
<ref id="B13"><mixed-citation publication-type="journal"><collab>Consortium THG</collab>
<article-title>Butterfly genome reveals promiscuous exchange of mimicry adaptations among species</article-title>
<source>Nature</source>
<year>2012</year>
<volume>487</volume>
<issue>7405</issue>
<fpage>94</fpage>
<lpage>98</lpage>
<pub-id pub-id-type="pmid">22722851</pub-id>
</mixed-citation>
</ref>
<ref id="B14"><mixed-citation publication-type="journal"><name><surname>Moody</surname>
<given-names>M</given-names>
</name>
<name><surname>Rieseberg</surname>
<given-names>L</given-names>
</name>
<article-title>Sorting Through The Chaff, nDNA Gene Trees For Phylogenetic Inference And Hybrid Identification Of Annual Sunflowers (Helianthus sect Helianthus)</article-title>
<source>Molecular Phylogenetics And Evolution</source>
<year>2012</year>
<volume>64</volume>
<fpage>145</fpage>
<lpage>155</lpage>
<pub-id pub-id-type="doi">10.1016/j.ympev.2012.03.012</pub-id>
<pub-id pub-id-type="pmid">22724134</pub-id>
</mixed-citation>
</ref>
<ref id="B15"><mixed-citation publication-type="journal"><name><surname>Than</surname>
<given-names>C</given-names>
</name>
<name><surname>Ruths</surname>
<given-names>D</given-names>
</name>
<name><surname>Innan</surname>
<given-names>H</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions</article-title>
<source>J Comput Biol</source>
<year>2007</year>
<volume>14</volume>
<fpage>517</fpage>
<lpage>535</lpage>
<pub-id pub-id-type="doi">10.1089/cmb.2007.A010</pub-id>
<pub-id pub-id-type="pmid">17572027</pub-id>
</mixed-citation>
</ref>
<ref id="B16"><mixed-citation publication-type="journal"><name><surname>Meng</surname>
<given-names>C</given-names>
</name>
<name><surname>Kubatko</surname>
<given-names>LS</given-names>
</name>
<article-title>Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: A model</article-title>
<source>Theor Popul Biol</source>
<year>2009</year>
<volume>75</volume>
<fpage>35</fpage>
<lpage>45</lpage>
<pub-id pub-id-type="doi">10.1016/j.tpb.2008.10.004</pub-id>
<pub-id pub-id-type="pmid">19038278</pub-id>
</mixed-citation>
</ref>
<ref id="B17"><mixed-citation publication-type="journal"><name><surname>Kubatko</surname>
<given-names>LS</given-names>
</name>
<article-title>Identifying hybridization events in the presence of coalescence via model selection</article-title>
<source>Syst Biol</source>
<year>2009</year>
<volume>58</volume>
<issue>5</issue>
<fpage>478</fpage>
<lpage>488</lpage>
<pub-id pub-id-type="doi">10.1093/sysbio/syp055</pub-id>
<pub-id pub-id-type="pmid">20525602</pub-id>
</mixed-citation>
</ref>
<ref id="B18"><mixed-citation publication-type="journal"><name><surname>Joly</surname>
<given-names>S</given-names>
</name>
<name><surname>McLenachan</surname>
<given-names>PA</given-names>
</name>
<name><surname>Lockhart</surname>
<given-names>PJ</given-names>
</name>
<article-title>A statistical approach for distinguishing hybridization and incomplete lineage sorting</article-title>
<source>Am Nat</source>
<year>2009</year>
<volume>174</volume>
<issue>2</issue>
<fpage>E54</fpage>
<lpage>E70</lpage>
<pub-id pub-id-type="doi">10.1086/600082</pub-id>
<pub-id pub-id-type="pmid">19519219</pub-id>
</mixed-citation>
</ref>
<ref id="B19"><mixed-citation publication-type="journal"><name><surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Than</surname>
<given-names>C</given-names>
</name>
<name><surname>Degnan</surname>
<given-names>J</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting</article-title>
<source>Systematic Biology</source>
<year>2011</year>
<volume>60</volume>
<fpage>138</fpage>
<lpage>149</lpage>
<pub-id pub-id-type="doi">10.1093/sysbio/syq084</pub-id>
<pub-id pub-id-type="pmid">21248369</pub-id>
</mixed-citation>
</ref>
<ref id="B20"><mixed-citation publication-type="other"><name><surname>Jones</surname>
<given-names>G</given-names>
</name>
<name><surname>Sagitov</surname>
<given-names>S</given-names>
</name>
<name><surname>Oxelman</surname>
<given-names>B</given-names>
</name>
<article-title>Statistical inference of allopolyploid species networks in the presence of incomplete lineage sorting</article-title>
<source>arXiv</source>
<year>2012</year>
<fpage>1208</fpage>
<comment>3606</comment>
</mixed-citation>
</ref>
<ref id="B21"><mixed-citation publication-type="other"><name><surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Barnett</surname>
<given-names>R</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>Parsimonious inference of hybridization in the presence of incomplete lineage sorting</article-title>
<source>Systematic Biology</source>
<year>2013</year>
<comment>[To appear]</comment>
</mixed-citation>
</ref>
<ref id="B22"><mixed-citation publication-type="journal"><name><surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Degnan</surname>
<given-names>J</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection</article-title>
<source>PLoS Genetics</source>
<year>2012</year>
<volume>8</volume>
<fpage>e1002660</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pgen.1002660</pub-id>
<pub-id pub-id-type="pmid">22536161</pub-id>
</mixed-citation>
</ref>
<ref id="B23"><mixed-citation publication-type="journal"><name><surname>Than</surname>
<given-names>C</given-names>
</name>
<name><surname>Ruths</surname>
<given-names>D</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>322</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-322</pub-id>
<pub-id pub-id-type="pmid">18662388</pub-id>
</mixed-citation>
</ref>
<ref id="B24"><mixed-citation publication-type="book"><name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<person-group person-group-type="editor">Heath L, Ramakrishnan N</person-group>
<article-title>Evolutionary phylogenetic networks: models and issues</article-title>
<source>The Problem Solving Handbook for Computational Biology and Bioinformatics</source>
<year>2010</year>
<publisher-name>New York: Springer</publisher-name>
<fpage>125</fpage>
<lpage>158</lpage>
</mixed-citation>
</ref>
<ref id="B25"><mixed-citation publication-type="journal"><name><surname>Tavaré</surname>
<given-names>S</given-names>
</name>
<article-title>Line-of-descent and genealogical processes, and their applications in population genetics models</article-title>
<source>Theor Pop Biol</source>
<year>1984</year>
<volume>26</volume>
<fpage>119</fpage>
<lpage>164</lpage>
<pub-id pub-id-type="doi">10.1016/0040-5809(84)90027-3</pub-id>
<pub-id pub-id-type="pmid">6505980</pub-id>
</mixed-citation>
</ref>
<ref id="B26"><mixed-citation publication-type="journal"><name><surname>Than</surname>
<given-names>C</given-names>
</name>
<name><surname>Nakhleh</surname>
<given-names>L</given-names>
</name>
<article-title>Species tree inference by minimizing deep coalescences</article-title>
<source>PLoS Computational Biology</source>
<year>2009</year>
<volume>5</volume>
<issue>9</issue>
<fpage>e1000501</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000501</pub-id>
<pub-id pub-id-type="pmid">19749978</pub-id>
</mixed-citation>
</ref>
<ref id="B27"><mixed-citation publication-type="journal"><name><surname>Doyon</surname>
<given-names>JP</given-names>
</name>
<name><surname>Hamel</surname>
<given-names>S</given-names>
</name>
<name><surname>Chauve</surname>
<given-names>C</given-names>
</name>
<article-title>An Efficient Method for Exploring the Space of Gene Tree/Species Tree Reconciliations in a Probabilistic Framework</article-title>
<source>Computational Biology and Bioinformatics, IEEE/ACM Transactions on</source>
<year>2012</year>
<volume>9</volume>
<fpage>26</fpage>
<lpage>39</lpage>
</mixed-citation>
</ref>
<ref id="B28"><mixed-citation publication-type="other"><name><surname>Rambaut</surname>
<given-names>A</given-names>
</name>
<article-title>Phylogen v1.1</article-title>
<year>2012</year>
<ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/phylogen/">http://tree.bio.ed.ac.uk/software/phylogen/</ext-link>
</mixed-citation>
</ref>
<ref id="B29"><mixed-citation publication-type="journal"><name><surname>Robinson</surname>
<given-names>D</given-names>
</name>
<name><surname>Foulds</surname>
<given-names>L</given-names>
</name>
<article-title>Comparison of phylogenetic trees</article-title>
<source>Math Biosci</source>
<year>1981</year>
<volume>53</volume>
<fpage>131</fpage>
<lpage>147</lpage>
<pub-id pub-id-type="doi">10.1016/0025-5564(81)90043-2</pub-id>
</mixed-citation>
</ref>
<ref id="B30"><mixed-citation publication-type="journal"><name><surname>Hudson</surname>
<given-names>RR</given-names>
</name>
<article-title>Generating samples under a Wright-Fisher neutral model of genetic variation</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<fpage>337</fpage>
<lpage>338</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/18.2.337</pub-id>
<pub-id pub-id-type="pmid">11847089</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000226 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000226 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3852049
   |texte=   Fast algorithms and heuristics for phylogenomics under ILS and hybridization
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24564257" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024

	Serveur d'exploration Cyberinfrastructure
	Attention, ce site est en cours de développement ! Attention, site généré par des moyens informatiques à partir de corpus bruts. Les informations ne sont donc pas validées.

Serveur d'exploration Cyberinfrastructure

Fast algorithms and heuristics for phylogenomics under ILS and hybridization

Fast algorithms and heuristics for phylogenomics under ILS and hybridization

Source :

Abstract

Links to Exploration step

Le document en format XML

Pour manipuler ce document sous Unix (Dilib)

Pour mettre un lien sur cette page dans le réseau Wicri

Pour générer des pages wiki