Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures

Internal identifier: 000674 (Pmc/Checkpoint); previous: 000673; next: 000675


Authors: Alexandros Stamatakis [Germany]; Michael Ott [Germany]

Source :

RBID : PMC:2607410

Abstract

The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on ‘gappy’ multi-gene alignments. By ‘gappy’ we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAxML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
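To make the abstract's notion of a 'gappy' alignment concrete, the following is a minimal sketch that computes the percentage of alignment cells that are sampling-induced gaps, given which taxa have a sequence for which gene. The function name `gap_percentage`, the presence matrix and the partition lengths are all hypothetical illustrations; they are not taken from the paper or from RAxML.

```python
from typing import Dict, List


def gap_percentage(presence: Dict[str, List[bool]],
                   partition_lengths: List[int]) -> float:
    """Return the percentage of alignment cells that are sampling-induced gaps.

    presence[taxon][i] is True when the taxon has a sequence for partition i.
    A missing sequence contributes partition_lengths[i] gap cells for that taxon.
    """
    total_cells = sum(partition_lengths) * len(presence)
    if total_cells == 0:
        raise ValueError("alignment is empty")
    gap_cells = sum(
        length
        for has_gene in presence.values()
        for present, length in zip(has_gene, partition_lengths)
        if not present
    )
    return 100.0 * gap_cells / total_cells


# Hypothetical toy data: 4 taxa, 3 genes of 100, 200 and 300 columns.
toy_presence = {
    "taxon_A": [True, True, True],
    "taxon_B": [True, False, True],
    "taxon_C": [False, True, False],
    "taxon_D": [True, False, False],
}
toy_lengths = [100, 200, 300]

toy_gappiness = gap_percentage(toy_presence, toy_lengths)
print(f"{toy_gappiness:.1f}% of the toy alignment is sampling-induced gaps")
```

Real alignment gaps inside a sequence that is present are deliberately not counted here, in line with the abstract's distinction between sampling-induced gaps and real alignment gaps.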


Url:
DOI: 10.1098/rstb.2008.0163
PubMed: 18852107
PubMed Central: 2607410


Affiliations:


Links to previous steps (curation, corpus, ...)


Links to Exploration step

PMC:2607410

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures</title>
<author>
<name sortKey="Stamatakis, Alexandros" sort="Stamatakis, Alexandros" uniqKey="Stamatakis A" first="Alexandros" last="Stamatakis">Alexandros Stamatakis</name>
<affiliation wicri:level="3">
<nlm:aff id="aff1">
<institution>Department of Computer Science The Exelixis Lab, Ludwig-Maximilians-Universität München</institution>
<addr-line>Amalienstrasse 17, 80333 München, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Amalienstrasse 17, 80333 München</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ott, Michael" sort="Ott, Michael" uniqKey="Ott M" first="Michael" last="Ott">Michael Ott</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Department of Computer Science, Technische Universität München</institution>
<addr-line>Boltzmannstrasse 3, 85747 Garching b. München, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Boltzmannstrasse 3, 85747 Garching b. München</wicri:regionArea>
<wicri:noRegion>85747 Garching b. München</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">18852107</idno>
<idno type="pmc">2607410</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2607410</idno>
<idno type="RBID">PMC:2607410</idno>
<idno type="doi">10.1098/rstb.2008.0163</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000282</idno>
<idno type="wicri:Area/Pmc/Curation">000282</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000674</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures</title>
<author>
<name sortKey="Stamatakis, Alexandros" sort="Stamatakis, Alexandros" uniqKey="Stamatakis A" first="Alexandros" last="Stamatakis">Alexandros Stamatakis</name>
<affiliation wicri:level="3">
<nlm:aff id="aff1">
<institution>Department of Computer Science The Exelixis Lab, Ludwig-Maximilians-Universität München</institution>
<addr-line>Amalienstrasse 17, 80333 München, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Amalienstrasse 17, 80333 München</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ott, Michael" sort="Ott, Michael" uniqKey="Ott M" first="Michael" last="Ott">Michael Ott</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<institution>Department of Computer Science, Technische Universität München</institution>
<addr-line>Boltzmannstrasse 3, 85747 Garching b. München, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Boltzmannstrasse 3, 85747 Garching b. München</wicri:regionArea>
<wicri:noRegion>85747 Garching b. München</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Philosophical Transactions of the Royal Society B: Biological Sciences</title>
<idno type="ISSN">0962-8436</idno>
<idno type="eISSN">1471-2970</idno>
<imprint>
<date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on ‘gappy’ multi-gene alignments. By ‘gappy’ we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in
<sc>RAxML</sc>
indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.</p>
</div>
</front>
</TEI>
<pmc article-type="research-article" xml:lang="EN">
<pmc-comment>The publisher of this article does not allow downloading of the full text in XML form.</pmc-comment>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Philos Trans R Soc Lond B Biol Sci</journal-id>
<journal-id journal-id-type="publisher-id">RSTB</journal-id>
<journal-title>Philosophical Transactions of the Royal Society B: Biological Sciences</journal-title>
<issn pub-type="ppub">0962-8436</issn>
<issn pub-type="epub">1471-2970</issn>
<publisher>
<publisher-name>The Royal Society</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">18852107</article-id>
<article-id pub-id-type="pmc">2607410</article-id>
<article-id pub-id-type="publisher-id">rstb20080163</article-id>
<article-id pub-id-type="doi">10.1098/rstb.2008.0163</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>Alexandros</given-names>
</name>
<xref ref-type="aff" rid="aff1">1</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ott</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="aff2">2</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<institution>Department of Computer Science The Exelixis Lab, Ludwig-Maximilians-Universität München</institution>
<addr-line>Amalienstrasse 17, 80333 München, Germany</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<institution>Department of Computer Science, Technische Universität München</institution>
<addr-line>Boltzmannstrasse 3, 85747 Garching b. München, Germany</addr-line>
</aff>
<author-notes>
<corresp id="cor1">
<label>*</label>
Author for correspondence (
<email>alexandros.stamatakis@gmail.com</email>
)</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>7</day>
<month>10</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="ppub">
<day>27</day>
<month>12</month>
<year>2008</year>
</pub-date>
<volume>363</volume>
<issue>1512</issue>
<issue-title>Discussion Meeting Issue ‘Statistical and computational challenges in molecular phylogenetics and evolution’ organized by Ziheng Yang and Nick Goldman</issue-title>
<fpage>3977</fpage>
<lpage>3984</lpage>
<permissions>
<copyright-statement>© 2008 The Royal Society</copyright-statement>
<copyright-year>2008</copyright-year>
</permissions>
<abstract xml:lang="EN">
<p>The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on ‘gappy’ multi-gene alignments. By ‘gappy’ we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in
<sc>RAxML</sc>
indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.</p>
</abstract>
<kwd-group>
<kwd>phylogenetic inference</kwd>
<kwd>maximum likelihood</kwd>
<kwd>
<sc>RAxML</sc>
</kwd>
<kwd>multi-gene phylogenies</kwd>
<kwd>multi-core architectures</kwd>
<kwd>OpenMP</kwd>
</kwd-group>
</article-meta>
</front>
<floats-wrap>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>Example of a gappy multi-gene alignment.</p>
</caption>
<graphic xlink:href="rstb20080163f01"></graphic>
</fig>
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>(
<italic>a</italic>
,
<italic>b</italic>
) Data structures and likelihood computation for gappy multi-gene alignments.</p>
</caption>
<graphic xlink:href="rstb20080163f02"></graphic>
</fig>
<fig id="fig3" position="float">
<label>Figure 3</label>
<caption>
<p>Likelihood vector organization.</p>
</caption>
<graphic xlink:href="rstb20080163f03"></graphic>
</fig>
<fig id="fig4" position="float">
<label>Figure 4</label>
<caption>
<p>Speedup of Pthreads-based parallel
<sc>RAxML</sc>
version on different multi-core architectures. Solid line, Barcelona; long-dashed line, Clovertown; short-dashed line, x4600; linear speedup, dotted line.</p>
</caption>
<graphic xlink:href="rstb20080163f04"></graphic>
</fig>
<table-wrap id="tbl1" position="float">
<label>Table 1</label>
<caption>
<p>Execution times on an AMD Opteron for standard and fast likelihood computations on gappy multi-gene alignments.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="bottom" align="left" rowspan="1" colspan="1">dataset</th>
<th valign="bottom" rowspan="1" colspan="1">
<italic>S-Travers</italic>
</th>
<th valign="bottom" rowspan="1" colspan="1">
<italic>F-Travers</italic>
</th>
<th valign="bottom" rowspan="1" colspan="1">
<italic>S-Opt</italic>
</th>
<th valign="bottom" rowspan="1" colspan="1">
<italic>F-Opt</italic>
</th>
<th valign="bottom" rowspan="1" colspan="1">
<italic>gaps</italic>
(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">d59_8</td>
<td align="char" char="." rowspan="1" colspan="1">1.87</td>
<td align="char" char="." rowspan="1" colspan="1">1.19</td>
<td align="char" char="." rowspan="1" colspan="1">13.32</td>
<td align="char" char="." rowspan="1" colspan="1">2.85</td>
<td align="char" char="." rowspan="1" colspan="1">28</td>
</tr>
<tr>
<td rowspan="1" colspan="1">d404_11</td>
<td align="char" char="." rowspan="1" colspan="1">37.04</td>
<td align="char" char="." rowspan="1" colspan="1">8.08</td>
<td align="char" char="." rowspan="1" colspan="1">303.18</td>
<td align="char" char="." rowspan="1" colspan="1">9.00</td>
<td align="char" char="." rowspan="1" colspan="1">73</td>
</tr>
<tr>
<td rowspan="1" colspan="1">d2177_68</td>
<td align="char" char="." rowspan="1" colspan="1">756.96</td>
<td align="char" char="." rowspan="1" colspan="1">68.69</td>
<td align="char" char="." rowspan="1" colspan="1">7483.53</td>
<td align="char" char="." rowspan="1" colspan="1">165.87</td>
<td align="char" char="." rowspan="1" colspan="1">91</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tbl2" position="float">
<label>Table 2</label>
<caption>
<p>Execution times in seconds on a four-way AMD Opteron for the Pthreads, OpenMP and sequential
<sc>RAxML</sc>
versions.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="bottom" align="left" rowspan="1" colspan="1">dataset</th>
<th valign="bottom" align="left" rowspan="1" colspan="1">Pthreads</th>
<th valign="bottom" align="left" rowspan="1" colspan="1">OpenMP</th>
<th valign="bottom" align="left" rowspan="1" colspan="1">sequential</th>
<th valign="bottom" align="left" rowspan="1" colspan="1">loop length</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">d125</td>
<td align="char" char="." rowspan="1" colspan="1">3682.30</td>
<td align="char" char="." rowspan="1" colspan="1">4722.05</td>
<td align="char" char="." rowspan="1" colspan="1">16 604.25</td>
<td align="char" char="." rowspan="1" colspan="1">19 436</td>
</tr>
<tr>
<td rowspan="1" colspan="1">d150</td>
<td align="char" char="." rowspan="1" colspan="1">214.25</td>
<td align="char" char="." rowspan="1" colspan="1">216.22</td>
<td align="char" char="." rowspan="1" colspan="1">782.47</td>
<td align="char" char="." rowspan="1" colspan="1">1130</td>
</tr>
<tr>
<td rowspan="1" colspan="1">d500</td>
<td align="char" char="." rowspan="1" colspan="1">1282.22</td>
<td align="char" char="." rowspan="1" colspan="1">1393.85</td>
<td align="char" char="." rowspan="1" colspan="1">4749.67</td>
<td align="char" char="." rowspan="1" colspan="1">1193</td>
</tr>
<tr>
<td rowspan="1" colspan="1">d714</td>
<td align="char" char="." rowspan="1" colspan="1">1675.64</td>
<td align="char" char="." rowspan="1" colspan="1">1694.64</td>
<td align="char" char="." rowspan="1" colspan="1">6207.23</td>
<td align="char" char="." rowspan="1" colspan="1">1231</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-wrap>
</pmc>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>Bavière</li>
<li>District de Haute-Bavière</li>
</region>
<settlement>
<li>Munich</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Bavière">
<name sortKey="Stamatakis, Alexandros" sort="Stamatakis, Alexandros" uniqKey="Stamatakis A" first="Alexandros" last="Stamatakis">Alexandros Stamatakis</name>
</region>
<name sortKey="Ott, Michael" sort="Ott, Michael" uniqKey="Ott M" first="Michael" last="Ott">Michael Ott</name>
</country>
</tree>
</affiliations>
</record>
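The timings in Table 1 and Table 2 of the record can be turned into speedup ratios with simple arithmetic. The sketch below copies the table values and divides each baseline time by the corresponding faster time. It assumes, based only on the Table 1 caption, that the `S-` columns are the standard computation and the `F-` columns the fast one; the record does not define `Travers` and `Opt`, and Table 1 states no time unit, so only the unitless ratios are computed. The names `TABLE_1`, `TABLE_2` and `speedup` are illustrative.

```python
# Table 1 values: (S-Travers, F-Travers, S-Opt, F-Opt, gaps in %)
TABLE_1 = {
    "d59_8": (1.87, 1.19, 13.32, 2.85, 28),
    "d404_11": (37.04, 8.08, 303.18, 9.00, 73),
    "d2177_68": (756.96, 68.69, 7483.53, 165.87, 91),
}

# Table 2 values in seconds: (Pthreads, OpenMP, sequential)
TABLE_2 = {
    "d125": (3682.30, 4722.05, 16604.25),
    "d150": (214.25, 216.22, 782.47),
    "d500": (1282.22, 1393.85, 4749.67),
    "d714": (1675.64, 1694.64, 6207.23),
}


def speedup(baseline: float, improved: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    if improved <= 0:
        raise ValueError("improved time must be positive")
    return baseline / improved


# Assumed pairing: standard (S-) time divided by fast (F-) time.
table_1_speedups = {
    name: (speedup(s_trav, f_trav), speedup(s_opt, f_opt))
    for name, (s_trav, f_trav, s_opt, f_opt, _gaps) in TABLE_1.items()
}

# Sequential time divided by each parallel version's time.
table_2_speedups = {
    name: (speedup(seq, pthreads), speedup(seq, openmp))
    for name, (pthreads, openmp, seq) in TABLE_2.items()
}

for name, (travers, opt) in table_1_speedups.items():
    print(f"{name}: Travers x{travers:.2f}, Opt x{opt:.2f}")
for name, (pthreads, openmp) in table_2_speedups.items():
    print(f"{name}: Pthreads x{pthreads:.2f}, OpenMP x{openmp:.2f}")
```

Under these assumptions, the ratios grow with the gap percentage in Table 1, and the Pthreads version is faster than the OpenMP version on every dataset listed in Table 2.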

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000674 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000674 | SxmlIndent | more
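Once a record has been retrieved, its identifiers can be extracted with a standard XML parser. The sketch below is a hedged illustration in Python: it assumes the retrieved text has the same shape as the XML record shown on this page, and it works on a reduced, hand-copied fragment. The record as displayed uses the prefixes `wicri:`, `nlm:` and `xlink:` without declaring them, so a strict parser such as `xml.etree.ElementTree` would reject the full text as written; the fragment therefore contains only unprefixed elements. The names `FRAGMENT` and `extract_identifiers` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Reduced fragment of the record above, limited to unprefixed elements.
FRAGMENT = """<record>
<TEI><teiHeader><fileDesc>
<publicationStmt>
<idno type="pmid">18852107</idno>
<idno type="pmc">2607410</idno>
<idno type="doi">10.1098/rstb.2008.0163</idno>
<date when="2008">2008</date>
</publicationStmt>
</fileDesc></teiHeader></TEI>
</record>"""


def extract_identifiers(xml_text: str) -> Dict[str, str]:
    """Map each <idno> element's type attribute to its text content."""
    root = ET.fromstring(xml_text)
    return {
        idno.get("type"): (idno.text or "").strip()
        for idno in root.iter("idno")
        if idno.get("type")
    }


identifiers = extract_identifiers(FRAGMENT)
for id_type, value in identifiers.items():
    print(f"{id_type}: {value}")
```

To process a full record this way, the undeclared prefixes would first have to be bound to namespaces or stripped; which option is appropriate depends on the downstream tooling.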

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:2607410
   |texte=   Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:18852107" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024