Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling

Internal identifier: 000526 (Pmc/Corpus); previous: 000525; next: 000527

Authors: Alexandros Stamatakis; Markus Göker; Guido W. Grimm

RBID : PMC:2880847

Abstract

The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of analyzing them, for many organismic groups a sufficient number of taxa is often available exclusively from one or just a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as an example several large nested single-gene rbcL alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of the number of sequences to the number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with a dense taxon sampling up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses.
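
The group-based taxon subsampling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented by the authors: the function name `group_jackknife`, the fixed per-group quota, and the toy taxon labels are all hypothetical and introduced here only to show the idea of drawing subsamples from predefined taxon groups rather than from the whole taxon set at random.

```python
import random
from collections import defaultdict


def group_jackknife(taxon_to_group, per_group, seed=None):
    """Draw up to `per_group` taxa at random from each taxon group.

    Hypothetical helper: the abstract only states that subsamples are drawn
    from empirical taxon groups; the fixed per-group quota is an assumption.
    """
    rng = random.Random(seed)

    # Collect the members of each group (user-defined or from a database).
    groups = defaultdict(list)
    for taxon, group in taxon_to_group.items():
        groups[group].append(taxon)

    subsample = []
    for group in sorted(groups):
        members = sorted(groups[group])
        # Small groups contribute all of their members.
        k = min(per_group, len(members))
        subsample.extend(rng.sample(members, k))
    return subsample


# Toy, made-up labels (not taken from the rbcL alignments of the paper).
toy_groups = {
    "taxon_a1": "group_A",
    "taxon_a2": "group_A",
    "taxon_a3": "group_A",
    "taxon_b1": "group_B",
    "taxon_b2": "group_B",
    "taxon_c1": "group_C",
}
replicate = group_jackknife(toy_groups, per_group=2, seed=1)
print(len(replicate), "taxa drawn:", replicate)
```

Each replicate keeps every group represented, which is the property that distinguishes this kind of subsampling from a purely random taxon jackknife; repeating the call with different seeds would yield the set of reduced alignments to analyze.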


Url:
PubMed: 20535232
PubMed Central: 2880847

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling</title>
<author>
<name sortKey="Stamatakis, Alexandros" sort="Stamatakis, Alexandros" uniqKey="Stamatakis A" first="Alexandros" last="Stamatakis">Alexandros Stamatakis</name>
<affiliation>
<nlm:aff id="af1-ebo-2010-073">The Exelixis Lab, Dept. of Computer Science, Technische Universität München, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Goker, Markus" sort="Goker, Markus" uniqKey="Goker M" first="Markus" last="Göker">Markus Göker</name>
<affiliation>
<nlm:aff id="af2-ebo-2010-073">German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grimm, Guido W" sort="Grimm, Guido W" uniqKey="Grimm G" first="Guido W." last="Grimm">Guido W. Grimm</name>
<affiliation>
<nlm:aff id="af3-ebo-2010-073">Department of Palaeobotany, Swedish Museum of Natural History, Stockholm, Sweden</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20535232</idno>
<idno type="pmc">2880847</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880847</idno>
<idno type="RBID">PMC:2880847</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000526</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling</title>
<author>
<name sortKey="Stamatakis, Alexandros" sort="Stamatakis, Alexandros" uniqKey="Stamatakis A" first="Alexandros" last="Stamatakis">Alexandros Stamatakis</name>
<affiliation>
<nlm:aff id="af1-ebo-2010-073">The Exelixis Lab, Dept. of Computer Science, Technische Universität München, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Goker, Markus" sort="Goker, Markus" uniqKey="Goker M" first="Markus" last="Göker">Markus Göker</name>
<affiliation>
<nlm:aff id="af2-ebo-2010-073">German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grimm, Guido W" sort="Grimm, Guido W" uniqKey="Grimm G" first="Guido W." last="Grimm">Guido W. Grimm</name>
<affiliation>
<nlm:aff id="af3-ebo-2010-073">Department of Palaeobotany, Swedish Museum of Natural History, Stockholm, Sweden</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Evolutionary Bioinformatics Online</title>
<idno type="eISSN">1176-9343</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of analyzing them, for many organismic groups a sufficient number of taxa is often available exclusively from one or just a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as an example several large nested single-gene
<italic>rbcL</italic>
alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of the number of sequences to the number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene
<italic>rbcL</italic>
alignments with a dense taxon sampling up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Zwickl, Dj" uniqKey="Zwickl D">DJ Zwickl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Ludwig, T" uniqKey="Ludwig T">T Ludwig</name>
</author>
<author>
<name sortKey="Meier, H" uniqKey="Meier H">H Meier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Blagojevic, F" uniqKey="Blagojevic F">F Blagojevic</name>
</author>
<author>
<name sortKey="Antonopoulos, Cd" uniqKey="Antonopoulos C">CD Antonopoulos</name>
</author>
<author>
<name sortKey="Nikolopoulos, Ds" uniqKey="Nikolopoulos D">DS Nikolopoulos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Minh, Bq" uniqKey="Minh B">BQ Minh</name>
</author>
<author>
<name sortKey="Vinh, Ls" uniqKey="Vinh L">LS Vinh</name>
</author>
<author>
<name sortKey="Von Haeseler, A" uniqKey="Von Haeseler A">A von Haeseler</name>
</author>
<author>
<name sortKey="Schmidt, Ha" uniqKey="Schmidt H">HA Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guindon, S" uniqKey="Guindon S">S Guindon</name>
</author>
<author>
<name sortKey="Gascuel, O" uniqKey="Gascuel O">O Gascuel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hordijk, W" uniqKey="Hordijk W">W Hordijk</name>
</author>
<author>
<name sortKey="Gascuel, O" uniqKey="Gascuel O">O Gascuel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jobb, G" uniqKey="Jobb G">G Jobb</name>
</author>
<author>
<name sortKey="Von Haeseler, A" uniqKey="Von Haeseler A">A von Haeseler</name>
</author>
<author>
<name sortKey="Strimmer, K" uniqKey="Strimmer K">K Strimmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F Ronquist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F Ronquist</name>
</author>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dunn, Cw" uniqKey="Dunn C">CW Dunn</name>
</author>
<author>
<name sortKey="Hejnol, A" uniqKey="Hejnol A">A Hejnol</name>
</author>
<author>
<name sortKey="Matus, Dq" uniqKey="Matus D">DQ Matus</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcmahon, Mm" uniqKey="Mcmahon M">MM McMahon</name>
</author>
<author>
<name sortKey="Sanderson, Mj" uniqKey="Sanderson M">MJ Sanderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gueidan, C" uniqKey="Gueidan C">C Gueidan</name>
</author>
<author>
<name sortKey="Roux, C" uniqKey="Roux C">C Roux</name>
</author>
<author>
<name sortKey="Lutzoni, F" uniqKey="Lutzoni F">F Lutzoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jansen, Rk" uniqKey="Jansen R">RK Jansen</name>
</author>
<author>
<name sortKey="Cai, Z" uniqKey="Cai Z">Z Cai</name>
</author>
<author>
<name sortKey="Raubeson, La" uniqKey="Raubeson L">LA Raubeson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hackett, Sj" uniqKey="Hackett S">SJ Hackett</name>
</author>
<author>
<name sortKey="Kimball, Rt" uniqKey="Kimball R">RT Kimball</name>
</author>
<author>
<name sortKey="Reddy, S" uniqKey="Reddy S">S Reddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yoon, Hs" uniqKey="Yoon H">HS Yoon</name>
</author>
<author>
<name sortKey="Grant, J" uniqKey="Grant J">J Grant</name>
</author>
<author>
<name sortKey="Tekle, Yi" uniqKey="Tekle Y">YI Tekle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gee, H" uniqKey="Gee H">H Gee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
<author>
<name sortKey="Snell, Ea" uniqKey="Snell E">EA Snell</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="Holland, Pwh" uniqKey="Holland P">PWH Holland</name>
</author>
<author>
<name sortKey="Casane, D" uniqKey="Casane D">D Casane</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jeffroy, O" uniqKey="Jeffroy O">O Jeffroy</name>
</author>
<author>
<name sortKey="Brinkmann, H" uniqKey="Brinkmann H">H Brinkmann</name>
</author>
<author>
<name sortKey="Delsuc, F" uniqKey="Delsuc F">F Delsuc</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mc Guire, Ja" uniqKey="Mc Guire J">JA Mc Guire</name>
</author>
<author>
<name sortKey="Witt, Cc" uniqKey="Witt C">CC Witt</name>
</author>
<author>
<name sortKey="Altshuler, Dl" uniqKey="Altshuler D">DL Altshuler</name>
</author>
<author>
<name sortKey="Remsen, Jv" uniqKey="Remsen J">JV Remsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kolokotronis, So" uniqKey="Kolokotronis S">SO Kolokotronis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Sa" uniqKey="Smith S">SA Smith</name>
</author>
<author>
<name sortKey="Beaulieu, Jm" uniqKey="Beaulieu J">JM Beaulieu</name>
</author>
<author>
<name sortKey="Donoghue, Mj" uniqKey="Donoghue M">MJ Donoghue</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ripplinger, J" uniqKey="Ripplinger J">J Ripplinger</name>
</author>
<author>
<name sortKey="Sullivan, J" uniqKey="Sullivan J">J Sullivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tavare, S" uniqKey="Tavare S">S Tavare</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ott, M" uniqKey="Ott M">M Ott</name>
</author>
<author>
<name sortKey="Zola, J" uniqKey="Zola J">J Zola</name>
</author>
<author>
<name sortKey="Aluru, S" uniqKey="Aluru S">S Aluru</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Ott, M" uniqKey="Ott M">M Ott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Hoover, P" uniqKey="Hoover P">P Hoover</name>
</author>
<author>
<name sortKey="Rougemont, J" uniqKey="Rougemont J">J Rougemont</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bininda Emonds, Orp" uniqKey="Bininda Emonds O">ORP Bininda-Emonds</name>
</author>
<author>
<name sortKey="Brady, Sg" uniqKey="Brady S">SG Brady</name>
</author>
<author>
<name sortKey="King, J" uniqKey="King J">J King</name>
</author>
<author>
<name sortKey="Sanderson, Mj" uniqKey="Sanderson M">MJ Sanderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moret, Bme" uniqKey="Moret B">BME Moret</name>
</author>
<author>
<name sortKey="Roshan, U" uniqKey="Roshan U">U Roshan</name>
</author>
<author>
<name sortKey="Warnow, T" uniqKey="Warnow T">T Warnow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Z" uniqKey="Yang Z">Z Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goloboff, Pa" uniqKey="Goloboff P">PA Goloboff</name>
</author>
<author>
<name sortKey="Catalano, Sa" uniqKey="Catalano S">SA Catalano</name>
</author>
<author>
<name sortKey="Marcos Mirande, J" uniqKey="Marcos Mirande J">J Marcos Mirande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zwickl, Dj" uniqKey="Zwickl D">DJ Zwickl</name>
</author>
<author>
<name sortKey="Hillis, Dm" uniqKey="Hillis D">DM Hillis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Delsuc, F" uniqKey="Delsuc F">F Delsuc</name>
</author>
<author>
<name sortKey="Brinkmann, H" uniqKey="Brinkmann H">H Brinkmann</name>
</author>
<author>
<name sortKey="Philippe, H" uniqKey="Philippe H">H Philippe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Graham, Sw" uniqKey="Graham S">SW Graham</name>
</author>
<author>
<name sortKey="Olmstead, Rg" uniqKey="Olmstead R">RG Olmstead</name>
</author>
<author>
<name sortKey="Barrett, Sch" uniqKey="Barrett S">SCH Barrett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Savolainen, V" uniqKey="Savolainen V">V Savolainen</name>
</author>
<author>
<name sortKey="Chase, Mw" uniqKey="Chase M">MW Chase</name>
</author>
<author>
<name sortKey="Hoot, Sb" uniqKey="Hoot S">SB Hoot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stevens, Pf" uniqKey="Stevens P">PF Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hilu, Kw" uniqKey="Hilu K">KW Hilu</name>
</author>
<author>
<name sortKey="Borsch, T" uniqKey="Borsch T">T Borsch</name>
</author>
<author>
<name sortKey="Muller, K" uniqKey="Muller K">K Müller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davies, Jt" uniqKey="Davies J">JT Davies</name>
</author>
<author>
<name sortKey="Barradough, Tg" uniqKey="Barradough T">TG Barradough</name>
</author>
<author>
<name sortKey="Chase, Mw" uniqKey="Chase M">MW Chase</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soltis, Ps" uniqKey="Soltis P">PS Soltis</name>
</author>
<author>
<name sortKey="Soltis, De" uniqKey="Soltis D">DE Soltis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qiu, Yl" uniqKey="Qiu Y">YL Qiu</name>
</author>
<author>
<name sortKey="Dombrovska, O" uniqKey="Dombrovska O">O Dombrovska</name>
</author>
<author>
<name sortKey="Lee, J" uniqKey="Lee J">J Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soltis, De" uniqKey="Soltis D">DE Soltis</name>
</author>
<author>
<name sortKey="Gitzendanner, Ma" uniqKey="Gitzendanner M">MA Gitzendanner</name>
</author>
<author>
<name sortKey="Soltis, Ps" uniqKey="Soltis P">PS Soltis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanderson, Mj" uniqKey="Sanderson M">MJ Sanderson</name>
</author>
<author>
<name sortKey="Wojciechowski, Mf" uniqKey="Wojciechowski M">MF Wojciechowski</name>
</author>
<author>
<name sortKey="Hu, J M" uniqKey="Hu J">J-M Hu</name>
</author>
<author>
<name sortKey="Sher Khan, T" uniqKey="Sher Khan T">T Sher Khan</name>
</author>
<author>
<name sortKey="Brady, Sg" uniqKey="Brady S">SG Brady</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rydin, C" uniqKey="Rydin C">C Rydin</name>
</author>
<author>
<name sortKey="K Llersjo, M" uniqKey="K Llersjo M">M Källersjö</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lanyon, Sm" uniqKey="Lanyon S">SM Lanyon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Efron, B" uniqKey="Efron B">B Efron</name>
</author>
<author>
<name sortKey="Halloran, E" uniqKey="Halloran E">E Halloran</name>
</author>
<author>
<name sortKey="Holmes, S" uniqKey="Holmes S">S Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Swofford, Dl" uniqKey="Swofford D">DL Swofford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Dezulian, T" uniqKey="Dezulian T">T Dezulian</name>
</author>
<author>
<name sortKey="Rausch, C" uniqKey="Rausch C">C Rausch</name>
</author>
<author>
<name sortKey="Richter, D" uniqKey="Richter D">D Richter</name>
</author>
<author>
<name sortKey="Rupp, R" uniqKey="Rupp R">R Rupp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Bryant, D" uniqKey="Bryant D">D Bryant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holland, B" uniqKey="Holland B">B Holland</name>
</author>
<author>
<name sortKey="Moulton, V" uniqKey="Moulton V">V Moulton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grimm, Gw" uniqKey="Grimm G">GW Grimm</name>
</author>
<author>
<name sortKey="Renner, Ss" uniqKey="Renner S">SS Renner</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Hemleben, V" uniqKey="Hemleben V">V Hemleben</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holland, B" uniqKey="Holland B">B Holland</name>
</author>
<author>
<name sortKey="Huber, Kt" uniqKey="Huber K">KT Huber</name>
</author>
<author>
<name sortKey="Moulton, V" uniqKey="Moulton V">V Moulton</name>
</author>
<author>
<name sortKey="Lockhart, Pj" uniqKey="Lockhart P">PJ Lockhart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nandi, Oi" uniqKey="Nandi O">OI Nandi</name>
</author>
<author>
<name sortKey="Chase, Mw" uniqKey="Chase M">MW Chase</name>
</author>
<author>
<name sortKey="Endress, Pk" uniqKey="Endress P">PK Endress</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qiu, Yl" uniqKey="Qiu Y">YL Qiu</name>
</author>
<author>
<name sortKey="Lee, Jh" uniqKey="Lee J">JH Lee</name>
</author>
<author>
<name sortKey="Bernasconi Quadroni, F" uniqKey="Bernasconi Quadroni F">F Bernasconi-Quadroni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soltis, De" uniqKey="Soltis D">DE Soltis</name>
</author>
<author>
<name sortKey="Senters, Ae" uniqKey="Senters A">AE Senters</name>
</author>
<author>
<name sortKey="Zanis, Mj" uniqKey="Zanis M">MJ Zanis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S Kim</name>
</author>
<author>
<name sortKey="Soltis, De" uniqKey="Soltis D">DE Soltis</name>
</author>
<author>
<name sortKey="Soltis, Ps" uniqKey="Soltis P">PS Soltis</name>
</author>
<author>
<name sortKey="Zanis, Mj" uniqKey="Zanis M">MJ Zanis</name>
</author>
<author>
<name sortKey="Suh, Y" uniqKey="Suh Y">Y Suh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pattengale, Nd" uniqKey="Pattengale N">ND Pattengale</name>
</author>
<author>
<name sortKey="Alipour, M" uniqKey="Alipour M">M Alipour</name>
</author>
<author>
<name sortKey="Bininda Emonds, Orp" uniqKey="Bininda Emonds O">ORP Bininda-Emonds</name>
</author>
<author>
<name sortKey="Moret, Bme" uniqKey="Moret B">BME Moret</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
<author>
<name sortKey="Komornik, Z" uniqKey="Komornik Z">Z Komornik</name>
</author>
<author>
<name sortKey="Berger, Sa" uniqKey="Berger S">SA Berger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, Sa" uniqKey="Berger S">SA Berger</name>
</author>
<author>
<name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Evol Bioinform Online</journal-id>
<journal-id journal-id-type="publisher-id">101256319</journal-id>
<journal-title-group>
<journal-title>Evolutionary Bioinformatics Online</journal-title>
</journal-title-group>
<issn pub-type="epub">1176-9343</issn>
<publisher>
<publisher-name>Libertas Academica</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">20535232</article-id>
<article-id pub-id-type="pmc">2880847</article-id>
<article-id pub-id-type="publisher-id">ebo-2010-073</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>Alexandros</given-names>
</name>
<xref ref-type="aff" rid="af1-ebo-2010-073">1</xref>
<xref ref-type="corresp" rid="c1-ebo-2010-073"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Göker</surname>
<given-names>Markus</given-names>
</name>
<xref ref-type="aff" rid="af2-ebo-2010-073">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Grimm</surname>
<given-names>Guido W.</given-names>
</name>
<xref ref-type="aff" rid="af3-ebo-2010-073">3</xref>
</contrib>
</contrib-group>
<aff id="af1-ebo-2010-073">
<label>1</label>
The Exelixis Lab, Dept. of Computer Science, Technische Universität München, Germany</aff>
<aff id="af2-ebo-2010-073">
<label>2</label>
German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany</aff>
<aff id="af3-ebo-2010-073">
<label>3</label>
Department of Palaeobotany, Swedish Museum of Natural History, Stockholm, Sweden</aff>
<author-notes>
<corresp id="c1-ebo-2010-073">Email:
<email>stamatak@cs.tum.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>5</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="collection">
<year>2010</year>
</pub-date>
<volume>6</volume>
<fpage>73</fpage>
<lpage>90</lpage>
<permissions>
<copyright-statement>© 2010 the author(s), publisher and licensee Libertas Academica Ltd.</copyright-statement>
<copyright-year>2010</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of analyzing them, for many organismic groups a sufficient number of taxa is often available exclusively from one or just a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as an example several large nested single-gene
<italic>rbcL</italic>
alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of the number of sequences to the number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene
<italic>rbcL</italic>
alignments with a dense taxon sampling up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses.</p>
</abstract>
<kwd-group>
<kwd>RAxML</kwd>
<kwd>phylogenetic inference</kwd>
<kwd>many taxon analyses</kwd>
<kwd>taxon jackknifing</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Phylogenetic inference using statistical models of evolution has now come of age, and several novel, fast, and accurate likelihood-based phylogenetic inference programs such as GARLI,
<xref ref-type="bibr" rid="b1-ebo-2010-073">1</xref>
RAxML,
<xref ref-type="bibr" rid="b2-ebo-2010-073">2</xref>
–
<xref ref-type="bibr" rid="b4-ebo-2010-073">4</xref>
IQPNNI,
<xref ref-type="bibr" rid="b5-ebo-2010-073">5</xref>
PHYML,
<xref ref-type="bibr" rid="b6-ebo-2010-073">6</xref>
,
<xref ref-type="bibr" rid="b7-ebo-2010-073">7</xref>
Tree-Finder,
<xref ref-type="bibr" rid="b8-ebo-2010-073">8</xref>
using maximum likelihood (ML), or Bayesian programs such as MrBayes
<xref ref-type="bibr" rid="b9-ebo-2010-073">9</xref>
,
<xref ref-type="bibr" rid="b10-ebo-2010-073">10</xref>
have become available. One key question is how many taxa these programs can handle on single-gene alignments without losing accuracy, given the comparatively weak phylogenetic signal of such alignments.</p>
<p>Despite the increasing popularity of phylogenomic analyses comprising up to 150 genes
<xref ref-type="bibr" rid="b11-ebo-2010-073">11</xref>
(see also refs.
<xref ref-type="bibr" rid="b12-ebo-2010-073">12</xref>
–
<xref ref-type="bibr" rid="b16-ebo-2010-073">16</xref>
for recent examples of such studies), which has stimulated a growing controversy about their assembly
<xref ref-type="bibr" rid="b17-ebo-2010-073">17</xref>
–
<xref ref-type="bibr" rid="b19-ebo-2010-073">19</xref>
and the choice of appropriate models as well as partitioning schemes,
<xref ref-type="bibr" rid="b20-ebo-2010-073">20</xref>
,
<xref ref-type="bibr" rid="b21-ebo-2010-073">21</xref>
the inference of trees based on large single-gene alignments, e.g.,
<xref ref-type="bibr" rid="b22-ebo-2010-073">22</xref>
still remains an important issue for two reasons:
<italic>Firstly</italic>
, for many organismic groups comprehensive sequence data that provide a sufficiently dense taxon sampling are only available for commonly used gene markers such as
<italic>rbcL</italic>
for green plants, a small number of mitochondrial genes for animals, and the large subunit and small subunit ribosomal RNA genes for various unicellular organisms.
<italic>Secondly</italic>
, the addition of genes (increased gene sampling) leads to extremely “gappy” alignments that typically contain more than 70% gaps due to unsampled genes. For instance, the datasets used by McMahon and Sanderson
<xref ref-type="bibr" rid="b12-ebo-2010-073">12</xref>
exhibit a gappiness of 90%. Thus, only little supplementary signal is provided by the addition of more genes
<xref ref-type="bibr" rid="b18-ebo-2010-073">18</xref>
at the expense of significantly larger data matrices and a sparse taxon sampling. The information content of the data does not increase linearly with the alignment length or, more importantly, with the computational cost. This high computational cost stems from the extremely long inference times and large memory footprints of the alignments under the widely used
<xref ref-type="bibr" rid="b23-ebo-2010-073">23</xref>
General Time Reversible (GTR) model
<xref ref-type="bibr" rid="b24-ebo-2010-073">24</xref>
of nucleotide substitution combined with the gamma (Γ) model of rate heterogeneity.
<xref ref-type="bibr" rid="b25-ebo-2010-073">25</xref>
Analyses of such datasets typically require supercomputing resources like the IBM BlueGene/L or the SGI Altix
<xref ref-type="bibr" rid="b26-ebo-2010-073">26</xref>
,
<xref ref-type="bibr" rid="b27-ebo-2010-073">27</xref>
which are difficult for most taxonomists (biologists) to exploit. Current phylogenomic analysis projects with RAxML have required 89 GB of main memory and 2.25 million CPU hours on an IBM BlueGene/L supercomputer. In addition, there is a realistic chance that such large-scale analyses on supercomputers might be limited by high energy costs in the near future. On the other hand, the recent introduction of a rapid Bootstrapping (BS)
<xref ref-type="bibr" rid="b28-ebo-2010-073">28</xref>
algorithm in RAxML version 7.0.4
<xref ref-type="bibr" rid="b29-ebo-2010-073">29</xref>
allows for full large-scale phylogenetic analyses (i.e., more than 100 BS replicates and a thorough search for the best-scoring ML tree) on single-gene datasets of more than 1,000 taxa within a couple of days on a modern desktop computer. While the accuracy of phylogenetic reconstruction depends on the sequence length of the alignment
<xref ref-type="bibr" rid="b30-ebo-2010-073">30</xref>
,
<xref ref-type="bibr" rid="b31-ebo-2010-073">31</xref>
and ML is consistent if the sequence length goes to infinity,
<xref ref-type="bibr" rid="b32-ebo-2010-073">32</xref>
it still remains crucial to explore the scalability limits for current single-gene alignments because of the aforementioned reasons. Due to the immense computational resource requirements in terms of random access memory and number of CPUs there is a clear trade-off: One can either compute trees with many taxa, e.g.,
<xref ref-type="bibr" rid="b22-ebo-2010-073">22</xref>
,
<xref ref-type="bibr" rid="b33-ebo-2010-073">33</xref>
or with many genes, e.g.,
<xref ref-type="bibr" rid="b14-ebo-2010-073">14</xref>
but not both, i.e., one needs to choose between dense taxon sampling and dense gene sampling.</p>
<p>Finally, the discussion on the impact of appropriate taxon sampling
<xref ref-type="bibr" rid="b34-ebo-2010-073">34</xref>
on results of phylogenetic analyses tends to be neglected, despite recent findings that phylogeny reconstruction is more susceptible to incomplete taxon than to incomplete gene sampling (see ref.
<xref ref-type="bibr" rid="b35-ebo-2010-073">35</xref>
for a review).</p>
<p>Another problem that can potentially be resolved by increasing the number of terminal accessions is the selection (and taxon density) of outgroup(s) used in phylogenetic studies. Outgroup selection may bias subtree topologies by placing a false root, e.g.,
<xref ref-type="bibr" rid="b36-ebo-2010-073">36</xref>
which can be prevented by using all available (and “alignable”) sequence data to assemble a dense outgroup that contains multiple organisms.</p>
<p>Owing to systematic efforts in studies of single- or few-gene genealogies based on dense taxon samples, as well as in studies of multigene phylogenies, many (terminal) clades in contemporary systematics can be considered well established. For instance, this is the case for the currently accepted families and orders of angiosperms,
<xref ref-type="bibr" rid="b37-ebo-2010-073">37</xref>
<xref ref-type="bibr" rid="b44-ebo-2010-073">44</xref>
among others. However, some interclade relationships remain unresolved.
<xref ref-type="bibr" rid="b14-ebo-2010-073">14</xref>
,
<xref ref-type="bibr" rid="b38-ebo-2010-073">38</xref>
,
<xref ref-type="bibr" rid="b42-ebo-2010-073">42</xref>
,
<xref ref-type="bibr" rid="b43-ebo-2010-073">43</xref>
Thus, although one is not necessarily interested in obtaining additional support to merely confirm well-established clades, the usage of placeholder taxa for such clades can potentially bias results because of insufficient taxon sampling and hence a lack of sufficient phylogenetic signal from these clades. The impact of taxon sampling on the inferred topology has been demonstrated in several cases, in particular for angiosperms.
<xref ref-type="bibr" rid="b34-ebo-2010-073">34</xref>
,
<xref ref-type="bibr" rid="b45-ebo-2010-073">45</xref>
,
<xref ref-type="bibr" rid="b46-ebo-2010-073">46</xref>
</p>
<p>We therefore address the important issue of scalability of ML to large single-gene alignments via an empirical study of the widely used
<italic>rbcL</italic>
gene for which currently over 26,000 sequences are available (NCBI GenBank query, 08/17/2008). We assembled several large alignments for eudicots (excluding euasterids, represented by c. 3,500 additional
<italic>rbcL</italic>
sequences), rosids, eurosids I, and eurosids II, containing 3,490, 2,259, 1,590, and 436 taxa, respectively. The fact that the rosids are a subset of the eudicots and that eurosids I and II are nested within the rosids allowed us to assess the scalability of ML on these datasets. In addition to these comprehensive large-scale analyses, we also examined an alternative approach that uses Group-based Randomized Taxon Subsampling (GRTS) and allows for comparisons of trees with varying taxon numbers while maintaining the relative taxon composition. In contrast, the commonly known taxon jackknifing approach as introduced by Lanyon
<xref ref-type="bibr" rid="b47-ebo-2010-073">47</xref>
applies agnostic taxon subsampling; this leads to distinct sets of leaves in each jackknife replicate, which hinders computation of consensus trees and bipartition frequencies using currently available software. Also, Lanyon
<xref ref-type="bibr" rid="b47-ebo-2010-073">47</xref>
only removed a single leaf per replicate, which rather corresponds to a leave-one-out experiment. Hence, his approach is not well-suited for improving scalability. We here compare topologies as well as bipartition frequencies obtained via GRTS, which is primarily used as a vehicle to assess scalability of ML, to those obtained by straightforward comprehensive analyses of the aforementioned datasets.</p>
<p>The main objective of this paper is to assess how well comprehensive ML analyses (combined tree inference and bootstrapping analyses) using RAxML scale up to alignments that contain several thousand taxa, by the example of single
<italic>rbcL</italic>
genes. We use RAxML as a typical representative of modern ML-based algorithms that, similarly to GARLI and the most recent version of PHYML, deploys an implementation of lazy SPR (Subtree Pruning and Re-grafting) moves. While RAxML, GARLI, and PHYML typically yield trees that are not significantly different from each other according to standard statistical significance tests, RAxML on average returns trees with the best-known likelihood values on datasets of more than 1,000 to 2,000 taxa. Nonetheless, the results obtained here with RAxML are qualitatively very similar to those that could be obtained via GARLI or PHYML, because all programs deploy comparable search mechanisms.</p>
<p>The results of the
<italic>rbcL</italic>
analyses are compared to previous results from the literature in the on-line supplements. We demonstrate that (i) ML scales well on large single-gene matrices; (ii) GRTS using well-established groups as taxonomic units (TUs) deserves further investigation to assess its potential for resolving higher-level phylogenetic relationships; and (iii) analyses of densely sampled <italic>rbcL</italic> data are informative and in good agreement with “multigene analyses”.</p>
</sec>
<sec sec-type="materials|methods">
<title>Material and Methods</title>
<sec>
<title>RbcL alignment assembly</title>
<p>GenBank data were accessed in spring 2007 via the NCBI GenBank taxonomy portal. Searches and downloads of
<italic>rbcL</italic>
data were conducted at the ordinal level as provided by the NCBI taxonomy. Comprehensive alignments were assembled as follows: Initially, subalignments for each order (or several small orders as well as taxa not included in current orders) were constructed using the Clustal V algorithm as implemented in MegAlign (DNA Star Software package, LaserGene, Madison, WI, USA). These subalignments were visually inspected for apparent sequence artifacts: the
<italic>rbcL</italic>
gene is highly length-conserved among angiosperms; hence, any gap (or additional base) can be considered to be an artifact or to represent a pseudogene. Sequences containing a high degree of artifacts were eliminated. For use with GRTS, the sequence name labels were transformed into a 5-character code, followed by the GenBank accession number. The first three letters of the 5-character code indicate the family or genus, to accommodate taxa that have not been assigned to a family, sensu APG II;
<xref ref-type="bibr" rid="b39-ebo-2010-073">39</xref>
the last two letters designate the order (or family or genus) sensu APG II, amended by information retrieved from the Angiosperm Phylogeny Website
<xref ref-type="bibr" rid="b38-ebo-2010-073">38</xref>
(APW) and additional web and print resources on taxonomy provided via APW. (Labeling was not automated since the current NCBI taxonomy contains several inconsistencies at the family and order level compared to taxonomic-systematic resources such as APG II and APW. A list of these inconsistencies can be found in the online appendix.) Note that the coding reflects the systematic affinity of the
<italic>organism</italic>
and not the
<italic>sequence</italic>
; hence, we did not correct for misnamed/mislabeled sequences (see Results) at this stage of the analysis process. In a second step, the subalignments were successively merged (nested) into more comprehensive alignments, which were then used to conduct the phylogenetic analyses. The alignments contain the entire
<italic>rbcL</italic>
data for eurosids I (EURO1 matrix), eurosids II (EURO2), rosids (ROSID), and eudicots except euasterids (EUDIS). All respective NEXUS and PHYLIP alignment files are available for download at
<ext-link ext-link-type="uri" xlink:href="http://wwwkramer.in.tum.de/exelixis/rbcl.tar.bz2">http://wwwkramer.in.tum.de/exelixis/rbcl.tar.bz2</ext-link>
.</p>
</sec>
<sec>
<title>Comprehensive ML analyses</title>
<p>Comprehensive analyses of the full datasets were conducted with RAxML-VI-HPC version 2.2.3. (The most recent version 7.2.6 was not available at the time the phylogenetic inferences for this paper were conducted.) For each alignment we inferred 100 BS trees and conducted 20 ML searches to determine the best-scoring ML tree on distinct randomized stepwise addition MP starting trees using GTR and the CAT approximation of rate heterogeneity.
<xref ref-type="bibr" rid="b48-ebo-2010-073">48</xref>
This approximation serves as an efficient computational workaround for the significantly more memory- and floating point-intensive standard Γ model of rate heterogeneity. The CAT approximation simply provides a means to rapidly navigate into portions of the topological search space where tree topologies score well under GTR + Γ. The computations were conducted on the CIPRES (Cyberinfrastructure for Phylogenetic Research project,
<ext-link ext-link-type="uri" xlink:href="http://www.phylo.org">http://www.phylo.org</ext-link>
) project cluster located at the San Diego Supercomputer Center (SDSC), which is equipped with 16 8-way 2.4 GHz AMD Opteron nodes, and on the InfiniBand cluster located at the Technische Universität München (TUM), which comprises 36 4-way 2.4 GHz AMD Opteron nodes. We denote the BS support values obtained from the comprehensive analyses as
<italic>CA-BS.</italic>
</p>
<p>Finally, we used the rapid Bootstrapping algorithm implemented in RAxML version 7.0.4
<xref ref-type="bibr" rid="b29-ebo-2010-073">29</xref>
in combination with a Perl-script to assess the effect and applicability of the double Bootstrap procedure
<xref ref-type="bibr" rid="b49-ebo-2010-073">49</xref>
on the large 3,490-sequence alignment. Here we used the naïve approach, i.e., we computed 100 second-level BS analyses on each of 100 first-level BS replicates, which amounts to a total of 10,000 BS analyses. This analysis was carried out on 128 CPUs located at the Technische Universität München over a weekend. This first assessment of a double bootstrap procedure on a large single-gene dataset was included as an alternative to GRTS to evaluate whether it can be used to improve support values for such large analyses.</p>
</sec>
<sec>
<title>Group-based randomized taxon subsampling (GRTS) analyses</title>
<p>The group-based randomized taxon subsampling (GRTS) procedure was implemented via appropriate Perl-scripts. As mentioned above, taxon names in the alignments were assigned in a way such that grouping information, a taxonomic unit (TU), is encoded by certain characters in the taxon name. This grouping information was then used to reduce alignment sizes via directed taxon jackknifing with various alignment size reduction factors ranging from 1/2, 1/4, 1/8, down to 1/64 (where applicable) of the original number of taxa (procedure illustrated in
<xref ref-type="fig" rid="f1-ebo-2010-073">Fig. 1</xref>
). For instance, an alignment of 1,024 taxa will successively be reduced via taxon jackknifing to sizes of 512, 256, 128, ..., 16 taxa. However, since we conduct group-based taxon jackknifing (based on the meta-information contained in the sequence names) in order to maintain the taxon diversity and composition during the reduction process, every taxonomic group will be reduced proportionally to the number of representatives in the original alignment. For example, if a group has 256 members in an original 1,024 sequence alignment, it will successively be reduced to 128, 64, 32, ..., 4 members in the GRTS samples. This means that the jackknifing process is not completely conducted at random, but maintains the structure of the original alignment in terms of its taxonomic breadth and composition. In addition, the reduction factor was limited in such a way that at least two sequences were sampled per predefined group. The assignment of orders in addition to families as TU (Taxonomic Unit) ensured that the majority of TU comprised enough members to allow for application of large reduction factors such as 1/32 or 1/64. For each alignment and each applicable reduction factor we computed 100 replicates. For each of those 100 replicates we inferred 10 ML trees using the GTR + CAT approximation and determined the respective best-scoring tree under GTR + Γ. Thus, for every reduction factor and dataset, we computed 100 best-scoring ML trees for distinct randomized group-based subsamples. Finally, in order to investigate the effect of the GRTS approach on bipartition support values, we also conducted 100 standard BS replicates for only 10 out of 100 GRTS replicates per reduction factor (1,000 trees per dataset and reduction factor), in order to keep the computational requirements within acceptable limits. 
However, for a reduction factor of 1/32 we also computed 100 BS replicates on all 100 GRTS replicates (a total of 10,000 trees per dataset) to assess the effect of using all 100 GRTS replicates instead of only 10. This effect was, however, negligible. We carried out BS analyses of GRTS replicates for the three smaller alignments (EURO1, EURO2, ROSID) with reduction factors of 1/4, 1/8, ..., 1/64 (where applicable). The rationale behind the GRTS approach is to reduce those large alignments to subalignments with a significantly lower and hence more favorable number of taxa for ML-based analyses with respect to the signal in the data and the NP-hard optimization problem, while maintaining the relative taxon composition. This can be regarded as zooming into the alignment, which allows us to better assess the scalability of the tree search algorithms and to conduct topological comparisons with respect to the placement of well-established groups. At the same time, this provides a mechanism to assess the change (decrease/increase) of BS support values with increasing number of taxa. Henceforth, we denote GRTS support values obtained from the ML searches on replicates (100 trees for 100 GRTS replicates) as
<italic>GRTS-ML</italic>
and GRTS support values obtained from 100 BS replicates on 10 GRTS replicates as
<italic>GRTS-BS.</italic>
</p>
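<p>The group-based subsampling step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Perl implementation used in this study: the function name <monospace>grts_subsample</monospace>, the label format, the rounding rule, and the way the minimum of two sequences per group is enforced are all hypothetical choices made for the example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict


def grts_subsample(labels, factor, tu_of, rng, min_per_group=2):
    """Draw one group-based random taxon subsample.

    labels        : sequence name labels of the full alignment
    factor        : reduction factor, e.g. 1/4 or 1/8
    tu_of         : function mapping a label to its taxonomic unit (TU);
                    the label format is an assumption of this sketch
    rng           : random.Random instance (keeps the draw reproducible)
    min_per_group : assumed floor of sequences kept per TU
    """
    groups = defaultdict(list)
    for label in labels:
        groups[tu_of(label)].append(label)

    sample = []
    for tu in sorted(groups):
        members = groups[tu]
        # Reduce every TU proportionally to its size in the full alignment.
        target = max(min_per_group, round(len(members) * factor))
        # A TU can never contribute more sequences than it contains.
        target = min(target, len(members))
        sample.extend(rng.sample(members, target))
    return sample


if __name__ == "__main__":
    # Hypothetical labels: 5-character TU code, underscore, running number.
    labels = [f"FagFa_{i:04d}" for i in range(256)]
    labels += [f"RosRo_{i:04d}" for i in range(768)]
    subsample = grts_subsample(labels, 1 / 4, lambda s: s[:5], random.Random(1))
    print(len(subsample))
```
]]></preformat>
<p>In this sketch, a TU with 256 members in a 1,024-sequence alignment contributes 64 sequences at a reduction factor of 1/4, which mirrors the proportional reduction described in the text. Repeating the call with different random seeds would yield the individual GRTS replicates.</p>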
</sec>
<sec>
<title>Result analyses: comparing trees with distinct sets of leaves</title>
<p>We implemented three distinct tree comparison methods to extract comparable bipartition support values from the respective collections of trees:
<list list-type="roman-lower">
<list-item>
<p>a comparison (
<italic>CA-BS</italic>
with
<italic>CA-BS</italic>
) of the nested trees obtained via the comprehensive analyses</p>
</list-item>
<list-item>
<p>a comparison of
<italic>GRTS-ML</italic>
with
<italic>GRTS-BS</italic>
replicates among each other</p>
</list-item>
<list-item>
<p>a comparison of
<italic>GRTS-ML/GRTS-BS</italic>
replicates with
<italic>CA-BS</italic>
replicates</p>
</list-item>
</list>
</p>
<p>In order to compare
<italic>CA-BS</italic>
values from the comprehensive analyses with different nested sets of leaves, e.g., a eudicot tree with a rosid tree, leaves not present in the less comprehensive rosid tree were pruned from the larger eudicot phylogeny in all
<italic>CA-BS</italic>
replicates. “Nested” means in this context that all taxa in the smaller trees and alignments are also contained in the respective larger and more comprehensive trees. With respect to the taxon label sets (TLS) induced by the trees and alignments we have:
<disp-formula>
<mml:math id="M1">
<mml:mrow>
<mml:mtext>TLS</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>eudicots</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⊃</mml:mo>
<mml:mtext>TLS</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>rosids</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⊃</mml:mo>
<mml:mtext>TLS</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>eurosids I</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>/</mml:mo>
<mml:mtext>TLS</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>eurosids II</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>To prune the trees we used our own script newick.tcl that is freely available at
<ext-link ext-link-type="uri" xlink:href="http://www.goeker.org/mg/distance/">http://www.goeker.org/mg/distance/</ext-link>
. This script is equivalent to the DELETE/PRUNE command in PAUP
<sup>*</sup>
.
<xref ref-type="bibr" rid="b50-ebo-2010-073">50</xref>
</p>
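<p>The pruning of a larger tree down to a nested taxon label set can be illustrated with a short Python sketch. It is not the newick.tcl script itself: trees are represented here as nested tuples of leaf labels rather than Newick strings, branch lengths are ignored, and the function name <monospace>prune_to_leaf_set</monospace> is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def prune_to_leaf_set(tree, keep):
    """Remove all leaves not in `keep` and suppress the resulting unary nodes.

    tree : a leaf label (str) or a tuple of subtrees (assumed representation)
    keep : set of leaf labels present in the smaller, nested tree
    Returns the pruned tree, or None if no leaf of the subtree survives.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None

    kept_children = []
    for child in tree:
        pruned = prune_to_leaf_set(child, keep)
        if pruned is not None:
            kept_children.append(pruned)

    if not kept_children:
        return None
    if len(kept_children) == 1:
        # An inner node with a single remaining child is collapsed.
        return kept_children[0]
    return tuple(kept_children)


if __name__ == "__main__":
    # Toy "eudicot" tree containing four "rosid" leaves r1..r4.
    eudicot_tree = ((("r1", "r2"), "r3"), ("e1", ("e2", "r4")))
    rosid_labels = {"r1", "r2", "r3", "r4"}
    print(prune_to_leaf_set(eudicot_tree, rosid_labels))
```
]]></preformat>
<p>Applying such a function to every replicate of the larger analysis gives a collection of trees on the same leaf set as the smaller analysis, so that bipartition frequencies become directly comparable.</p>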
<p>The comparison of trees inferred via
<italic>GRTS-BS</italic>
and
<italic>GRTS-ML</italic>
is slightly more complicated because each taxonomic unit (TU) that is used for sampling is represented by a distinct set of sequence name labels in every GRTS replicate. Moreover, the representatives of a TU, which reflect an accepted angiosperm order or family, are not necessarily recognized as being monophyletic based on
<italic>rbcL</italic>
data alone. To reduce each TU to a single leaf representing this TU in every tree, that is, to extract the “big picture”, we used the following algorithm: Each homogeneous subtree that only comprises members of a single TU is reduced to a single leaf, and the number of leaves in this homogeneous subtree is stored. If there is more than a single subtree per TU, i.e., if the TU is not monophyletic within the tree, all homogeneous subtrees for this TU except the largest one are pruned. The rationale for this is that the largest homogeneous subtree of a TU is most likely the best representative for the specific TU and will most probably contain true (e.g., non-mislabeled) members of this TU. On the other hand, small and deviating clades most likely comprise sequences from misidentified specimens or from wet-lab artifacts. For comparison of
<italic>GRTS-BS/GRTS-ML</italic>
replicates with trees from comprehensive
<italic>CA-BS</italic>
analyses, the
<italic>CA-BS</italic>
trees as well as
<italic>GRTS-BS</italic>
and
<italic>GRTS-ML</italic>
were reduced to TU trees using the same topological reduction algorithm.</p>
<p>If several equally large (maximum size) homogeneous subtrees exist for a TU, one of them is chosen at random. The rationale for this random selection is that GRTS, like bootstrapping, includes random decisions and that the deviations induced by applying the above algorithm will average out if a sufficiently large number of GRTS samples is used for comparison.</p>
<p>This topology reduction algorithm is also implemented in the newick.tcl script, which includes a flexible mechanism to recognize the TU assignments of sequences that are encoded in their labels.</p>
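<p>The topological reduction to TU trees can be sketched in Python as shown below. This is an illustrative reading of the algorithm described above, not the newick.tcl code: trees are assumed to be rooted nested tuples of leaf labels, the mapping from label to TU is passed in as a function, and all names (<monospace>TULeaf</monospace>, <monospace>reduce_to_tu_tree</monospace>, and so on) are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality: two equal-sized clades stay distinct
class TULeaf:
    tu: str
    size: int  # number of original leaves merged into this single leaf


def collapse(node, tu_of):
    """Bottom-up: merge every homogeneous subtree into one TULeaf."""
    if isinstance(node, str):
        return TULeaf(tu_of(node), 1)
    kids = [collapse(child, tu_of) for child in node]
    if all(isinstance(k, TULeaf) for k in kids) and len({k.tu for k in kids}) == 1:
        return TULeaf(kids[0].tu, sum(k.size for k in kids))
    return tuple(kids)


def leaves(node):
    """Yield all TULeaf objects of a collapsed tree."""
    if isinstance(node, TULeaf):
        yield node
    else:
        for child in node:
            yield from leaves(child)


def prune(node, keep):
    """Drop leaves not in `keep` and suppress unary inner nodes."""
    if isinstance(node, TULeaf):
        return node if node in keep else None
    kids = [p for p in (prune(child, keep) for child in node) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def reduce_to_tu_tree(tree, tu_of, rng):
    """Keep only the largest homogeneous subtree per TU (ties broken at random)."""
    collapsed = collapse(tree, tu_of)
    by_tu = {}
    for leaf in leaves(collapsed):
        by_tu.setdefault(leaf.tu, []).append(leaf)
    keep = set()
    for group in by_tu.values():
        largest = max(leaf.size for leaf in group)
        keep.add(rng.choice([leaf for leaf in group if leaf.size == largest]))
    return prune(collapsed, keep)


def to_labels(node):
    """Replace every TULeaf by its TU name for easy inspection."""
    if isinstance(node, TULeaf):
        return node.tu
    return tuple(to_labels(child) for child in node)
```
]]></preformat>
<p>For a toy tree in which one TU forms a clade of three sequences plus a single stray sequence elsewhere, the stray sequence is pruned and the clade of three is retained as the representative leaf of that TU, with its size stored alongside.</p>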
<p>After application of these transformations for the three types of comparisons (
<italic>CA-BS</italic>
versus
<italic>CA-BS</italic>
,
<italic>GRTS-BS</italic>
versus
<italic>GRTS-ML</italic>
,
<italic>GRTS-ML/GRTS-BS</italic>
versus
<italic>CA-BS</italic>
) to the respective replicates in order to obtain collections of trees with consistent leaf sets of equal size, we computed the Pearson correlation coefficient ρ between
<italic>all</italic>
bipartition frequencies induced by the respective replicate sets (via the respective RAxML command line switch -f m). In addition, we computed the Pearson correlation coefficient, the slope, and the offset using an appropriate Perl script and the respective RAxML option (-f b) on the respective best-scoring ML trees with support values. This allowed us to compare scalability of support values induced by
<italic>CA-BS</italic>
analyses on the best-scoring ML trees of the respective smaller nested datasets. Finally, we computed the extended majority rule consensus tree (a bifurcating tree) for
<italic>CA-BS</italic>
replicates using consense from the PHYLIP package.</p>
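<p>The correlation between the bipartition frequencies of two replicate sets can be illustrated with the following Python sketch. It does not reproduce the RAxML implementation; it assumes trees on identical leaf sets given as nested tuples of labels, counts a bipartition that is absent from one set with frequency zero, and uses hypothetical function names throughout.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from math import sqrt


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does NOT contain the
    alphabetically smallest leaf, so that both trees use the same encoding.
    """
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two leaves on either side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def bipartition_frequencies(trees):
    """Relative frequency of every bipartition in a collection of trees."""
    counts = Counter()
    for tree in trees:
        counts.update(bipartitions(tree))
    return {split: n / len(trees) for split, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_replicate_sets(trees_a, trees_b):
    """Correlate frequencies over the union of bipartitions of both sets."""
    freq_a = bipartition_frequencies(trees_a)
    freq_b = bipartition_frequencies(trees_b)
    splits = list(freq_a.keys() | freq_b.keys())
    xs = [freq_a.get(split, 0.0) for split in splits]
    ys = [freq_b.get(split, 0.0) for split in splits]
    return pearson(xs, ys)
```
]]></preformat>
<p>Taking the union of bipartitions from both replicate sets corresponds to the comparison over all bipartition frequencies described above; restricting the comparison to the bipartitions of a single best-scoring tree would correspond to the second type of correlation.</p>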
</sec>
<sec>
<title>Visualization</title>
<p>We used the following tools to visualize the results of the comprehensive ML and BS analyses, as well as the
<italic>GRTS-BS</italic>
and
<italic>GRTS-ML</italic>
trees: Dendroscope
<xref ref-type="bibr" rid="b51-ebo-2010-073">51</xref>
(version 0.22) was used to draw and color the comprehensive ML trees; modules implemented in SplitsTree
<xref ref-type="bibr" rid="b52-ebo-2010-073">52</xref>
(versions 4.8 and 4.10) were used to analyze the bipartition patterns observed in the
<italic>CA-BS</italic>
and
<italic>GRTS-BS/GRTS-ML</italic>
analyses: The consensus network approach
<xref ref-type="bibr" rid="b53-ebo-2010-073">53</xref>
allowed us to visualize the distinct sets of tree replicates:
<italic>CA-BS</italic>
and
<italic>GRTS-BS/GRTS-ML</italic>
replicates were used to reconstruct splits graphs in which the length of each edge reflected the frequency of the respective bipartition in a sample of input trees (“bipartition network”, edge weight set to count);
<xref ref-type="bibr" rid="b54-ebo-2010-073">54</xref>
or in which the length of each edge corresponded to the mean branch length over all replicates (“confidence network”, edge weight set to mean).
<xref ref-type="bibr" rid="b55-ebo-2010-073">55</xref>
In the latter case, the splits graph represents only those edges that occur in more than a defined percentage of the replicate trees. For example, a 25% confidence network represents all splits that occur in more than 25% of the (BS/GRTS) replicates.</p>
</sec>
</sec>
<sec sec-type="results|discussion">
<title>Results and Discussion</title>
<sec>
<title>Scalability of nested comprehensive ML analyses</title>
<p>
<xref ref-type="fig" rid="f2-ebo-2010-073">Figure 2</xref>
shows a condensed representation (family-level) of the ML tree inferred from the EUDIS matrix; the full tree as well as trees inferred from the other three matrices can be accessed via
<ext-link ext-link-type="uri" xlink:href="http://www.kramer.in.tum.de/rbcl.html">http://www.kramer.in.tum.de/rbcl.html</ext-link>
. (This link also provides the complete results from the
<italic>CA-BS, GRTS-BS</italic>
, and
<italic>GRTS-ML</italic>
analyses.) Most orders and families represented by more than one
<italic>rbcL</italic>
accession (sensu APG II; in total 143 taxa) formed clades in the comprehensive ML trees and received moderate to high BS support by
<italic>CA-BS</italic>
, which highlights the systematic value of
<italic>rbcL</italic>
sequences for angiosperm systematics at the ordinal and subordinal level.
<xref ref-type="bibr" rid="b37-ebo-2010-073">37</xref>
,
<xref ref-type="bibr" rid="b44-ebo-2010-073">44</xref>
,
<xref ref-type="bibr" rid="b56-ebo-2010-073">56</xref>
<xref ref-type="bibr" rid="b59-ebo-2010-073">59</xref>
Exceptions were mainly due to mislabeled sequences or sequences representing organisms of controversial systematic affiliation. (Details are provided in
<xref ref-type="supplementary-material" rid="SD1">Online Supplement [OS] 1</xref>
.) Our
<italic>rbcL</italic>
data does not support the monophyly (
<italic>CA-BS</italic>
≥ 50; misplaced or controversial sequences not considered) of 20 families and one genus (
<italic>Nelumbo</italic>
; monotypic Nelumbaceae) that have been defined as TUs (putatively monophyletic according to APG II and APW; details provided in
<xref ref-type="supplementary-material" rid="SD2">Online Supplement [OS] 2</xref>
). However, members of such TUs occasionally formed clades in the best-known ML trees as well as in the majority of GRTS-based ML trees (see below). Of the 31 currently accepted order-level TUs (orders and unplaced families) covered by our data, 25 received moderate to high
<italic>CA-BS</italic>
support (Table S1 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
).</p>
<p>The Pearson correlation coefficients
<italic>ρ</italic>
between
<italic>CA-BS</italic>
values from the distinct nested datasets, calculated after pruning the respective larger and more comprehensive tree down to the leaf set of the smaller tree via newick.tcl, are shown in
<xref ref-type="table" rid="t1-ebo-2010-073">Table 1</xref>
. For the
<italic>CA-BS</italic>
analyses we do not prune down trees to TUs, but just prune down the respective larger trees to the taxon set of the nested smaller trees. Column
<italic># bipartitions</italic>
in
<xref ref-type="table" rid="t1-ebo-2010-073">Table 1</xref>
provides the number of bipartitions induced by the respective (pruned-down) tree collections. Column
<italic>ρ-best</italic>
provides the correlations of the support values on the respective smaller, best-scoring tree, e.g., the correlation between
<italic>CA-BS</italic>
values of the pruned-down eudicot replicates and the rosid replicates, drawn on the best-scoring rosid ML tree. The computed slope and the offset for the comparisons of support values on the best-scoring trees showed only insignificant variations. The slope lies between 0.9861 and 1.0019, while the offset varies between −0.4087 and 1.8442. The average support on the respective best-scoring ML trees for the four datasets (original support values
<italic>and</italic>
pruned-down support values) is highly stable regardless of the number of taxa, and varies between 59.90 and 61.29. This also holds for the average support (between 80.98 and 81.66) on the respective extended majority rule (binary) consensus trees extracted from the replicates. The higher average support on
<italic>CA-BS</italic>
consensus trees compared to the best-scoring ML trees is not surprising, since the extended majority rule consensus algorithm in essence just maximizes the average support for a given set of bipartitions. In general, correlations appear to slightly decrease with increasing differences in the number of leaves, but are nonetheless very high (minimum: 0.987 for all bipartitions; 0.982 for bipartitions on best-scoring tree). In addition, the number of bipartitions induced by the pruned-down trees from the respective large datasets is slightly higher (2.5%) than for the nonpruned datasets.</p>
<p>The overall high correlation coefficients show that the number of taxa in the alignments, which reflects the taxonomic breadth of the data set, has little effect on the bipartitions induced by the BS analyses. In other words, the support of a node defining any eurosid I clade X is barely influenced by the inclusion of eurosids II, other rosids, or other eudicots in the alignment. This is in agreement with the observation that, with respect to branch support, a large and dense outgroup provides a better subtree rooting than a sparse outgroup.
<xref ref-type="bibr" rid="b36-ebo-2010-073">36</xref>
However, there is a slightly higher correlation among the rosids, eurosids I, and eurosids II data sets than within the eudicots data set. The inclusion of most eudicots (outgroups from the perspective of eurosids I, eurosids II, and rosids) has some, although small, effect on the BS support of nodes within the rosids, eurosids I, and eurosids II subclades (Tables S1–S3 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). There are three possible explanations: First, the effect of the less favorable ratio of number of taxa to number of base pairs might become more pronounced.
<xref ref-type="bibr" rid="b30-ebo-2010-073">30</xref>
,
<xref ref-type="bibr" rid="b31-ebo-2010-073">31</xref>
As a consequence, ‘correct’ relationships, according to the
<italic>rbcL</italic>
genealogy, are less resolved. Second, exactly the reverse phenomenon might occur: the eurosids I, eurosids II, and rosids analyses might have yielded support for ‘incorrect’ bipartitions, which were correctly resolved in the eudicots analyses due to the more comprehensive taxon sampling. Third, the support of a few isolated nodes may vary significantly depending on the underlying data. A case-by-case investigation of changing support with reference to the well established phylogenetic framework for Angiosperms shows that all potential explanations apply (
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
).</p>
<p>The general agreement between the inferred topologies and current knowledge (outlined in
<xref ref-type="fig" rid="f2-ebo-2010-073">Figs. 2</xref>
,
<xref ref-type="fig" rid="f3-ebo-2010-073">3</xref>
), the comparably high
<italic>CA-BS</italic>
support of commonly accepted nodes (
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
), and the generally high correlation (
<xref ref-type="table" rid="t1-ebo-2010-073">Table 1</xref>
) indicate that ML scales well in the studied case of large
<italic>rbcL</italic>
data sets. However, deeper (interordinal or backbone) and well-supported relationships based on several to many genes, e.g.,
<xref ref-type="bibr" rid="b14-ebo-2010-073">14</xref>
,
<xref ref-type="bibr" rid="b43-ebo-2010-073">43</xref>
,
<xref ref-type="bibr" rid="b44-ebo-2010-073">44</xref>
generally lack
<italic>CA-BS</italic>
support from
<italic>rbcL</italic>
alone (Table S3 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). Only the nitrogen-fixing clade received moderate BS support (
<xref ref-type="fig" rid="f3-ebo-2010-073">Fig. 3</xref>
;
<italic>CA-BS</italic>
<sub>All matrices</sub>
= 54).</p>
</sec>
<sec>
<title>The bootstrap of the bootstrap on 3,491 eudicots rbcL sequences</title>
<p>We mainly assessed the effects of applying the double BS procedure to large single-gene datasets since it had become computationally feasible due to the recent implementation of a rapid bootstrapping algorithm in RAxML
<xref ref-type="bibr" rid="b29-ebo-2010-073">29</xref>
and had never been tested in practice on large datasets before. We therefore conducted an empirical assessment of this approach to determine if it can be deployed as an alternative to improve the quality of support values on large trees. Despite the theoretically favorable statistical properties of second-level bootstrap procedures, in practice, and in particular on large single-gene alignments, a double bootstrap procedure does not appear to be applicable. The main reason is the significant reduction in the number of distinct (unique) alignment column patterns, and hence in phylogenetic signal, relative to the original alignment; this reduction occurs in the first-level BS replicates and to an even larger extent in the second-level BS replicates. In our experiments with the large eudicot dataset, the original alignment has 1,370 distinct column patterns, while the 100 first-level bootstrap alignments have an average of 868 patterns per replicate and the 10,000 second-level replicates only contain 641 patterns per replicate, less than half the number of distinct patterns in the original alignment. In addition, first-level as well as second-level BS replicates contain a relatively high number of sequences that are exactly identical under the ML model, while the original alignment does not contain duplicate sequences. For first-level replicates there are on average 67 identical, thus essentially indistinguishable, sequences per replicate, while for second-level BS replicates this number increases to 140 on average. Therefore, the phylogenetic signal contained in second-level BS replicates is reduced significantly, and the double bootstrap thus does not represent a good solution to infer support values for large single-gene analyses. This is depicted in
<xref ref-type="fig" rid="f4-ebo-2010-073">Figure 4</xref>
, where we plot the support values on the best-scoring ML tree of first-level BS replicates against support values from second-level BS analyses. The second-level BS values are significantly lower than first-level BS above a threshold of 75%. Thus, second-level BS does not scale as well as first-level BS (see results of
<italic>CA-BS</italic>
above) for the data sets analyzed here.</p>
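<p>The loss of distinct column patterns described above can be illustrated with a minimal Python sketch. It assumes a hypothetical toy alignment in which every column is distinct (the function names, sizes, and random seed are illustrative and are not taken from the analyses reported here) and resamples columns with replacement twice. Under that assumption, the expected fraction of distinct patterns retained is roughly 1 - 1/e (about 0.63) after one round of resampling and roughly 1 - exp(-(1 - 1/e)) (about 0.47) after two rounds, which is of the same order as the 868 and 641 patterns reported above for an alignment with 1,370 patterns.</p>
<preformat>
```python
import random


def distinct_patterns(columns):
    """Return the number of distinct alignment column patterns."""
    return len(set(columns))


def bootstrap_columns(columns, rng):
    """Draw len(columns) columns with replacement (one BS replicate)."""
    return rng.choices(columns, k=len(columns))


def toy_alignment(n_taxa, n_patterns, rng):
    """Build a hypothetical alignment of n_patterns distinct random columns.

    Each column is a string with one nucleotide per taxon.
    """
    patterns = set()
    while len(patterns) != n_patterns:
        patterns.add("".join(rng.choice("ACGT") for _ in range(n_taxa)))
    return sorted(patterns)


rng = random.Random(42)
original = toy_alignment(n_taxa=20, n_patterns=1000, rng=rng)
first = bootstrap_columns(original, rng)   # first-level replicate
second = bootstrap_columns(first, rng)     # second-level replicate

print(distinct_patterns(original),
      distinct_patterns(first),
      distinct_patterns(second))
```
</preformat>
<p>Because a second-level replicate can only contain patterns that survived the first round, its pattern count can never exceed that of the first-level replicate it was drawn from.</p>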
</sec>
<sec>
<title>CA-BS versus GRTS-BS/GRTS-ML</title>
<p>With a reduction factor of 1/4, the EUDIS matrix was reduced to 440 terminal taxa by GRTS. In the case of our focus groups rosids (matrix ROSID) and eurosids I (matrix EUROS 1;
<xref ref-type="fig" rid="f5-ebo-2010-073">Fig. 5</xref>
), datasets were reduced to 282 and 199 taxa, respectively. Nodes that received high BS support (
<italic>CA-BS</italic>
≥ 70) based on the comprehensive data matrices were recovered in most (>50%) to all best-scoring
<italic>GRTS-ML</italic>
trees (Tables S1, S2 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). Representatives (subsamples) of the same TU clustered together when the same group was supported by moderate (>50) to high
<italic>CA-BS</italic>
values (
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). The consensus networks of the
<italic>GRTS-ML</italic>
trees indicated further relationships between predefined TUs that received low
<italic>CA-BS</italic>
(<50; example given in
<xref ref-type="fig" rid="f6-ebo-2010-073">Fig. 6</xref>
). For instance, large-scale multigene analyses
<xref ref-type="bibr" rid="b43-ebo-2010-073">43</xref>
,
<xref ref-type="bibr" rid="b44-ebo-2010-073">44</xref>
supported a eurosid I subclade including the Celastrales, Malpighiales, Huaceae, and Oxalidales.
<italic>CA-BS</italic>
support for this subclade is low, even when mislabeled sequences are not considered, but the representatives of these TUs consistently group using
<italic>GRTS-ML</italic>
(Table S3 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). The high overall similarity between
<italic>CA-BS</italic>
and
<italic>GRTS-ML</italic>
based topologies, irrespective of the actual support values, can be visualized via consensus networks (bipartition networks) of the 100
<italic>GRTS-ML</italic>
trees, which depict the same relationships (
<xref ref-type="fig" rid="f5-ebo-2010-073">Fig. 5</xref>
), or alternative relationships, as indicated by
<italic>CA-BS</italic>
analyses of the comprehensive matrices (example shown in Fig. 6; see also
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
): topological alternatives that received equal
<italic>CA-BS</italic>
support were also found, at a certain frequency, in the
<italic>GRTS-ML</italic>
replicates.</p>
<p>Pearson correlation coefficients between all bipartition support values obtained via
<italic>CA-BS</italic>
and
<italic>GRTS-ML</italic>
(reduction factor 1/4), all pruned down to family-level TU trees, are shown in
<xref ref-type="table" rid="t2-ebo-2010-073">Table 2</xref>
. Column
<italic># bipartitions</italic>
indicates the total number of bipartitions induced by the pruned-down
<italic>CA-BS</italic>
trees and the GRTS trees. Correlations vary between 0.90 and 0.93 and are relatively high, but lower than for the
<italic>CA-BS</italic>
support values among the nested comprehensive datasets shown above. In
<xref ref-type="table" rid="t2-ebo-2010-073">Table 2</xref>
there is no prevalent tendency for the correlation to increase or decrease with increasing number of leaves in the
<italic>GRTS-ML</italic>
replicates. The number of bipartitions induced by the
<italic>GRTS-ML</italic>
replicates is significantly smaller than the respective number of bipartitions induced by the comprehensive trees, which has a direct effect on the average support on the pruned-down best-scoring ML trees from the comprehensive analyses:
<italic>GRTS-ML</italic>
support values (average: 61.16) are significantly higher than support values obtained via
<italic>CA-BS</italic>
(average: 53.78). This is also reflected by larger offsets between 9.48 and 11.05 compared to the offsets among support values induced by comparison of
<italic>CA-BS</italic>
between each other. Likewise, average support values as calculated from the majority rule consensus trees range between 82.15 and 87.69 for
<italic>GRTS-ML</italic>
, but between 77.65 and 83.10 for
<italic>CA-BS</italic>
. The reason for this is that
<italic>GRTS-ML</italic>
tends to be more decisive than
<italic>CA-BS</italic>
and to favor topological alternatives to different degrees (an example is given in
<xref ref-type="fig" rid="f6-ebo-2010-073">Fig. 6</xref>
). Such decisiveness could be positive or negative: Positive would mean that
<italic>GRTS-ML</italic>
has a higher chance than
<italic>CA-BS</italic>
to find support for ‘correct’ nodes; put in a negative context, the higher decisiveness could indicate that
<italic>GRTS-ML</italic>
exhibits a higher risk of yielding too high support for ‘wrong’ nodes. In our case, the effect appears to be positive rather than negative: The relationships indicated by
<italic>GRTS-ML</italic>
are in good agreement with the inferred comprehensive ML trees (example shown in
<xref ref-type="fig" rid="f5-ebo-2010-073">Fig. 5</xref>
) and earlier multigene analyses (Tables S1, S2 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). Using higher reduction factors (only feasible if order-level TUs are used; see below) seems to increase the positive effect on the recovered relationships even further.</p>
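<p>How two sets of bipartition support values might be compared can be sketched in Python as follows. The sketch assumes that each bipartition is represented as a frozenset of TU names, that a bipartition missing from one tree set receives a support of 0, and that the offset is the mean absolute difference. These are assumptions made for illustration, not a description of the procedure used to produce Table 2, and the TU groupings and support values are hypothetical.</p>
<preformat>
```python
import math


def align_supports(support_a, support_b):
    """Return two equally ordered lists of support values.

    Bipartitions missing from one dictionary get support 0 (assumption).
    """
    keys = sorted(set(support_a) | set(support_b), key=sorted)
    a = [support_a.get(k, 0) for k in keys]
    b = [support_b.get(k, 0) for k in keys]
    return a, b


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_offset(x, y):
    """Mean absolute difference between paired support values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical support values (percent) on TU-level bipartitions.
ca_bs = {
    frozenset({"Fagales", "Cucurbitales"}): 48,
    frozenset({"Fagales", "Cucurbitales", "Rosales"}): 72,
    frozenset({"Malpighiales", "Oxalidales"}): 30,
}
grts_ml = {
    frozenset({"Fagales", "Cucurbitales"}): 60,
    frozenset({"Fagales", "Cucurbitales", "Rosales"}): 85,
    frozenset({"Malpighiales", "Celastrales"}): 41,
}

x, y = align_supports(ca_bs, grts_ml)
print(pearson(x, y), mean_abs_offset(x, y))
```
</preformat>
<p>Treating missing bipartitions as zero support is only one possible convention. Restricting the comparison to shared bipartitions would generally give different correlations and offsets.</p>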
</sec>
<sec>
<title>Effects of different reduction factors</title>
<p>As mentioned in Material and Methods, numerous family-level TUs are represented by only a few accessions. Therefore, we used the order-level TUs to be able to apply larger reduction factors. For the ROSID matrix, order-level GRTS was based on 18 TUs (15 rosid orders, 2 families, one outgroup order/family); the respective subsets (matrices EURO1 and EURO2) contained nine and five TUs. The use of these more coarse-grained TUs is feasible because, according to APG II as well as APW, the corresponding clades are generally well supported (see also Table S1 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
and cited literature; for a review see Soltis and Soltis).
<xref ref-type="bibr" rid="b42-ebo-2010-073">42</xref>
</p>
<p>The effect of varying reduction factors on the correlation between
<italic>CA-BS</italic>
and
<italic>GRTS-ML</italic>
as well as
<italic>GRTS-BS</italic>
support values is shown in
<xref ref-type="table" rid="t3-ebo-2010-073">Tables 3</xref>
and
<xref ref-type="table" rid="t4-ebo-2010-073">4</xref>
, respectively, which also allows for a comparison between the
<italic>GRTS-BS</italic>
and
<italic>GRTS-ML</italic>
approach. Here the trees were pruned to order-level TU topologies instead of family-level TUs. This more coarse-grained reduction allows for assessing the effect of high reduction factors (see Material and Methods). Except for
<italic>GRTS-ML</italic>
values for rosids (
<xref ref-type="table" rid="t4-ebo-2010-073">Table 4</xref>
), the correlation with
<italic>CA-BS</italic>
values generally decreases with increasing reduction factors. Overall,
<italic>GRTS-BS</italic>
values are more in agreement with the respective
<italic>CA-BS</italic>
values than is the case for
<italic>GRTS-ML</italic>
(
<xref ref-type="table" rid="t3-ebo-2010-073">Tables 3</xref>
vs. 4). In combination with a moderate reduction factor of 1/4 or 1/8, the correlation between
<italic>GRTS-BS</italic>
and
<italic>CA-BS</italic>
can be as high as 0.999. These highest correlations are obtained for a data set (eurosid II) with well-sampled and well-supported TUs. The data set includes only five TUs that are mutually monophyletic. Four of these groups are extremely well represented by numerous rbcL sequences. Based on the overall good correlation,
<italic>GRTS-BS</italic>
may be a valid alternative to
<italic>CA-BS</italic>
to investigate support of backbone relationships of large datasets, in particular in the light of misplaced and mislabeled sequences: the subsampling of GRTS can tolerate a certain number of misplaced and mislabeled representatives per TU if they are compensated for by correctly placed and labeled representatives. For example, if the data include 99 sequences of a TU that are correctly placed in the phylogenetic inference and a single misplaced (erroneous) sequence, the latter has only a probability of
<italic>0.01* samplingFactor</italic>
to be sampled per replicate.</p>
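<p>The group-based subsampling idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analyses: the function name grts_sample, the rounding rule (ceiling of reduction factor times TU size), and the min_per_tu parameter are hypothetical. Under uniform sampling without replacement, an accession in a TU of n members is included with probability k/n, where k is the number of accessions drawn from that TU. For a TU of 100 sequences this equals 0.01 times k, which is how the expression in the text is read here.</p>
<preformat>
```python
import math
import random


def grts_sample(tu_members, reduction_factor, rng, min_per_tu=1):
    """Draw one group-based taxon subsample.

    tu_members maps a TU name to its list of accession labels. From each
    TU, max(min_per_tu, ceil(reduction_factor * size)) accessions are
    drawn uniformly without replacement, capped at the TU size.
    """
    sample = {}
    for tu, members in tu_members.items():
        k = max(min_per_tu, math.ceil(reduction_factor * len(members)))
        k = min(k, len(members))
        sample[tu] = rng.sample(members, k)
    return sample


rng = random.Random(1)
# Hypothetical TUs: 99 correctly placed accessions plus one misplaced one.
tus = {
    "TU_A": ["correct_%d" % i for i in range(99)] + ["misplaced"],
    "TU_B": ["b_%d" % i for i in range(8)],
    "TU_C": ["c_0"],
}

replicates = [grts_sample(tus, 1 / 4, rng) for _ in range(2000)]
hits = sum("misplaced" in rep["TU_A"] for rep in replicates)

# Number drawn from TU_A and the observed inclusion frequency.
print(len(replicates[0]["TU_A"]), hits / len(replicates))
```
</preformat>
<p>With a reduction factor of 1/4, 25 of the 100 accessions in the hypothetical TU_A are drawn per replicate, so the misplaced accession is absent from most replicates, whereas every replicate still contains 24 or 25 correctly placed representatives.</p>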
<p>As mentioned before, the decreased correlation between
<italic>GRTS-ML, GRTS-BS</italic>
(to a lesser degree) and
<italic>CA-BS</italic>
(
<xref ref-type="table" rid="t3-ebo-2010-073">Tables 3</xref>
,
<xref ref-type="table" rid="t4-ebo-2010-073">4</xref>
) is coupled with the observation that
<italic>GRTS-ML</italic>
(and
<italic>GRTS-BS</italic>
) increasingly recover and support commonly accepted (inter-)ordinal relationships that received only low
<italic>CA-BS</italic>
support (
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). This is illustrated by way of example in
<xref ref-type="fig" rid="f7-ebo-2010-073">Figure 7</xref>
using the results of the
<italic>GRTS-ML</italic>
analyses based on the ROSID matrix. A reduction factor of 1/32 implied that from the 2,445 original accessions four are sampled per TU, plus the two rbcL accessions representing Picramniaceae (in total 70 terminal taxa per GRTS matrix). It has been demonstrated, especially for angiosperms, that increased taxon sampling is favorable.
<xref ref-type="bibr" rid="b34-ebo-2010-073">34</xref>
,
<xref ref-type="bibr" rid="b46-ebo-2010-073">46</xref>
However, by reducing the number of leaves, we obtain increased support for relationships that were originally poorly supported. Does this mean that fewer leaves are more promising than many? Notably, only ‘correct’ (or better, unchallenged) relationships gained support. Moreover, we have to keep in mind that the predefined TUs represented ‘good’ taxonomic units (well-supported clades based on multigene data). This is a major difference to ‘blind’ (unguided) random taxon jackknifing and/or using arbitrarily selected placeholders. An apparent effect of GRTS seems to be that a stable, ‘correct’ signal is maintained over the replicates while inconsistent, ‘wrong’ signal is filtered out.</p>
</sec>
</sec>
<sec>
<title>Conclusion and Outlook</title>
<p>Comprehensive and GRTS-based analyses were conducted on large
<italic>rbcL</italic>
datasets to investigate whether the broad sampling of a single genetic marker is useful for large-scale, in terms of number of taxa, phylogeny reconstruction. The good correlations among the
<italic>CA-BS</italic>
results (
<xref ref-type="table" rid="t1-ebo-2010-073">Table 1</xref>
) of the nested large matrices show that comprehensive ML analyses with RAxML scale well in terms of accuracy and support values, despite the fact that the amount of signal available in the data is, in principle, less favorable. Furthermore, ML-based
<italic>CA-BS</italic>
and to a greater extent, GRTS are able to recover higher-level relationships obtained via multigene studies; relationships that received poor support based on single-gene
<italic>rbcL</italic>
analyses in previous studies (
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
).</p>
<p>The overall high correlation between
<italic>CA-BS</italic>
and
<italic>GRTS-ML</italic>
(
<xref ref-type="table" rid="t2-ebo-2010-073">Tables 2</xref>
,
<xref ref-type="table" rid="t3-ebo-2010-073">3</xref>
) and in particular, between
<italic>CA-BS</italic>
and
<italic>GRTS-BS</italic>
(
<xref ref-type="table" rid="t4-ebo-2010-073">Table 4</xref>
) demonstrates that the predefined family-level TUs and order-level TUs were well chosen. One may expect significantly worse correlations between
<italic>CA-BS</italic>
and
<italic>GRTS-ML/GRTS-BS</italic>
if the selected TUs do not represent well-supported clades. GRTS-based analyses yield largely the same phylogenetic relationships as inferred via comprehensive large-scale analyses (
<xref ref-type="fig" rid="f5-ebo-2010-073">Figs. 5</xref>
–
<xref ref-type="fig" rid="f7-ebo-2010-073">7</xref>
; Tables S1–S3 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
). However, despite these promising results (e.g.,
<xref ref-type="fig" rid="f7-ebo-2010-073">Fig. 7</xref>
), the statistical properties of
<italic>GRTS-ML</italic>
in comparison to
<italic>CA-BS</italic>
need to be investigated in more detail via computational experiments with simulated and additional real-world data sets prior to using
<italic>GRTS-ML</italic>
as additional means in phylogenetic inference based on large datasets. On the other hand, the high correlation between
<italic>GRTS-BS</italic>
and
<italic>CA-BS</italic>
values provides a useful tool for the interpretation of the
<italic>CA-BS</italic>
values. The execution times for
<italic>CA-BS</italic>
and
<italic>GRTS-BS</italic>
are roughly similar because the taxon subsampling approaches require, in total, a larger number of ML searches to be conducted. In addition, even many-taxon single-gene alignments rarely require more than 500 bootstrap replicates to generate stable support values.
<xref ref-type="bibr" rid="b60-ebo-2010-073">60</xref>
The group-based taxon jackknifing in combination with bootstrapping (
<italic>GRTS-BS</italic>
) allows for comparing support for the relationships between the main clades, in terms of family- or order-level TUs, without investing too much effort (human and computational resources) in the optimization of intra-clade relationships. In addition, it can be assessed how, and which, BS values decrease with an increasing number of taxa, and thus, generally lower average support values on larger single-gene trees can be more easily interpreted. It may be of interest to apply the methods introduced here to multigene data: the basic concept of GRTS would allow combining gene data with only partly overlapping sets of species per predefined TU (e.g., well-supported families or orders) without the need to decide on placeholder taxa and/or to filter occasionally misplaced or misnamed accessions. Conversely, GRTS could be used to select placeholder sequences in conjunction with the new placement algorithm
<xref ref-type="bibr" rid="b61-ebo-2010-073">61</xref>
,
<xref ref-type="bibr" rid="b62-ebo-2010-073">62</xref>
that has recently been implemented in RAxML and that is particularly suited for the identification of short sequence reads that range between 100 and 400 bp in length. It appears reasonable to choose as placeholders those sequences that resulted in the trees closest to the GRTS majority-rule tree; such a preselection would further accelerate sequence identification and could be applied as long as within-group placement of query sequences is not of interest.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Data</title>
<supplementary-material content-type="local-data" id="SD1">
<label>Online Supplement 1:</label>
<caption>
<p>Misplaced
<italic>rbcL</italic>
accessions and upgrade to NCBI taxonomy tree.
<ext-link ext-link-type="uri" xlink:href="http://wwwkramer.in.tum.de/exelixis/Stam_et_al_OS1.pdf">http://wwwkramer.in.tum.de/exelixis/Stam_et_al_OS1.pdf</ext-link>
</p>
</caption>
<media xlink:href="Stam_et_al_OS1.pdf" xlink:type="simple" id="d32e1134" position="anchor" mimetype="application" mime-subtype="pdf"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="SD2">
<label>Online Supplement 2:</label>
<caption>
<p>Details of the results of comprehensive ML analyses.
<ext-link ext-link-type="uri" xlink:href="http://wwwkramer.in.tum.de/exelixis/Stam_et_al_OS2.pdf">http://wwwkramer.in.tum.de/exelixis/Stam_et_al_OS2.pdf</ext-link>
</p>
</caption>
<media xlink:href="Stam_et_al_OS2.pdf" xlink:type="simple" id="d32e1143" position="anchor" mimetype="application" mime-subtype="pdf"></media>
</supplementary-material>
</sec>
</body>
<back>
<fn-group>
<fn>
<p>
<bold>Disclosures</bold>
</p>
<p>This manuscript has been read and approved by all authors. This paper is unique and not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="b1-ebo-2010-073">
<label>1.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Zwickl</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<source>Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets Under the Maximum Likelihood Criterion</source>
<publisher-loc>Austin</publisher-loc>
<publisher-name>University of Texas at Austin</publisher-name>
<year>2006</year>
</mixed-citation>
</ref>
<ref id="b2-ebo-2010-073">
<label>2.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Meier</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>RAxML-III: A fast program for Maximum Likelihood-based inference of large phylogenetic trees</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>456</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="pmid">15608047</pub-id>
</mixed-citation>
</ref>
<ref id="b3-ebo-2010-073">
<label>3.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>RAxML-VI-HPC: Maximum-Likelihood-based phylogenetic analyses with thousands of taxa and mixed models</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>2688</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="pmid">16928733</pub-id>
</mixed-citation>
</ref>
<ref id="b4-ebo-2010-073">
<label>4.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Blagojevic</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Antonopoulos</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Nikolopoulos</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>Exploring new search algorithms and hardware for phylogenetics: RAxML meets the IBM Cell</article-title>
<source>Journal of VLSI Signal Processing Systems</source>
<year>2007</year>
<volume>48</volume>
<fpage>271</fpage>
<lpage>86</lpage>
</mixed-citation>
</ref>
<ref id="b5-ebo-2010-073">
<label>5.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Minh</surname>
<given-names>BQ</given-names>
</name>
<name>
<surname>Vinh</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>von Haeseler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>HA</given-names>
</name>
</person-group>
<article-title>pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3794</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="pmid">16046495</pub-id>
</mixed-citation>
</ref>
<ref id="b6-ebo-2010-073">
<label>6.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guindon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gascuel</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood</article-title>
<source>Syst Biol</source>
<year>2003</year>
<volume>52</volume>
<fpage>696</fpage>
<lpage>704</lpage>
<pub-id pub-id-type="pmid">14530136</pub-id>
</mixed-citation>
</ref>
<ref id="b7-ebo-2010-073">
<label>7.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hordijk</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Gascuel</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>4338</fpage>
<lpage>47</lpage>
<pub-id pub-id-type="pmid">16234323</pub-id>
</mixed-citation>
</ref>
<ref id="b8-ebo-2010-073">
<label>8.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jobb</surname>
<given-names>G</given-names>
</name>
<name>
<surname>von Haeseler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Strimmer</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Treefinder: a powerful graphical analysis environment for molecular phylogenetics</article-title>
<source>BMC Evol Biol</source>
<year>2004</year>
<volume>4</volume>
<fpage>18</fpage>
<pub-id pub-id-type="pmid">15222900</pub-id>
</mixed-citation>
</ref>
<ref id="b9-ebo-2010-073">
<label>9.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huelsenbeck</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Ronquist</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>MrBayes: Bayesian inference of phylogeny</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>754</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="pmid">11524383</pub-id>
</mixed-citation>
</ref>
<ref id="b10-ebo-2010-073">
<label>10.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ronquist</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Huelsenbeck</surname>
<given-names>JP</given-names>
</name>
</person-group>
<article-title>MrBayes 3: Bayesian phylogenetic inference under mixed models</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>1572</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="pmid">12912839</pub-id>
</mixed-citation>
</ref>
<ref id="b11-ebo-2010-073">
<label>11.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dunn</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Hejnol</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Matus</surname>
<given-names>DQ</given-names>
</name>
<etal></etal>
</person-group>
(18 co-authors).
<article-title>Broad phylogenomic sampling improves resolution of the animal tree of life</article-title>
<source>Nature</source>
<year>2008</year>
<volume>452</volume>
<fpage>745</fpage>
<lpage>49</lpage>
<pub-id pub-id-type="pmid">18322464</pub-id>
</mixed-citation>
</ref>
<ref id="b12-ebo-2010-073">
<label>12.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McMahon</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes</article-title>
<source>Syst Biol</source>
<year>2006</year>
<volume>55</volume>
<fpage>818</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="pmid">17060202</pub-id>
</mixed-citation>
</ref>
<ref id="b13-ebo-2010-073">
<label>13.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gueidan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Roux</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lutzoni</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Using a multigene phylogenetic analysis to assess generic delineation and character evolution in Verrucariaceae (Verrucariales, Ascomycota)</article-title>
<source>Mycol Res</source>
<year>2007</year>
<volume>111</volume>
<fpage>1145</fpage>
<lpage>68</lpage>
<pub-id pub-id-type="pmid">17981450</pub-id>
</mixed-citation>
</ref>
<ref id="b14-ebo-2010-073">
<label>14.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jansen</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Raubeson</surname>
<given-names>LA</given-names>
</name>
<etal></etal>
</person-group>
(16 co-authors).
<article-title>Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>2007</year>
<volume>104</volume>
<fpage>19369</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="pmid">18048330</pub-id>
</mixed-citation>
</ref>
<ref id="b15-ebo-2010-073">
<label>15.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hackett</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Kimball</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Reddy</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
(18 co-authors).
<article-title>A phylogenomic study of birds reveals their evolutionary history</article-title>
<source>Science</source>
<year>2008</year>
<volume>320</volume>
<fpage>1763</fpage>
<lpage>68</lpage>
<pub-id pub-id-type="pmid">18583609</pub-id>
</mixed-citation>
</ref>
<ref id="b16-ebo-2010-073">
<label>16.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoon</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tekle</surname>
<given-names>YI</given-names>
</name>
<etal></etal>
</person-group>
(10 co-authors).
<article-title>Broadly sampled multigene trees of eukaryotes</article-title>
<source>BMC Evol Biol</source>
<year>2008</year>
<volume>8</volume>
<fpage>14</fpage>
<pub-id pub-id-type="pmid">18205932</pub-id>
</mixed-citation>
</ref>
<ref id="b17-ebo-2010-073">
<label>17.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gee</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Ending incongruence</article-title>
<source>Nature</source>
<year>2003</year>
<volume>425</volume>
<fpage>782</fpage>
<pub-id pub-id-type="pmid">14574398</pub-id>
</mixed-citation>
</ref>
<ref id="b18-ebo-2010-073">
<label>18.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Snell</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Bapteste</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>PWH</given-names>
</name>
<name>
<surname>Casane</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Phylogenomics of Eukaryotes: Impact of missing data on large alignments</article-title>
<source>Mol Biol Evol</source>
<year>2004</year>
<volume>21</volume>
<fpage>1740</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="pmid">15175415</pub-id>
</mixed-citation>
</ref>
<ref id="b19-ebo-2010-073">
<label>19.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jeffroy</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Brinkmann</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Delsuc</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Phylogenomics: the beginning of incongruence?</article-title>
<source>Trends Genet</source>
<year>2006</year>
<volume>22</volume>
<fpage>225</fpage>
<lpage>31</lpage>
<pub-id pub-id-type="pmid">16490279</pub-id>
</mixed-citation>
</ref>
<ref id="b20-ebo-2010-073">
<label>20.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McGuire</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Witt</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Remsen</surname>
<given-names>JV</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
<article-title>Phylogenetic systematics and biogeography of hummingbirds: Bayesian and maximum likelihood analyses of partitioned data and selection of an appropriate partitioning strategy</article-title>
<source>Syst Biol</source>
<year>2007</year>
<volume>56</volume>
<fpage>837</fpage>
<lpage>56</lpage>
<pub-id pub-id-type="pmid">17934998</pub-id>
</mixed-citation>
</ref>
<ref id="b21-ebo-2010-073">
<label>21.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Kolokotronis</surname>
<given-names>SO</given-names>
</name>
</person-group>
<source>Molecular evolution of Elephantidae and their tuberculosis pathogens</source>
<publisher-loc>New York</publisher-loc>
<publisher-name>Columbia University</publisher-name>
<year>2008</year>
</mixed-citation>
</ref>
<ref id="b22-ebo-2010-073">
<label>22.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Beaulieu</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches</article-title>
<source>BMC Evol Biol</source>
<year>2009</year>
<volume>9</volume>
<fpage>37</fpage>
<pub-id pub-id-type="pmid">19210768</pub-id>
</mixed-citation>
</ref>
<ref id="b23-ebo-2010-073">
<label>23.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ripplinger</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Does choice in model selection affect maximum likelihood analysis?</article-title>
<source>Syst Biol</source>
<year>2008</year>
<volume>57</volume>
<fpage>76</fpage>
<lpage>85</lpage>
<pub-id pub-id-type="pmid">18275003</pub-id>
</mixed-citation>
</ref>
<ref id="b24-ebo-2010-073">
<label>24.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tavaré</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Some probabilistic and statistical problems on the analysis of DNA sequences</article-title>
<source>Lect Math Life Sci</source>
<year>1986</year>
<volume>17</volume>
<fpage>57</fpage>
<lpage>86</lpage>
</mixed-citation>
</ref>
<ref id="b25-ebo-2010-073">
<label>25.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites</article-title>
<source>J Mol Evol</source>
<year>1994</year>
<volume>39</volume>
<fpage>306</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="pmid">7932792</pub-id>
</mixed-citation>
</ref>
<ref id="b26-ebo-2010-073">
<label>26.</label>
<mixed-citation publication-type="webpage">
<person-group person-group-type="author">
<name>
<surname>Ott</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zola</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Aluru</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Large-scale Maximum Likelihood-based phylogenetic analysis on the IBM BlueGene/L</article-title>
<source>On-Line Proceedings of IEEE/ACM Supercomputing Conference</source>
<year>2007</year>
Available at:
<ext-link ext-link-type="uri" xlink:href="http://www.sc07.supercomputing.org/schedule/pdf/pap271.pdf">http://www.sc07.supercomputing.org/schedule/pdf/pap271.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="b27-ebo-2010-073">
<label>27.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ott</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Exploiting fine-grained parallelism in the phylogenetic likelihood function with MPI, Pthreads, and OpenMP: A performance study</article-title>
<source>Proceedings of third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2008)</source>
<year>2008</year>
<fpage>424</fpage>
<lpage>35</lpage>
</mixed-citation>
</ref>
<ref id="b28-ebo-2010-073">
<label>28.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Confidence limits on phylogenies: an approach using the bootstrap</article-title>
<source>Evolution</source>
<year>1985</year>
<volume>39</volume>
<fpage>783</fpage>
<lpage>91</lpage>
</mixed-citation>
</ref>
<ref id="b29-ebo-2010-073">
<label>29.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hoover</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rougemont</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>A rapid bootstrap algorithm for the RAxML web servers</article-title>
<source>Syst Biol</source>
<year>2008</year>
<volume>57</volume>
<fpage>758</fpage>
<lpage>71</lpage>
<pub-id pub-id-type="pmid">18853362</pub-id>
</mixed-citation>
</ref>
<ref id="b30-ebo-2010-073">
<label>30.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bininda-Emonds</surname>
<given-names>ORP</given-names>
</name>
<name>
<surname>Brady</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>King</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Scaling of accuracy in extremely large phylogenetic trees</article-title>
<source>Proceedings of Pacific Symposium on Biocomputing</source>
<year>2001</year>
<volume>6</volume>
<fpage>547</fpage>
<lpage>58</lpage>
</mixed-citation>
</ref>
<ref id="b31-ebo-2010-073">
<label>31.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Moret</surname>
<given-names>BME</given-names>
</name>
<name>
<surname>Roshan</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Warnow</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Sequence-length requirements for phylogenetic methods</article-title>
<person-group person-group-type="editor">
<name>
<surname>Goos</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hartmanis</surname>
<given-names>J</given-names>
</name>
<name>
<surname>van Leeuwen</surname>
<given-names>J</given-names>
</name>
</person-group>
<source>Proceedings of Algorithms in Bioinformatics: Second International Workshop, WABI 2002</source>
<publisher-loc>Heidelberg, New York</publisher-loc>
<publisher-name>Springer</publisher-name>
<year>2002</year>
<fpage>343</fpage>
<lpage>56</lpage>
Lecture Notes in Computer Science;
<volume>2452</volume>
</mixed-citation>
</ref>
<ref id="b32-ebo-2010-073">
<label>32.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Statistical properties of the Maximum Likelihood method of phylogenetic estimation and comparison with distance matrix methods</article-title>
<source>Syst Biol</source>
<year>1994</year>
<volume>43</volume>
<fpage>329</fpage>
<lpage>42</lpage>
</mixed-citation>
</ref>
<ref id="b33-ebo-2010-073">
<label>33.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goloboff</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Catalano</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Marcos Mirande</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
(7 co-authors).
<article-title>Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups</article-title>
<source>Cladistics</source>
<year>2009</year>
<volume>25</volume>
<fpage>211</fpage>
<lpage>30</lpage>
</mixed-citation>
</ref>
<ref id="b34-ebo-2010-073">
<label>34.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zwickl</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Hillis</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Increased taxon sampling greatly reduces phylogenetic error</article-title>
<source>Syst Biol</source>
<year>2002</year>
<volume>51</volume>
<fpage>588</fpage>
<lpage>98</lpage>
<pub-id pub-id-type="pmid">12228001</pub-id>
</mixed-citation>
</ref>
<ref id="b35-ebo-2010-073">
<label>35.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Delsuc</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Brinkmann</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Philippe</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Phylogenomics and the reconstruction of the tree of life</article-title>
<source>Nat Rev Genet</source>
<year>2005</year>
<volume>6</volume>
<fpage>361</fpage>
<lpage>75</lpage>
<pub-id pub-id-type="pmid">15861208</pub-id>
</mixed-citation>
</ref>
<ref id="b36-ebo-2010-073">
<label>36.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Graham</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Olmstead</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>SCH</given-names>
</name>
</person-group>
<article-title>Rooting phylogenetic trees with distant outgroups: A case study from the Commelinoid monocots</article-title>
<source>Mol Biol Evol</source>
<year>2002</year>
<volume>19</volume>
<fpage>1769</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="pmid">12270903</pub-id>
</mixed-citation>
</ref>
<ref id="b37-ebo-2010-073">
<label>37.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Savolainen</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Chase</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Hoot</surname>
<given-names>SB</given-names>
</name>
<etal></etal>
</person-group>
(10 co-authors).
<article-title>Phylogenetics of flowering plants based on combined analysis of plastid
<italic>atpB</italic>
and
<italic>rbcL</italic>
gene sequences</article-title>
<source>Syst Biol</source>
<year>2000</year>
<volume>49</volume>
<fpage>306</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="pmid">12118410</pub-id>
</mixed-citation>
</ref>
<ref id="b38-ebo-2010-073">
<label>38.</label>
<mixed-citation publication-type="webpage">
<person-group person-group-type="author">
<name>
<surname>Stevens</surname>
<given-names>PF</given-names>
</name>
</person-group>
<year>2001</year>
onwards.
<source>Angiosperm Phylogeny Website</source>
Version 8, June 2007 (and more or less continuously updated since). Available at:
<ext-link ext-link-type="uri" xlink:href="http://www.mobot.org/MOBOT/research/APweb/welcome.html">http://www.mobot.org/MOBOT/research/APweb/welcome.html</ext-link>
Accessed October 10 2009.</mixed-citation>
</ref>
<ref id="b39-ebo-2010-073">
<label>39.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<collab>Angiosperm Phylogeny Group II</collab>
</person-group>
<article-title>An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II</article-title>
<source>Bot J Linn Soc</source>
<year>2003</year>
<volume>141</volume>
<fpage>399</fpage>
<lpage>436</lpage>
</mixed-citation>
</ref>
<ref id="b40-ebo-2010-073">
<label>40.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hilu</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>Borsch</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Müller</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
(16 co-authors).
<article-title>Angiosperm phylogeny based on
<italic>matK</italic>
sequence information</article-title>
<source>Am J Bot</source>
<year>2003</year>
<volume>90</volume>
<fpage>1758</fpage>
<lpage>76</lpage>
</mixed-citation>
</ref>
<ref id="b41-ebo-2010-073">
<label>41.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davies</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Barraclough</surname>
<given-names>TG</given-names>
</name>
<name>
<surname>Chase</surname>
<given-names>MW</given-names>
</name>
<etal></etal>
</person-group>
(6 co-authors).
<article-title>Darwin’s abominable mystery: Insights from a supertree of the angiosperms</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>2004</year>
<volume>101</volume>
<fpage>1904</fpage>
<lpage>09</lpage>
<pub-id pub-id-type="pmid">14766971</pub-id>
</mixed-citation>
</ref>
<ref id="b42-ebo-2010-073">
<label>42.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soltis</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>DE</given-names>
</name>
</person-group>
<article-title>The origin and diversification of angiosperms</article-title>
<source>Am J Bot</source>
<year>2004</year>
<volume>91</volume>
<fpage>1614</fpage>
<lpage>26</lpage>
</mixed-citation>
</ref>
<ref id="b43-ebo-2010-073">
<label>43.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>YL</given-names>
</name>
<name>
<surname>Dombrovska</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
(20 co-authors).
<article-title>Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes</article-title>
<source>Int J Plant Sci</source>
<year>2005</year>
<volume>166</volume>
<fpage>815</fpage>
<lpage>42</lpage>
</mixed-citation>
</ref>
<ref id="b44-ebo-2010-073">
<label>44.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Gitzendanner</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>PS</given-names>
</name>
</person-group>
<article-title>A 567-taxon data set for angiosperms: The challenges posed by Bayesian analyses of large data sets</article-title>
<source>Int J Plant Sci</source>
<year>2007</year>
<volume>168</volume>
<fpage>137</fpage>
<lpage>57</lpage>
</mixed-citation>
</ref>
<ref id="b45-ebo-2010-073">
<label>45.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Wojciechowski</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Sher Khan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Brady</surname>
<given-names>SG</given-names>
</name>
</person-group>
<article-title>Error, bias, and long-branch attraction in data of two chloroplast photosystem genes in seed plants</article-title>
<source>Mol Biol Evol</source>
<year>2000</year>
<volume>17</volume>
<fpage>782</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="pmid">10779539</pub-id>
</mixed-citation>
</ref>
<ref id="b46-ebo-2010-073">
<label>46.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rydin</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Källersjö</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Taxon sampling and seed plant phylogeny</article-title>
<source>Cladistics</source>
<year>2002</year>
<volume>18</volume>
<fpage>485</fpage>
<lpage>513</lpage>
</mixed-citation>
</ref>
<ref id="b47-ebo-2010-073">
<label>47.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lanyon</surname>
<given-names>SM</given-names>
</name>
</person-group>
<article-title>Detecting internal inconsistencies in distance data</article-title>
<source>Syst Zool</source>
<year>1985</year>
<volume>34</volume>
<fpage>397</fpage>
<lpage>403</lpage>
</mixed-citation>
</ref>
<ref id="b48-ebo-2010-073">
<label>48.</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Phylogenetic Models of Rate Heterogeneity: A High Performance Computing Perspective</article-title>
<conf-name>Proceedings of 20th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2006)</conf-name>
<year>2006</year>
; no page numbers [Proceedings on CD].</mixed-citation>
</ref>
<ref id="b49-ebo-2010-073">
<label>49.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Efron</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Halloran</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Bootstrap confidence levels for phylogenetic trees</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>1996</year>
<volume>93</volume>
<fpage>13429</fpage>
<pub-id pub-id-type="pmid">8917608</pub-id>
</mixed-citation>
</ref>
<ref id="b50-ebo-2010-073">
<label>50.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Swofford</surname>
<given-names>DL</given-names>
</name>
</person-group>
<source>PAUP*: Phylogenetic analysis using parsimony (*and other methods)</source>
<edition>4 ed</edition>
<publisher-loc>Champaign, USA</publisher-loc>
<publisher-name>Sinauer Associates</publisher-name>
<year>2002</year>
</mixed-citation>
</ref>
<ref id="b51-ebo-2010-073">
<label>51.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Dezulian</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rupp</surname>
<given-names>R</given-names>
</name>
</person-group>
<source>Dendroscope 0.22: Visualization of large trees</source>
<publisher-loc>Tübingen</publisher-loc>
<publisher-name>University of Tübingen, ZBIT, Department Algorithms in Bioinformatics</publisher-name>
<year>2007</year>
</mixed-citation>
</ref>
<ref id="b52-ebo-2010-073">
<label>52.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Application of phylogenetic networks in evolutionary studies</article-title>
<source>Mol Biol Evol</source>
<year>2006</year>
<volume>23</volume>
<fpage>254</fpage>
<lpage>67</lpage>
<pub-id pub-id-type="pmid">16221896</pub-id>
</mixed-citation>
</ref>
<ref id="b53-ebo-2010-073">
<label>53.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Holland</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Moulton</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Consensus Networks: A Method for Visualising Incompatibilities in Collections of Trees</article-title>
<person-group person-group-type="editor">
<name>
<surname>Benson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Page</surname>
<given-names>R</given-names>
</name>
</person-group>
<source>Algorithms in Bioinformatics: Third International Workshop, WABI, Budapest, Hungary Proceedings</source>
<publisher-loc>Berlin, Heidelberg, Stuttgart</publisher-loc>
<publisher-name>Springer Verlag</publisher-name>
<year>2003</year>
<fpage>165</fpage>
<lpage>76</lpage>
</mixed-citation>
</ref>
<ref id="b54-ebo-2010-073">
<label>54.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grimm</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Renner</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hemleben</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>A nuclear ribosomal DNA phylogeny of
<italic>Acer</italic>
inferred with maximum likelihood, splits graphs, and motif analyses of 606 sequences</article-title>
<source>Evol Bioinform</source>
<volume>2</volume>
<fpage>7</fpage>
<lpage>22</lpage>
</mixed-citation>
</ref>
<ref id="b55-ebo-2010-073">
<label>55.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holland</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>KT</given-names>
</name>
<name>
<surname>Moulton</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Lockhart</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>Using consensus networks to visualize contradictory evidence for species phylogeny</article-title>
<source>Mol Biol Evol</source>
<year>2004</year>
<volume>21</volume>
<fpage>1459</fpage>
<lpage>61</lpage>
<pub-id pub-id-type="pmid">15084681</pub-id>
</mixed-citation>
</ref>
<ref id="b56-ebo-2010-073">
<label>56.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nandi</surname>
<given-names>OI</given-names>
</name>
<name>
<surname>Chase</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Endress</surname>
<given-names>PK</given-names>
</name>
</person-group>
<article-title>A combined cladistic analysis of angiosperms using
<italic>rbcL</italic>
and non-molecular data sets</article-title>
<source>Ann Mo Bot Gard</source>
<year>1998</year>
<volume>85</volume>
<fpage>137</fpage>
<lpage>212</lpage>
</mixed-citation>
</ref>
<ref id="b57-ebo-2010-073">
<label>57.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>YL</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Bernasconi-Quadroni</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
(10 co-authors).
<article-title>The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes</article-title>
<source>Nature</source>
<year>1999</year>
<volume>402</volume>
<fpage>404</fpage>
<lpage>07</lpage>
<pub-id pub-id-type="pmid">10586879</pub-id>
</mixed-citation>
</ref>
<ref id="b58-ebo-2010-073">
<label>58.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Senters</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Zanis</surname>
<given-names>MJ</given-names>
</name>
<etal></etal>
</person-group>
(9 co-authors).
<article-title>Gunnerales are sister to other core eudicots: Implications for the evolution of pentamery</article-title>
<source>Am J Bot</source>
<year>2003</year>
<volume>90</volume>
<fpage>461</fpage>
<lpage>70</lpage>
</mixed-citation>
</ref>
<ref id="b59-ebo-2010-073">
<label>59.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Soltis</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Zanis</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Suh</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Phylogenetic relationships among early-diverging eudicots based on four genes: were the eudicots ancestrally woody?</article-title>
<source>Mol Phylogenet Evol</source>
<year>2004</year>
<volume>31</volume>
<fpage>16</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="pmid">15019605</pub-id>
</mixed-citation>
</ref>
<ref id="b60-ebo-2010-073">
<label>60.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pattengale</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Alipour</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bininda-Emonds</surname>
<given-names>ORP</given-names>
</name>
<name>
<surname>Moret</surname>
<given-names>BME</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>How Many Bootstrap Replicates are Necessary?</article-title>
<source>Proceedings of RECOMB 2009, Springer Lecture Notes in Bioinformatics</source>
<year>2009</year>
<volume>5541</volume>
<fpage>184</fpage>
<lpage>200</lpage>
</mixed-citation>
</ref>
<ref id="b61-ebo-2010-073">
<label>61.</label>
<mixed-citation publication-type="webpage">
<person-group person-group-type="author">
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Komornik</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>SA</given-names>
</name>
</person-group>
Evolutionary Placement of Short Sequence Reads on Multi-Core Architectures. Accepted for publication at the 8th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-10). PDF:
<ext-link ext-link-type="uri" xlink:href="http://wwwkramer.in.tum.de/exelixis/pubs/Exelixis-RRDR-2009-2.pdf">http://wwwkramer.in.tum.de/exelixis/pubs/Exelixis-RRDR-2009-2.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="b62-ebo-2010-073">
<label>62.</label>
<mixed-citation publication-type="webpage">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
Evolutionary Placement of Short Sequence Reads, Exelixis Rapid Research Dissemination Report 2009–3, TU Munich, November 2009. PDF:
<ext-link ext-link-type="uri" xlink:href="http://wwwkramer.in.tum.de/exelixis/pubs/Exelixis-RRDR-2009-3.pdf">http://wwwkramer.in.tum.de/exelixis/pubs/Exelixis-RRDR-2009-3.pdf</ext-link>
</mixed-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-ebo-2010-073" position="float">
<label>Figure 1.</label>
<caption>
<p>Scheme illustrating the GRTS procedure (this study) in comparison to random taxon jackknifing.
<xref ref-type="bibr" rid="b47-ebo-2010-073">47</xref>
In contrast to random jackknifing, GRTS ensures that each replicate always includes members of all predefined TUs (in the given example, TU 3 is missing from the random jackknife replicate) and that each TU is sampled in proportion to its original size. For instance, TU 1 includes 16 accessions; with a reduction factor of 2, each GRTS replicate will therefore include exactly 8 members of TU 1. In random jackknife replicates this number may vary, resulting in over- (TU 1 in the given example) or underrepresentation of TUs (TUs 4 and 5).</p>
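<p>The group-wise draw described in this caption can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function name <monospace>grts_replicate</monospace>, the toy TU sizes other than the 16 accessions of TU 1, and the rounding rule (rounding up, so that every TU keeps at least one member) are assumptions made for the example.
<preformat>
```python
import math
import random


def grts_replicate(taxon_units, reduction, rng):
    """Draw one group-wise subsample: each TU contributes ceil(size / reduction) members."""
    replicate = {}
    for name, members in taxon_units.items():
        # Assumption: round up, so that even a very small TU stays represented.
        k = math.ceil(len(members) / reduction)
        # Sample without replacement inside the TU only.
        replicate[name] = sorted(rng.sample(members, k))
    return replicate


# Hypothetical toy data: TU 1 has 16 accessions as in the caption;
# the remaining sizes are invented for illustration.
sizes = {"TU1": 16, "TU2": 8, "TU3": 2, "TU4": 5, "TU5": 4}
taxon_units = {tu: [f"{tu}_acc{i}" for i in range(1, n + 1)] for tu, n in sizes.items()}

replicate = grts_replicate(taxon_units, reduction=2, rng=random.Random(42))
counts = {tu: len(members) for tu, members in replicate.items()}
print(counts)
```
</preformat>
With a reduction factor of 2, every replicate in this sketch keeps exactly 8 of the 16 accessions of TU 1 and no TU is dropped, whereas a purely random draw over all accessions gives no such guarantee.</p>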
</caption>
<graphic xlink:href="ebo-2010-073f1"></graphic>
</fig>
<fig id="f2-ebo-2010-073" position="float">
<label>Figure 2.</label>
<caption>
<p>A family-level representation of the best-known ML tree inferred from the EUDIS matrix. The full tree includes more than 3,500 leaves and has been reduced here to family-level TUs (see Material and Methods); the same TUs were also used for the GRTS analyses (following chapter).</p>
</caption>
<graphic xlink:href="ebo-2010-073f2"></graphic>
</fig>
<fig id="f3-ebo-2010-073" position="float">
<label>Figure 3.</label>
<caption>
<p>A circle cladogram of the best-known ML tree inferred from the ROSID matrix. The vast majority of sequences are placed in accordance with well-known clades (occasional mislabeled sequences are not addressed). Note that the
<italic>CA-BS</italic>
support of these higher-order clades is often low (&lt;50; Tables S2, S3 in
<xref ref-type="supplementary-material" rid="SD2">OS 2</xref>
).</p>
</caption>
<graphic xlink:href="ebo-2010-073f3"></graphic>
</fig>
<fig id="f4-ebo-2010-073" position="float">
<label>Figure 4.</label>
<caption>
<p>Second-level BS support values plotted against first-level BS support values on the best-scoring ML tree for the eudicots.</p>
</caption>
<graphic xlink:href="ebo-2010-073f4"></graphic>
</fig>
<fig id="f5-ebo-2010-073" position="float">
<label>Figure 5.</label>
<caption>
<p>Potential of
<italic>GRTS-ML</italic>
to recover ‘correct’ relationships. Top, ML phylogram based on EURO1 matrix. Bottom, bipartition network based on 100
<italic>GRTS-ML</italic>
replicates (family-level TU, reduction factor 1/4).</p>
</caption>
<graphic xlink:href="ebo-2010-073f5"></graphic>
</fig>
<fig id="f6-ebo-2010-073" position="float">
<label>Figure 6.</label>
<caption>
<p>Competing topological alternatives in GRTS-ML and CA-BS replicates. Unlabeled circles represent ‘candidate’ common ancestors (topological alternatives); support indicated by intensity of gray tones.</p>
</caption>
<graphic xlink:href="ebo-2010-073f6"></graphic>
</fig>
<fig id="f7-ebo-2010-073" position="float">
<label>Figure 7.</label>
<caption>
<p>
<italic>GRTS-ML-</italic>
based bipartition networks using different reduction factors and order-level TUs. Red, eurosid I clades; blue, eurosid II clades; important changes in the recognition of well-known groups are highlighted by arrows. The analyses are based on the ROSID matrix.
<bold>A</bold>
) Reduction factor of 1/4. The nitrogen-fixing clade is recovered in most replicate trees.
<bold>B</bold>
) Reduction factor of 1/8. Sapindales are separated from Zygophyllales and Celastrales from Malpighiales.
<bold>C</bold>
) Reduction factor of 1/16. Zygophyllales are placed with other eurosids I.
<bold>D</bold>
) Reduction factor of 1/32. Eurosid II clades and Crossosomatales, respectively, are grouped.</p>
</caption>
<graphic xlink:href="ebo-2010-073f7"></graphic>
</fig>
<table-wrap id="t1-ebo-2010-073" position="float">
<label>Table 1.</label>
<caption>
<p>Correlation between bootstrap support values for comprehensive analyses and number of bipartitions in the respective tree collections of nested
<italic>rbcL</italic>
datasets.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Data sets</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># taxa</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-all</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># bipartitions</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-best</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>WRF</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots ⊇</td>
<td align="left" valign="top" rowspan="1" colspan="1">3,490</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.989</td>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots: 31,124</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.986</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.021</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">rosids</td>
<td align="left" valign="top" rowspan="1" colspan="1">2,259</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">Rosids: 30,338</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots ⊇</td>
<td align="left" valign="top" rowspan="1" colspan="1">3,490</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.987</td>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots: 22,060</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.982</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.032</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">eurosids I</td>
<td align="left" valign="top" rowspan="1" colspan="1">1,590</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">Eurosids I: 21,688</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots ⊇</td>
<td align="left" valign="top" rowspan="1" colspan="1">3,490</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.988</td>
<td align="left" valign="top" rowspan="1" colspan="1">Eudicots: 6,110</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.983</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.048</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">eurosids II</td>
<td align="left" valign="top" rowspan="1" colspan="1">436</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">Eurosids II: 5,894</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">Rosids ⊇</td>
<td align="left" valign="top" rowspan="1" colspan="1">2,259</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.993</td>
<td align="left" valign="top" rowspan="1" colspan="1">Rosids: 22,060</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.993</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.019</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">eurosids I</td>
<td align="left" valign="top" rowspan="1" colspan="1">1,590</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">Eurosids I: 21,983</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">Rosids ⊇</td>
<td align="left" valign="top" rowspan="1" colspan="1">2,259</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.992</td>
<td align="left" valign="top" rowspan="1" colspan="1">Rosids: 6,110</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.990</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.053</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">eurosids II</td>
<td align="left" valign="top" rowspan="1" colspan="1">436</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">Eurosids II: 6,019</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="t2-ebo-2010-073" position="float">
<label>Table 2.</label>
<caption>
<p>Correlation between family-level TU
<italic>CA-BS</italic>
and family-level TU
<italic>GRTS-ML</italic>
support values for the distinct datasets, using a reduction factor of 1/4.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Dataset</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># taxa</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-all</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># bipartitions</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-best</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>WRF</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="2" colspan="1">Eudicots</td>
<td align="left" valign="top" rowspan="1" colspan="1">256</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.925</td>
<td align="left" valign="top" rowspan="1" colspan="1">CA-BS: 4,427</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.883</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.079</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">GRTS-ML: 2,987</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2" colspan="1">Rosids</td>
<td align="left" valign="top" rowspan="1" colspan="1">153</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.915</td>
<td align="left" valign="top" rowspan="1" colspan="1">CA-BS: 2,587</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.889</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.092</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">GRTS-ML: 1,390</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2" colspan="1">Eurosids I</td>
<td align="left" valign="top" rowspan="1" colspan="1">80</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.908</td>
<td align="left" valign="top" rowspan="1" colspan="1">CA-BS: 1,683</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.899</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.094</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">GRTS-ML: 771</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2" colspan="1">Eurosids II</td>
<td align="left" valign="top" rowspan="1" colspan="1">43</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.924</td>
<td align="left" valign="top" rowspan="1" colspan="1">CA-BS: 322</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.824</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.229</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">GRTS-ML: 139</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="t3-ebo-2010-073" position="float">
<label>Table 3.</label>
<caption>
<p>Correlation between order-level TU
<italic>CA-BS</italic>
and order-level TU
<italic>GRTS-ML</italic>
support values for the distinct datasets and reduction factors.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Dataset</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Reduction factor</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># taxa</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-all</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-best</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>WRF</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="4" colspan="1">Rosids</td>
<td align="left" valign="top" rowspan="1" colspan="1">1/4</td>
<td align="left" valign="top" rowspan="1" colspan="1">565</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.836</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.842</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.262</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/8</td>
<td align="left" valign="top" rowspan="1" colspan="1">282</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.845</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.847</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.240</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/16</td>
<td align="left" valign="top" rowspan="1" colspan="1">141</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.882</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.869</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.213</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/32</td>
<td align="left" valign="top" rowspan="1" colspan="1">71</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.884</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.868</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.160</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4" colspan="1">Eurosids I</td>
<td align="left" valign="top" rowspan="1" colspan="1">1/4</td>
<td align="left" valign="top" rowspan="1" colspan="1">398</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.828</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.781</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.283</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/8</td>
<td align="left" valign="top" rowspan="1" colspan="1">199</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.828</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.584</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.253</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/16</td>
<td align="left" valign="top" rowspan="1" colspan="1">99</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.693</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.390</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.206</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/32</td>
<td align="left" valign="top" rowspan="1" colspan="1">50</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.639</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.345</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.229</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="t4-ebo-2010-073" position="float">
<label>Table 4.</label>
<caption>
<p>Correlation between order-level TU
<italic>CA-BS</italic>
and order-level TU
<italic>GRTS-BS</italic>
support values for the distinct datasets and reduction factors.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Dataset</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>Reduction factor</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold># taxa</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-all</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>
<italic>ρ</italic>
-best</bold>
</th>
<th align="left" valign="top" rowspan="1" colspan="1">
<bold>WRF</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="5" colspan="1">Rosids</td>
<td align="left" valign="top" rowspan="1" colspan="1">1/4</td>
<td align="left" valign="top" rowspan="1" colspan="1">565</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.977</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.974</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.162</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/8</td>
<td align="left" valign="top" rowspan="1" colspan="1">282</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.966</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.959</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.120</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/16</td>
<td align="left" valign="top" rowspan="1" colspan="1">141</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.946</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.940</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.119</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/32</td>
<td align="left" valign="top" rowspan="1" colspan="1">71</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.950</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.938</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.123</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/64</td>
<td align="left" valign="top" rowspan="1" colspan="1">35</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.891</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.875</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.055</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="5" colspan="1">Eurosids I</td>
<td align="left" valign="top" rowspan="1" colspan="1">1/4</td>
<td align="left" valign="top" rowspan="1" colspan="1">398</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.977</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.878</td>
<td align="left" valign="top" rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/8</td>
<td align="left" valign="top" rowspan="1" colspan="1">199</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.932</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.714</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.136</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/16</td>
<td align="left" valign="top" rowspan="1" colspan="1">99</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.918</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.640</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.097</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/32</td>
<td align="left" valign="top" rowspan="1" colspan="1">50</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.865</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.500</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.053</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/64</td>
<td align="left" valign="top" rowspan="1" colspan="1">25</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.836</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.522</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.146</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="5" colspan="1">Eurosids II</td>
<td align="left" valign="top" rowspan="1" colspan="1">1/4</td>
<td align="left" valign="top" rowspan="1" colspan="1">109</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.999</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">0.517</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/8</td>
<td align="left" valign="top" rowspan="1" colspan="1">55</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.999</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">0.517</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/16</td>
<td align="left" valign="top" rowspan="1" colspan="1">28</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.999</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">0.513</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/32</td>
<td align="left" valign="top" rowspan="1" colspan="1">14</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.993</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">0.485</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="1" colspan="1">1/64</td>
<td align="left" valign="top" rowspan="1" colspan="1">7</td>
<td align="left" valign="top" rowspan="1" colspan="1">0.970</td>
<td align="left" valign="top" rowspan="1" colspan="1"></td>
<td align="left" valign="top" rowspan="1" colspan="1">0.467</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000526 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000526 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2880847
   |texte=   Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:20535232" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024