
Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches

Internal identifier: 000215 (Pmc/Corpus)


Authors: Stephen A. Smith; Jeremy M. Beaulieu; Michael J. Donoghue

Source:

BMC Evolutionary Biology [1471-2148]; 2009.

RBID: PMC:2645364

Abstract

Background

Biology has increasingly recognized the need to build and use larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions, there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and responses to global change. Two major classes of methods have been employed to accomplish the task of building large trees: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists, and extremely large trees therefore remain rare.

Results

Here we describe and demonstrate a modified supermatrix method, termed mega-phylogeny, that uses sequences from public databases as well as taxonomic hierarchies to make extremely large trees with denser matrices than traditional supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former; the latter is accomplished with recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites, and with an rbcL matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites.

Conclusion

Examining much larger phylogenies reveals patterns that were otherwise unseen. The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes to resolve. These demonstrations underscore the importance of using large phylogenies to uncover important evolutionary patterns, and we present a fast and simple method for constructing such phylogenies.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2645364

DOI: 10.1186/1471-2148-9-37
PubMed: 19210768
PubMed Central: 2645364


The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches</title>
<author><name sortKey="Smith, Stephen A" sort="Smith, Stephen A" uniqKey="Smith S" first="Stephen A" last="Smith">Stephen A. Smith</name>
<affiliation><nlm:aff id="I1">National Evolutionary Synthesis Center, 2024 W Main St A200, Durham, NC 27705, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Beaulieu, Jeremy M" sort="Beaulieu, Jeremy M" uniqKey="Beaulieu J" first="Jeremy M" last="Beaulieu">Jeremy M. Beaulieu</name>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Donoghue, Michael J" sort="Donoghue, Michael J" uniqKey="Donoghue M" first="Michael J" last="Donoghue">Michael J. Donoghue</name>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">19210768</idno>
<idno type="pmc">2645364</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2645364</idno>
<idno type="RBID">PMC:2645364</idno>
<idno type="doi">10.1186/1471-2148-9-37</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000215</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches</title>
<author><name sortKey="Smith, Stephen A" sort="Smith, Stephen A" uniqKey="Smith S" first="Stephen A" last="Smith">Stephen A. Smith</name>
<affiliation><nlm:aff id="I1">National Evolutionary Synthesis Center, 2024 W Main St A200, Durham, NC 27705, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Beaulieu, Jeremy M" sort="Beaulieu, Jeremy M" uniqKey="Beaulieu J" first="Jeremy M" last="Beaulieu">Jeremy M. Beaulieu</name>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Donoghue, Michael J" sort="Donoghue, Michael J" uniqKey="Donoghue M" first="Michael J" last="Donoghue">Michael J. Donoghue</name>
<affiliation><nlm:aff id="I2">Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Evolutionary Biology</title>
<idno type="eISSN">1471-2148</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Biology has increasingly recognized the need to build and use larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions, there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and responses to global change. Two major classes of methods have been employed to accomplish the task of building large trees: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists, and extremely large trees therefore remain rare.</p>
</sec>
<sec><title>Results</title>
<p>Here we describe and demonstrate a modified supermatrix method, termed mega-phylogeny, that uses sequences from public databases as well as taxonomic hierarchies to make extremely large trees with denser matrices than traditional supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former; the latter is accomplished with recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites, and with an <italic>rbcL</italic> matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites.</p>
</sec>
<sec><title>Conclusion</title>
<p>Examining much larger phylogenies reveals patterns that were otherwise unseen. The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes to resolve. These demonstrations underscore the importance of using large phylogenies to uncover important evolutionary patterns, and we present a fast and simple method for constructing such phylogenies.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Evol Biol</journal-id>
<journal-title>BMC Evolutionary Biology</journal-title>
<issn pub-type="epub">1471-2148</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">19210768</article-id>
<article-id pub-id-type="pmc">2645364</article-id>
<article-id pub-id-type="publisher-id">1471-2148-9-37</article-id>
<article-id pub-id-type="doi">10.1186/1471-2148-9-37</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches</article-title>
</title-group>
<contrib-group><contrib id="A1" corresp="yes" contrib-type="author"><name><surname>Smith</surname>
<given-names>Stephen A</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>sasmith@nescent.org</email>
</contrib>
<contrib id="A2" contrib-type="author"><name><surname>Beaulieu</surname>
<given-names>Jeremy M</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>jeremy.beaulieu@yale.edu</email>
</contrib>
<contrib id="A3" contrib-type="author"><name><surname>Donoghue</surname>
<given-names>Michael J</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>michael.donoghue@yale.edu</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
National Evolutionary Synthesis Center, 2024 W Main St A200, Durham, NC 27705, USA</aff>
<aff id="I2"><label>2</label>
Department of Ecology and Evolutionary Biology, Yale University, PO Box 208105, New Haven, CT 06520, USA</aff>
<pub-date pub-type="collection"><year>2009</year>
</pub-date>
<pub-date pub-type="epub"><day>11</day>
<month>2</month>
<year>2009</year>
</pub-date>
<volume>9</volume>
<fpage>37</fpage>
<lpage>37</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2148/9/37"></ext-link>
<history><date date-type="received"><day>22</day>
<month>10</month>
<year>2008</year>
</date>
<date date-type="accepted"><day>11</day>
<month>2</month>
<year>2009</year>
</date>
</history>
<permissions><copyright-statement>Copyright © 2009 Smith et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2009</copyright-year>
<copyright-holder>Smith et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
        </license>
</permissions>
<abstract><sec><title>Background</title>
<p>Biology has increasingly recognized the need to build and use larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions, there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and responses to global change. Two major classes of methods have been employed to accomplish the task of building large trees: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists, and extremely large trees therefore remain rare.</p>
</sec>
<sec><title>Results</title>
<p>Here we describe and demonstrate a modified supermatrix method, termed mega-phylogeny, that uses sequences from public databases as well as taxonomic hierarchies to make extremely large trees with denser matrices than traditional supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former; the latter is accomplished with recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites, and with an <italic>rbcL</italic> matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites.</p>
</sec>
<sec><title>Conclusion</title>
<p>Examining much larger phylogenies reveals patterns that were otherwise unseen. The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes to resolve. These demonstrations underscore the importance of using large phylogenies to uncover important evolutionary patterns, and we present a fast and simple method for constructing such phylogenies.</p>
</sec>
</abstract>
</article-meta>
</front>
<body><sec><title>Background</title>
<p>All species on Earth (current estimates exceed 1.8 million) are related through common ancestors in the evolutionary Tree of Life. The construction of this phylogeny is a major endeavor for biology and now depends largely on the unprecedented growth of molecular sequence data available in public databases. Projects focused on single clades, whole-genome sequencing, genomic library construction (ESTs, BACs), and large collaborative efforts such as NSF's Assembling the Tree of Life project are contributing to the rapid growth of public databases, with more than 92 million sequences stored in the current release of GenBank (release 167). Current efforts to infer very large phylogenetic trees center on data combination using so-called supertree [e.g., [<xref ref-type="bibr" rid="B1">1</xref>
]] and supermatrix methods [e.g., [<xref ref-type="bibr" rid="B2">2</xref>
-<xref ref-type="bibr" rid="B4">4</xref>
]] as opposed to using a single gene (or multiple genes) sampled very widely across taxa [e.g., [<xref ref-type="bibr" rid="B5">5</xref>
,<xref ref-type="bibr" rid="B6">6</xref>
]]. For example, recent large-scale database-enabled phylogenetic analyses employing these approaches have shed light on the radiation and early evolution of mammals [<xref ref-type="bibr" rid="B1">1</xref>
], and the phylogenetic diversity of Bacteria [<xref ref-type="bibr" rid="B3">3</xref>
]. Recent advances in phylogenetic tree-building methods have provided the necessary first steps in approaching the problem of producing large and comprehensive phylogenetic trees [<xref ref-type="bibr" rid="B7">7</xref>
-<xref ref-type="bibr" rid="B9">9</xref>
]. However, assembling large datasets from databases remains a critical problem upstream of the tree-building process.</p>
<p>Supertree methods compile many source trees with partially overlapping taxa into a single comprehensive tree [<xref ref-type="bibr" rid="B10">10</xref>
,<xref ref-type="bibr" rid="B11">11</xref>
]. Generally, each source topology is converted into a data matrix and combined with other topological matrices. Many different algorithms exist for creating the final supertree including MRP (matrix representation with parsimony; [<xref ref-type="bibr" rid="B12">12</xref>
,<xref ref-type="bibr" rid="B13">13</xref>
]), MRF (matrix representation with flipping; [<xref ref-type="bibr" rid="B14">14</xref>
]), MinCut [<xref ref-type="bibr" rid="B15">15</xref>
], and modified MinCut [<xref ref-type="bibr" rid="B16">16</xref>
]. Although straightforward, supertree methods have several limitations, including problems related to data independence (the same data can contribute to more than one source tree), "signal enhancement" (novel relationships in a supertree that contradict one or several source trees; [<xref ref-type="bibr" rid="B11">11</xref>
]), and the assessment of uncertainty and confidence in relationships [<xref ref-type="bibr" rid="B17">17</xref>
]. In addition, supertrees are strictly topological, thus requiring sequence data to obtain useful branch lengths [<xref ref-type="bibr" rid="B1">1</xref>
]. Most importantly, however, supertrees do not rely directly on the primary data for tree inference, which makes novel topologies suspect. Perhaps due to these limitations, and despite active development of methodologies [e.g., [<xref ref-type="bibr" rid="B1">1</xref>
]], few large supertrees for diverse groups have been successfully constructed (but see [<xref ref-type="bibr" rid="B18">18</xref>
,<xref ref-type="bibr" rid="B19">19</xref>
]).</p>
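<p>As a minimal illustration of the matrix-representation step used by MRP, the Python sketch below codes each clade of each source tree as one binary character (Baum-Ragan coding): 1 for taxa inside the clade, 0 for taxa present in that source tree but outside the clade, and ? for taxa the source tree lacks. By assumption, each source tree is given as a set of taxa plus a list of clades rather than parsed from Newick, and the subsequent parsimony (or flipping) search is not shown.</p>

```python
def mrp_matrix(source_trees):
    """Baum-Ragan coding: one binary character per clade of each source tree.

    Each source tree is a (taxa, clades) pair: the set of taxa it contains
    and a list of clades (sets of taxa). States: 1 = inside the clade,
    0 = in this source tree but outside the clade, ? = absent from the tree.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    matrix = {taxon: [] for taxon in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon in clade:
                    state = "1"
                elif taxon in taxa:
                    state = "0"
                else:
                    state = "?"
                matrix[taxon].append(state)
    return {taxon: "".join(states) for taxon, states in matrix.items()}


# Two hypothetical rooted source trees with partially overlapping taxa.
tree1 = ({"A", "B", "C", "D"}, [{"A", "B"}, {"C", "D"}])   # ((A,B),(C,D))
tree2 = ({"C", "D", "E"}, [{"C", "D"}])                     # ((C,D),E)
for taxon, row in mrp_matrix([tree1, tree2]).items():
    print(taxon, row)
```

<p>The question marks are what make the combined matrix sparse when source trees overlap only partially.</p>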
<p>Supermatrix trees, on the other hand, are inferred directly from the sequence data by constructing a large multiple sequence alignment for simultaneous analysis of the final data matrix [<xref ref-type="bibr" rid="B20">20</xref>
]. Because few genes are sampled completely across many taxa, supermatrix methods often sacrifice completeness in the interest of size. In fact, one of the largest supermatrices, with more than 2000 tips, had 95% missing data [<xref ref-type="bibr" rid="B4">4</xref>
]. Other supermatrix analyses have focused on the number of gene regions and not on the number of species [<xref ref-type="bibr" rid="B2">2</xref>
,<xref ref-type="bibr" rid="B3">3</xref>
]. The construction of a large supermatrix involves a number of computationally challenging steps including, but not limited to, database operations, BLAST comparisons, sequence clustering, multiple sequence alignment, and combining data sets. An exhaustive discussion of these steps is presented elsewhere [<xref ref-type="bibr" rid="B4">4</xref>
], but each will be touched upon briefly as it relates to the approach presented here. Typically, once sequences have been deposited in a database, all-by-all BLAST comparisons are conducted to assemble clusters of similar sequences. Methods for this step include agglomerative procedures such as single-linkage clustering (e.g., blastclust) and stochastic methods (e.g., Tribe-MCL; [<xref ref-type="bibr" rid="B21">21</xref>
]). Clustered sequences are then submitted to multiple sequence alignment (MSA). Many other procedures can be applied once the alignments are produced, especially for identifying sequence orthology. Multi-locus datasets are then assembled from individual alignments that do not have "too many" missing entries, by representing taxa and loci as a bipartite graph and combining alignments that form bicliques or quasi-bicliques [<xref ref-type="bibr" rid="B22">22</xref>
,<xref ref-type="bibr" rid="B23">23</xref>
].</p>
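<p>Single-linkage clustering of all-by-all BLAST results amounts to finding connected components in a graph whose edges are the sequence pairs that pass a similarity cutoff. The sketch below does this with a union-find structure; the (query, subject, coverage) hit tuples and the 0.5 coverage cutoff are hypothetical inputs, and the code illustrates the technique rather than reproducing blastclust.</p>

```python
def single_linkage_clusters(ids, hits, min_coverage=0.5):
    """Group sequence IDs into single-linkage clusters.

    ids:  iterable of sequence identifiers.
    hits: iterable of (query, subject, coverage) tuples, e.g. parsed from an
          all-by-all BLAST; a pair is linked if coverage >= min_coverage.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, coverage in hits:
        if coverage >= min_coverage:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q   # merge the two clusters

    clusters = {}
    for i in parent:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


hits = [("s1", "s2", 0.9), ("s2", "s3", 0.7), ("s4", "s5", 0.3)]
print(single_linkage_clusters(["s1", "s2", "s3", "s4", "s5"], hits))
```

<p>Because linkage is transitive, s1 and s3 end up in the same cluster even though they were never compared directly; this chaining is one reason cluster boundaries are sensitive to the chosen parameters.</p>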
<p>Each step described above is computationally difficult and has rarely been discussed in terms of what might be optimal for the final goal of tree construction. Despite the computational difficulties and potential shortcomings of specific steps in their construction, supermatrix methods allow for simultaneous data analysis. Also, unlike supertrees, they do not suffer from data independence or "signal enhancement" problems, and, at least in principle, confidence can be assessed using standard bootstrapping approaches. However, problems related to missing data and to assessing the quality of the resulting trees persist. Tools addressing certain steps of supermatrix construction are beginning to become available (e.g., Phylota; [<xref ref-type="bibr" rid="B24">24</xref>
]) and some notable large trees have been successfully produced [<xref ref-type="bibr" rid="B4">4</xref>
]. However, tools for constructing supermatrices are not readily available to comparative biologists, and relatively few large matrices for specific clades have been successfully analyzed. Nevertheless, supermatrix methods have made enormous strides, and recent discussions have begun to center on methods that combine elements of both supertree and supermatrix approaches [e.g., [<xref ref-type="bibr" rid="B17">17</xref>
,<xref ref-type="bibr" rid="B25">25</xref>
,<xref ref-type="bibr" rid="B26">26</xref>
]].</p>
<p>The method introduced here, which we refer to as a "mega-phylogeny", is most similar to supermatrix methods but differs from previous methods used to create large matrices in several ways. First, the mega-phylogeny method relies on the user to identify the gene regions of interest by supplying actual example sequences of each region that span the breadth of its molecular diversity within the clade of interest. Second, the mega-phylogeny method employs profile alignments to combine alignments of orthologous gene regions that would either be poorly aligned if done across a broad taxonomic group or be broken up by clustering analyses. The mega-phylogeny approach can quickly create enormous phylogenetic matrices because more data from the same gene can be used, and it specifically attenuates the problems associated with sequence saturation. The first demonstration of this method [<xref ref-type="bibr" rid="B27">27</xref>
] produced phylogenies for plant clades ranging from 366 species and 11,374 sites (Dipsacales) to 4657 species and 22,391 sites (Commelinidae).</p>
<p>Here, we describe this new approach and its current implementation. We also present two example phylogenies of plant clades created with our method: an Asterales phylogeny containing 4954 species and five gene regions, and an <italic>rbcL</italic> phylogeny of green plants (Viridiplantae) comprising 13,533 species.</p>
</sec>
<sec sec-type="methods"><title>Methods</title>
<sec><title>Implemented Pipeline</title>
<p>The basic steps for a mega-phylogeny are (1) designating the clade of interest, (2) identifying the gene region(s) of interest, (3) recording the extent of molecular diversity of each gene region in the clade of interest, (4) recording the thresholds of coverage and identity to be used for orthology tests, (5) narrowing the pool of possible sequences with a very broad term search [optional], (6) removing all potential sequences that are not members of the clade of interest, (7) testing orthology by BLASTing each potential sequence against the sequences recorded for the breadth of each gene region and removing those that differ by more than the established threshold, (8) identifying sequences that should be reverse complemented, (9) removing sequences for duplicate taxon names, keeping the sequence with the best coverage and identity, and (10) testing for saturation. If the sequences are saturated, they are subdivided using the next available subclade, and additional tests of saturation (step 10) are performed. Finally, once all of the sequences are in an alignment or exist as singletons (i.e., are not contained in any subdivision), each alignment is profiled to a master alignment. This can be repeated an arbitrary number of times for each gene region of interest. If multiple gene regions are used, they are then concatenated into a large matrix and the phylogeny is inferred.</p>
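<p>Steps (6) and (9), together with the final concatenation, reduce to bookkeeping once each candidate sequence carries its taxon name, lineage and orthology scores. In this minimal sketch the Candidate record, its fields, the simplified lineages and the accessions are hypothetical; duplicates are ranked by coverage and then identity, which is one possible reading of "best coverage and identity"; and every '-' in the concatenated matrix, whether an alignment gap or an absent region, is counted as missing data.</p>

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one database sequence after the BLAST tests.
    accession: str
    taxon: str
    lineage: tuple        # enclosing taxa, e.g. ("Asterales", "Asteraceae")
    coverage: float
    identity: float


def in_clade(cands, clade):
    """Step 6: keep only sequences whose lineage includes the clade of interest."""
    return [c for c in cands if clade in c.lineage]


def best_per_taxon(cands):
    """Step 9: one sequence per taxon name, preferring coverage, then identity."""
    best = {}
    for c in cands:
        kept = best.get(c.taxon)
        if kept is None or (c.coverage, c.identity) > (kept.coverage, kept.identity):
            best[c.taxon] = c
    return best


def concatenate(alignments):
    """Join per-region alignments (dicts taxon -> aligned string).

    Taxa absent from a region are padded with '-'. Returns the matrix and the
    fraction of '-' cells (gaps and absent regions both count as missing).
    """
    taxa = sorted(set().union(*alignments))
    rows = {t: "" for t in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for t in taxa:
            rows[t] += aln.get(t, "-" * width)
    total = sum(len(r) for r in rows.values())
    missing = sum(r.count("-") for r in rows.values())
    return rows, missing / total


cands = [
    Candidate("X1", "Aster alpinus", ("Asterales", "Asteraceae"), 0.95, 0.90),
    Candidate("X2", "Aster alpinus", ("Asterales", "Asteraceae"), 0.80, 0.99),
    Candidate("X3", "Lobelia cardinalis", ("Asterales", "Campanulaceae"), 0.90, 0.92),
    Candidate("X4", "Oryza sativa", ("Poales", "Poaceae"), 0.99, 0.99),
]
kept = best_per_taxon(in_clade(cands, "Asterales"))
print({t: c.accession for t, c in kept.items()})
# {'Aster alpinus': 'X1', 'Lobelia cardinalis': 'X3'}

rows, missing = concatenate([{"A": "ACGT", "B": "AC-T"}, {"A": "GG", "C": "GA"}])
print(rows, f"missing fraction = {missing:.2f}")
```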
<p>We implemented this pipeline in Python (version 2.5) using the BioPython module (version 1.48) and the BioSQL database schema (version 1.0.1). Each mega-phylogeny matrix assembly presented here was run on a Linux laptop with 1 GB of RAM and a 2.4 GHz dual-core processor. The phylogenetic analyses were conducted on an eight-way SMP Linux computer with 2.4 GHz processors and 32 GB of RAM using RAxML (version 7.0.4; [<xref ref-type="bibr" rid="B8">8</xref>
]). The steps that are novel for matrix assembly are described briefly below.</p>
</sec>
<sec><title>Orthology</title>
<p>Determining whether sequences are orthologous is a challenge for large tree construction. Supermatrix methods have attempted to overcome this problem by identifying orthologous sets of sequences using clustering techniques [<xref ref-type="bibr" rid="B2">2</xref>
,<xref ref-type="bibr" rid="B4">4</xref>
], but these can be time-consuming and are typically not developed with the goal of large phylogeny assembly [e.g., [<xref ref-type="bibr" rid="B28">28</xref>
]]. Here, we determine orthologous sequences using designated sequences that represent the breadth of variation observed in the gene region of interest across the clade of interest. We BLAST all of the potential sequences from the database against these designated sequences and retain those that match above a given threshold of both coverage and identity. At this stage, reverse complements are corrected by determining which orientation best matches the designated sequences. Instead of N × N comparisons among all N potentially useful sequences, only N × n comparisons are necessary, where n is the number of example sequences used to represent the region. This dramatically shortens the run time of the algorithm and generally produces denser matrices.</p>
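<p>The orthology test can be sketched as follows: each of the N candidates is compared, on both strands, with only the n exemplar sequences, and a candidate is kept, in its best-matching orientation, if some comparison passes both the coverage and the identity threshold. So that the sketch runs without BLAST, toy_hit is a hypothetical ungapped stand-in for a parsed BLAST hit, and the default thresholds (0.5 coverage, 0.8 identity) are placeholders rather than values from this study.</p>

```python
# IUPAC-aware complement table (S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMBVDHNSWacgtrykmbvdhnsw",
                           "TGCAYRMKVBHDNSWtgcayrmkvbhdnsw")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def toy_hit(query, exemplar):
    """Toy stand-in for a BLAST hit: ungapped comparison from position 0.

    Returns (coverage of the exemplar, identity over compared columns).
    A real pipeline would parse these values from BLAST output instead.
    """
    n = min(len(query), len(exemplar))
    if n == 0:
        return 0.0, 0.0
    matches = sum(a == b for a, b in zip(query, exemplar))
    return n / len(exemplar), matches / n


def orthology_filter(candidates, exemplars, min_cov=0.5, min_id=0.8, hit=toy_hit):
    """Keep candidates that match some exemplar above both thresholds.

    Each of the N candidates is compared, on both strands, with each of the
    n exemplars: 2 * N * n comparisons instead of N * N. Returns a dict
    {candidate id: sequence in the orientation that matched best}.
    """
    kept = {}
    for cid, seq in candidates.items():
        passing = []
        for oriented in (seq, reverse_complement(seq)):
            for ex in exemplars:
                cov, ident = hit(oriented, ex)
                if cov >= min_cov and ident >= min_id:
                    passing.append((ident, cov, oriented))
        if passing:
            kept[cid] = max(passing)[2]   # highest identity, then coverage
    return kept


exemplars = ["ATGGCGTACGTTAGC"]
candidates = {
    "fwd": "ATGGCGTACGTTAGC",
    "rev": reverse_complement("ATGGCGTACGTTAGC"),
    "other": "CCCCCCCCCCCCCCC",
}
print(orthology_filter(candidates, exemplars))
# {'fwd': 'ATGGCGTACGTTAGC', 'rev': 'ATGGCGTACGTTAGC'}
```

<p>Storing each accepted sequence in its matching orientation folds step (8), reverse complementing, into the same pass as the orthology test.</p>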
</sec>
<sec><title>Profile alignments</title>
<p>One major problem for large matrices using broadly sampled sequences or smaller matrices with quickly evolving sequences is that multiple sequence alignments become more challenging as sequences become more divergent [<xref ref-type="bibr" rid="B28">28</xref>
,<xref ref-type="bibr" rid="B29">29</xref>
]. Almost all multiple sequence alignment algorithms build a phylogeny during the estimation procedures [<xref ref-type="bibr" rid="B30">30</xref>
]. The guide trees built for multiple sequence alignment are often based on model-corrected or raw pair-wise distances. These methods are susceptible to saturation (i.e., multiple substitutions at the same site, which obscure earlier changes) and are therefore much less accurate for large, broadly sampled alignments that are likely to contain very distantly related sequences. The quality of multiple sequence alignments can have a dramatic impact on the accuracy of the resulting phylogenies [e.g., [<xref ref-type="bibr" rid="B31">31</xref>
-<xref ref-type="bibr" rid="B33">33</xref>
]]. As a result, other supermatrix methods sidestep multiple sequence alignment by employing clustering techniques to determine "alignable regions." Such clustering techniques have allowed for the assembly of very large matrices of sufficiently similar sequences [<xref ref-type="bibr" rid="B2">2</xref>
,<xref ref-type="bibr" rid="B4">4</xref>
,<xref ref-type="bibr" rid="B34">34</xref>
]. This approach can be dramatically affected by the parameters used during clustering, which sometimes split slowly evolving gene regions into multiple informative clusters (e.g., two large <italic>rbcL</italic> clusters for Ericales in Phylota).</p>
<p>We combine the analysis of sequence saturation with recent advances in multiple profile-to-profile alignment methodology. A profile alignment is an algorithmic approach to identifying structural elements that are highly conserved between different alignments [<xref ref-type="bibr" rid="B34">34</xref>
-<xref ref-type="bibr" rid="B36">36</xref>
]. To accomplish this, separate alignments are aligned together while preserving the columns in the individual alignments. Newer profile alignment programs allow for more flexibility in profile alignment procedures (e.g. MAFFT; [<xref ref-type="bibr" rid="B37">37</xref>
]). In our case, we separate sequences into subgroups of aligned sequences based on the degree of sequence saturation. For example, if the algorithm determines that the most inclusive group of sequences is saturated, the group is broken up into less inclusive groups using the next level in the taxonomic hierarchy. In a Linnaean taxonomic system, an "order" found to be saturated would be broken into "families". Each smaller subset of sequences is then realigned and its saturation reassessed. This process continues iteratively through less inclusive groups until the sequences no longer appear saturated, at which point the alignments are stored. We note, however, that the taxonomic groups used in this procedure need not correspond to ranks in the Linnaean hierarchy, but should simply be hierarchically nested (as in the NCBI taxonomy). Grouping using a rank-free classification (PhyloCode; [<xref ref-type="bibr" rid="B38">38</xref>
]) could also be used, and will become possible once a database of phylogenetic names is implemented and usable. After every sequence has been placed either in an alignment or as a "singleton", the individual alignments are profiled into a larger alignment. The order of the profiling can be random, optimized to find the best order, or aided by a hierarchical "guide" tree (e.g., aligning more closely related matrices first). Currently, we employ highly conservative guide trees based on published studies to carry out profile alignments.</p>
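<p>The saturation-driven subdivision can be written as a recursion over any hierarchically nested classification. In this sketch the taxonomy, the seqs_of lookup and the toy saturation rule are hypothetical; in practice the test would be the MAD criterion described in the next section, each group would be realigned before testing, and the resulting groups and singletons would then be profile-aligned. Keeping a saturated taxon that has no children as a single group is an assumption of the sketch.</p>

```python
def partition_by_saturation(node, children, seqs_of, is_saturated):
    """Split the sequences of `node` into unsaturated groups and singletons.

    children:     dict mapping a taxon name to its immediate child taxa.
    seqs_of:      function returning the sequence IDs classified within a taxon.
    is_saturated: function taking a list of sequence IDs and returning a bool.
    Returns (groups, singletons): groups are lists of IDs to align together.
    """
    seqs = list(seqs_of(node))
    if not seqs:
        return [], []
    if len(seqs) == 1:
        return [], seqs
    # Assumption: a saturated taxon with no children is kept whole.
    if not is_saturated(seqs) or not children.get(node):
        return [seqs], []
    groups, singletons, covered = [], [], set()
    for child in children[node]:
        g, s = partition_by_saturation(child, children, seqs_of, is_saturated)
        groups += g
        singletons += s
        covered.update(seqs_of(child))
    # Sequences classified only to this level fall outside every subdivision.
    singletons += [x for x in seqs if x not in covered]
    return groups, singletons


# Hypothetical two-level classification and a toy saturation rule.
children = {"Asterales": ["Asteraceae", "Campanulaceae"]}
members = {
    "Asterales": ["a1", "a2", "c1", "x1"],
    "Asteraceae": ["a1", "a2"],
    "Campanulaceae": ["c1"],
}


def seqs_of(taxon):
    return members.get(taxon, [])


def toy_is_saturated(ids):
    return len(ids) > 2


print(partition_by_saturation("Asterales", children, seqs_of, toy_is_saturated))
# ([['a1', 'a2']], ['c1', 'x1'])
```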
</sec>
<sec><title>Assessing saturation</title>
<p>We introduce a simple method based on dispersion statistics to rapidly detect saturation across a set of sequence data. Dispersion (an indicator of spread) is assessed on the one-dimensional Euclidean distances between the raw pair-wise sequence distances and the corresponding distances corrected under the Jukes-Cantor model of nucleotide substitution. A one-dimensional Euclidean distance is simply the absolute difference between two values. Our measure of dispersion is based on the median and is known as the median absolute deviation (MAD), given by</p>
<p><disp-formula>MAD = 1.4826 × Med(|x<sub>i</sub> − Med(x)|),</disp-formula></p>
<p>where x<sub>i</sub> is the Euclidean distance for the i-th pair of sequences, Med(x) is the median of all pair-wise Euclidean distances, and the outer median is taken over the absolute residuals about that median. The constant 1.4826 makes MAD a consistent estimator of the standard deviation when the data are normally distributed [<xref ref-type="bibr" rid="B39">39</xref>
]. Thus, in our use, the larger the MAD, the larger the overall spread in the Euclidean distances; above a certain value, the assumed nucleotide substitution model no longer adequately accounts for the rate variation exhibited by the pair-wise distances among species.</p>
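<p>The statistic can be computed directly from an alignment, as in the following minimal sketch. Its assumptions are not details given in the text: uncorrected p-distances ignore gaps and ambiguity codes, and pairs whose p-distance reaches 0.75 (where the Jukes-Cantor correction is undefined) are treated as infinitely distant. The function names p_distance, jc_distance and mad_saturation are hypothetical.</p>

```python
import itertools
import math
import statistics
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, skipping gaps and ambiguity codes."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance; infinite once p reaches 0.75 (assumption)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mad_saturation(alignment: Dict[str, str]) -> float:
    """MAD of the 1-D Euclidean distances |JC(p) - p| over all sequence pairs."""
    x = [abs(jc_distance(p) - p)
         for p in (p_distance(a, b)
                   for a, b in itertools.combinations(alignment.values(), 2))]
    med = statistics.median(x)
    if math.isinf(med):
        return math.inf  # most pairs beyond the JC limit: treat as saturated
    return 1.4826 * statistics.median(abs(xi - med) for xi in x)
```

<p>The function needs at least two sequences; a returned value above the chosen threshold would flag the group for subdivision.</p>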
<p>We performed a simple simulation study to explore the behavior of MAD. Our first goal was to determine a threshold for subdividing sequences into smaller alignments; our second was to compare MAD with alternative measures of dispersion based on the sample mean (i.e. mean square error, MSE; root mean square error, RMSE). Sequence data were simulated on randomly constructed 20- and 100-tip phylogenies. Different rates of molecular evolution were simulated by scaling the total tree length by factors from 0.10 to 2.0 in increments of 0.10. All molecular simulations were carried out using Seq-Gen (ver. 1.3.2; [<xref ref-type="bibr" rid="B40">40</xref>
]).</p>
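<p>The logic of this simulation can be illustrated without Seq-Gen. The sketch below is a simplified, pure-Python stand-in and not the study's protocol: it evolves sequences on a star tree (not a random tree) using the exact Jukes-Cantor transition probabilities, and the scale factor sets every branch length directly rather than rescaling a given total tree length. Its output is therefore not comparable with Figure 1. All function names are hypothetical.</p>

```python
import itertools
import math
import random
import statistics
from typing import Dict, List

BASES = "ACGT"


def evolve_jc(seq: str, t: float, rng: random.Random) -> str:
    """Evolve seq along a branch of length t under Jukes-Cantor.

    A site keeps its base with probability 1/4 + 3/4 * exp(-4t/3);
    otherwise it ends in one of the three other bases with equal probability.
    """
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for base in seq:
        if p_same > rng.random():
            out.append(base)
        else:
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)


def simulate_star(n_tips: int, branch: float, n_sites: int,
                  rng: random.Random) -> List[str]:
    """Tips of a star tree whose branches all have the same length."""
    root = "".join(rng.choice(BASES) for _ in range(n_sites))
    return [evolve_jc(root, branch, rng) for _ in range(n_tips)]


def mad_of_residuals(seqs: List[str]) -> float:
    """MAD of |JC(p) - p| across pairs; pairs past the JC limit are skipped."""
    residuals = []
    for a, b in itertools.combinations(seqs, 2):
        p = sum(x != y for x, y in zip(a, b)) / len(a)
        if p >= 0.75:
            continue
        residuals.append(abs(-0.75 * math.log(1.0 - 4.0 * p / 3.0) - p))
    if not residuals:
        return math.inf
    med = statistics.median(residuals)
    return 1.4826 * statistics.median(abs(r - med) for r in residuals)


def mad_at_scale(scale: float, n_tips: int = 20, n_sites: int = 1000,
                 seed: int = 1) -> float:
    """Simulate one star-tree dataset at a given scale and return its MAD."""
    rng = random.Random(seed)
    return mad_of_residuals(simulate_star(n_tips, scale, n_sites, rng))


def scan(n_tips: int = 20, n_sites: int = 1000, seed: int = 1) -> Dict[float, float]:
    """Scale factors 0.10, 0.20, ..., 2.00, mirroring the increments in the text."""
    scales = [round(0.1 * k, 2) for k in range(1, 21)]
    return {s: mad_at_scale(s, n_tips, n_sites, seed) for s in scales}
```

<p>Calling scan() returns one MAD value per scale factor; with 100 tips the same functions apply, only more slowly.</p>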
<p>The results from these simulations highlight the utility of MAD. First, unlike MSE or RMSE, MAD does not assume an underlying Gaussian distribution, which matters because the distribution of Euclidean distances becomes increasingly skewed as sequence divergence increases. A second, and perhaps the most critical, advantage is that MAD remains stable when sequence divergence is unrealistically high (e.g. tree length scaled by a factor of 2; Figure <xref ref-type="fig" rid="F1">1</xref>
). This situation is analogous to the presence of outliers, which are well known to distort dispersion statistics based on the mean. Because MAD makes no distributional assumption and is the 50<sup>th</sup>
percentile (median) of the absolute residuals, it is inherently robust to outliers (Figure <xref ref-type="fig" rid="F1">1G, H</xref>
). Finally, our simulations indicate that a MAD exceeding ~0.01 provides a conservative indication of a level of saturation that necessitates a profile alignment scheme.</p>
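<p>A toy comparison makes the robustness argument concrete. The distances below are invented for illustration, not taken from the simulations, and rmse_about_mean is one possible mean-based spread measure, not necessarily the exact MSE/RMSE definition used for Figure 1. The 0.01 default mirrors the threshold stated above; the function names are hypothetical.</p>

```python
import statistics
from typing import Sequence


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation, scaled by 1.4826 as in the text."""
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)


def rmse_about_mean(values: Sequence[float]) -> float:
    """Root mean squared deviation from the sample mean (a mean-based spread)."""
    mean = statistics.fmean(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


def needs_profile_alignment(mad_value: float, threshold: float = 0.01) -> bool:
    """Flag a group for subdivision when MAD exceeds the ~0.01 threshold."""
    return mad_value > threshold


# Invented 1-D Euclidean distances, then the same set plus one extreme pair.
distances = [0.010, 0.012, 0.011, 0.013, 0.009, 0.012, 0.010, 0.011]
with_outlier = distances + [0.5]
```

<p>Adding the single outlier leaves MAD essentially unchanged (about 0.0015), whereas the mean-based spread grows by roughly two orders of magnitude.</p>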
<fig position="float" id="F1"><label>Figure 1</label>
<caption><p><bold>Simulation exploring the behavior of MAD in relation to alternative measures of dispersion</bold>
. Each panel shows a simulation of sequence data on a balanced phylogeny of 20 tips (A, C, E, and G) or 100 tips (B, D, F, and H). In A and B, total tree length is scaled to 0.10; in C and D, to 0.25; in E and F, to 0.50; and in G and H, to 2.00. Saturation was assessed using descriptors of dispersion of the one-dimensional Euclidean distance between the raw pair-wise sequence distances (uncorrected distance) and those corrected according to a Jukes-Cantor model of nucleotide substitution (corrected distance). Our simulations demonstrated that the non-parametric median absolute deviation (MAD) has several advantages for detecting saturation over alternative measures of dispersion based on the sample mean (i.e. mean square error, MSE; root mean square error, RMSE).</p>
</caption>
<graphic xlink:href="1471-2148-9-37-1"></graphic>
</fig>
</sec>
</sec>
<sec><title>Results</title>
<sec><title>Asterales</title>
<p>Nearly 10% of all angiosperm species belong to the Asterales, a clade mainly comprising 12 recognized families, with most of its diversity attributable to just two families, Asteraceae (e.g. sunflowers, thistles) and Campanulaceae (e.g. <italic>Lobelia </italic>
and relatives). The monophyly of Asterales is well supported despite uncertainty about its position within the more inclusive Campanulidae clade [<xref ref-type="bibr" rid="B41">41</xref>
]. Roughly 5200 species of Asterales, or about 20% of the clade, are represented in GenBank. However, aside from studies of carefully selected exemplar taxa representing major lineages of Asterales [<xref ref-type="bibr" rid="B41">41</xref>
-<xref ref-type="bibr" rid="B46">46</xref>
], a comprehensive phylogeny has never been produced for this clade. Here we apply our mega-phylogeny approach to the Asterales to reconstruct the most complete phylogeny of the group to date.</p>
<p>Our Asterales sequence matrix comprised <italic>rbcL</italic>, <italic>matK</italic>, <italic>trnL-F</italic>, <italic>trnK</italic>, <italic>ndhF</italic>, and ITS. The combined matrix of 12,033 sites consisted of 90.959% gaps or missing data. The individual gene regions, however, varied considerably in their proportion of gaps or missing data: 98.043% in ETS, 36.348% in ITS, 98.188% in <italic>matK</italic>, 90.338% in <italic>ndhF</italic>, 92.597% in <italic>rbcL</italic>, 98.002% in <italic>trnK</italic>, and 81.445% in <italic>trnL-F</italic>. Of the five gene regions sampled, ITS was the best represented taxonomically (4242 species) and was the only region that our procedure identified as requiring profile-to-profile alignments. The MAD score indicated that the degree of ITS saturation varied among groups, but within-group alignments were never carried out above the traditional "tribal" level. This resulted in 180 separate within-group alignment files at differing hierarchical levels.</p>
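<p>Per-region figures like these are simple proportions of gap or missing cells within each region's columns. A minimal sketch, assuming an aligned matrix stored as a dict of equal-length strings, region boundaries given as 1-based inclusive column ranges that together cover the matrix, and gaps, "?" and N all counted as missing; these conventions and the function name are assumptions for illustration.</p>

```python
from typing import Dict, Tuple

MISSING = set("-?N")  # assumption: gaps, '?' and N all count as missing


def missing_percent(matrix: Dict[str, str],
                    partitions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Percent of gap/missing cells per partition and for the combined matrix.

    partitions maps a region name to (start, end), 1-based and inclusive.
    """
    result = {}
    total_cells = total_missing = 0
    for name, (start, end) in partitions.items():
        cells = missing = 0
        for seq in matrix.values():
            block = seq[start - 1:end]
            cells += len(block)
            missing += sum(ch.upper() in MISSING for ch in block)
        result[name] = 100.0 * missing / cells
        total_cells += cells
        total_missing += missing
    result["combined"] = 100.0 * total_missing / total_cells
    return result
```

<p>Because sparsely sampled regions contribute mostly empty columns, the combined figure can be far higher than that of the best-sampled region, as with ITS here.</p>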
<p>To direct the profile-to-profile alignments efficiently, we assembled a "guide" tree by compiling and grafting together published phylogenies (<italic>sensu </italic>
[<xref ref-type="bibr" rid="B47">47</xref>
]). Briefly, we first obtained a backbone phylogeny for the major lineages of Asterales from Winkworth et al. [<xref ref-type="bibr" rid="B41">41</xref>
] and Lundberg and Bremer [<xref ref-type="bibr" rid="B44">44</xref>
]. We then grafted trees from more focused studies into the backbone tree, starting with the most inclusive clade and proceeding "inwards", adding progressively more detailed analyses of included clades. Our final grafted tree was pruned to correspond to the 180 alignment files output from our saturation analysis (see Additional file <xref ref-type="supplementary-material" rid="S1">1</xref>
). This guide tree was then traversed in post-order, performing profile-to-profile alignments starting at the "terminals" and working recursively back to the root. The phylogeny was then inferred using RAxML (ver. 7.0.4; [<xref ref-type="bibr" rid="B8">8</xref>
]), partitioned by gene region, with the GTR+GAMMA model of nucleotide substitution.</p>
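<p>In a post-order traversal each internal node is processed only after all of its descendants, so the profiles of sister groups are ready to be merged when their parent is reached. A minimal sketch with hypothetical names: load stands in for reading one alignment file, profile_align for whatever profile-to-profile aligner is used, and, when a random generator is supplied, the members of a polytomy are merged in random order (one of the options mentioned in the Methods).</p>

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, TypeVar

P = TypeVar("P")  # whatever object represents a (profile) alignment


@dataclass
class GuideNode:
    """Guide-tree node; a terminal's name identifies one alignment file."""
    name: str
    children: List["GuideNode"] = field(default_factory=list)


def profile_postorder(node: GuideNode,
                      load: Callable[[str], P],
                      profile_align: Callable[[P, P], P],
                      rng: Optional[random.Random] = None) -> P:
    """Align children first (post-order), then merge their profiles at this node."""
    if not node.children:
        return load(node.name)
    profiles = [profile_postorder(c, load, profile_align, rng) for c in node.children]
    if len(profiles) > 2 and rng is not None:
        rng.shuffle(profiles)  # unresolved polytomy: merge in random order
    merged = profiles[0]
    for other in profiles[1:]:
        merged = profile_align(merged, other)
    return merged
```

<p>With string stand-ins (load returning the file name and profile_align writing "(a,b)"), the guide tree ((A,B),C) yields "((A,B),C)", which shows the order in which profiles would be merged.</p>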
<p>Our final phylogeny includes 4954 tips, and relationships within and among "families" are mostly consistent with previously published results (Figure <xref ref-type="fig" rid="F2">2</xref>
). One exception concerns the early-branching lineages of Asterales, involving the placement of Rousseaceae+Carpodetaceae, Campanulaceae, and Pentaphragmataceae. The current consensus recognizes a basal trichotomy among these three clades ([<xref ref-type="bibr" rid="B48">48</xref>
]; but see [<xref ref-type="bibr" rid="B41">41</xref>
,<xref ref-type="bibr" rid="B44">44</xref>
]). Our analysis recovered Rousseaceae+Carpodetaceae as the sister group of all other Asterales, within which Campanulaceae (including Lobeliaceae) is sister to the rest (Figure <xref ref-type="fig" rid="F2">2</xref>
). This result has been recovered before [<xref ref-type="bibr" rid="B44">44</xref>
], but other studies have suggested a basal split between a clade of Rousseaceae+Carpodetaceae plus Campanulaceae [<xref ref-type="bibr" rid="B41">41</xref>
,<xref ref-type="bibr" rid="B44">44</xref>
] and all other Asterales. Our analysis places Pentaphragmataceae as sister to a clade comprising Stylidiaceae, Alseuosmiaceae, Argophyllaceae, Phellinaceae, Menyanthaceae, Goodeniaceae, Calyceraceae, and Asteraceae (Figure <xref ref-type="fig" rid="F2">2</xref>
). A recent combined analysis of chloroplast and nuclear genes found strong support for this relationship [<xref ref-type="bibr" rid="B41">41</xref>
].</p>
<fig position="float" id="F2"><label>Figure 2</label>
<caption><p><bold>Maximum-likelihood phylogeny for 4954 species of Asterales</bold>
. The data matrix was constructed using the mega-phylogeny method and includes DNA sequences for five genes: <italic>rbcL, matK, trnL-F, ndhF</italic>
, and ITS. Each of the 12 major families of Asterales is labeled. We also note the placement of the <italic>Doronicum</italic> clade in relation to the tribe Senecioneae; although we assumed a sister relationship <italic>a priori</italic>
, the phylogenetic analysis overruled this assumption, indicating that the two clades may be more distantly related. Pentaphragma, Pentaphragmataceae; Alseu, Alseuosmiaceae; Argo, Argophyllaceae; Phel, Phellinaceae.</p>
</caption>
<graphic xlink:href="1471-2148-9-37-2"></graphic>
</fig>
<p>Relationships within Asteraceae agree well with the subfamilial classification of Panero and Funk [<xref ref-type="bibr" rid="B46">46</xref>
]. However, our phylogeny does not support the monophyly of Wunderlichioideae, Stifftioideae, Mutisioideae, or Gochnatioideae (<italic>sensu </italic>
[<xref ref-type="bibr" rid="B46">46</xref>
]). Instead, each of these groups is broken into smaller clades that are successively sister to the rest of Asteraceae.</p>
<p>Several relationships within major clades are worth noting because they highlight the utility of our method. Based on the NCBI taxonomy, the profile-alignment portion of our algorithm assumed that <italic>Campanula </italic>
was monophyletic. However, the MAD score detected extreme sequence variation that required profile alignments among species. This variation reflects extreme molecular differentiation and, in the case of <italic>Campanula</italic>
, paraphyly, which is consistent with more focused systematic studies of Campanulaceae [<xref ref-type="bibr" rid="B49">49</xref>
,<xref ref-type="bibr" rid="B50">50</xref>
]. In several cases we also assumed sister relationships for which the primary literature indicates low nodal support. For example, we profiled the genus <italic>Doronicum </italic>
and the tribe Senecioneae as sister clades within the more inclusive Asteroideae, although confidence in this hypothesis is generally low [<xref ref-type="bibr" rid="B51">51</xref>
]. The resulting tree showed these two clades to be more distantly related, with <italic>Doronicum </italic>
placed near the early-branching lineages of Asteroideae (Figure <xref ref-type="fig" rid="F2">2</xref>
). Taken together, these results show that even though we assume some phylogenetic relationships at the outset when performing the profile alignments, the final phylogenetic tree need not recover those same relationships.</p>
</sec>
<sec><title>Green plants</title>
<p>The green plants (Viridiplantae) comprise more than 350,000 species, among them green algae, liverworts, mosses, ferns, and seed plants (including the flowering plants). The early branches of the entire clade, and of each major group of green plants, have attracted extensive molecular work [<xref ref-type="bibr" rid="B52">52</xref>
-<xref ref-type="bibr" rid="B55">55</xref>
]. Two large clades of living green plants are supported: Streptophyta and Chlorophyta [<xref ref-type="bibr" rid="B52">52</xref>
]. Charophytes (stoneworts) have been supported as the sister group to land plants in analyses of six genes [<xref ref-type="bibr" rid="B53">53</xref>
,<xref ref-type="bibr" rid="B55">55</xref>
]. Despite the large number of molecular studies that have focused on deep relationships within green plants, few studies with very large numbers of taxa have been conducted.</p>
<p>Here, we create a mega-phylogeny of the chloroplast gene encoding the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (<italic>rbcL</italic>
) for all green plants. This well-sampled gene has been extensively examined in smaller studies throughout plants, especially in flowering plants [beginning with <xref ref-type="bibr" rid="B5">5</xref>]. Despite the widespread use of <italic>rbcL</italic>
, additional genes have generally been necessary to reconstruct many relationships confidently. Our goal was to construct the largest <italic>rbcL </italic>
phylogeny for green plants while simultaneously accommodating saturation across the alignment.</p>
<p>Over 16,000 <italic>rbcL </italic>
sequences were found to be orthologous to the designated sequences sampled throughout flowering plants. After removal of duplicate taxa, our final matrix consisted of 13,533 tips and 1401 nucleotide sites, with 4.6238% of the matrix consisting of gaps. Our saturation analysis recognized 15 separately aligned subgroups: Chlorophyta (486 sp.), Zygnematophyceae (131 sp.), Coleochaetophyceae (21 sp.), Charophyceae (34 sp.), Marchantiophyta (462 sp.), Bryophyta (570 sp.), Anthocerotophyta (56 sp.), Lycopodiopsida (48 sp.), Isoetopsida (101 sp.), Equisetophyta (18 sp.), Marattiopsida (59 sp.), Ophioglossopsida (29 sp.), Filicopsida (1624 sp.), Psilotophyta (3 sp.), and Spermatophyta (9900 sp.). These aligned subgroups were combined using profile alignments along a guide tree based on Donoghue [<xref ref-type="bibr" rid="B56">56</xref>
] and Cantino et al. [<xref ref-type="bibr" rid="B38">38</xref>
] (see Additional file <xref ref-type="supplementary-material" rid="S2">2</xref>
). The phylogeny was constructed using RAxML (ver. 7.0.4; [<xref ref-type="bibr" rid="B8">8</xref>
]) with the GTR+GAMMA model of nucleotide substitution.</p>
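<p>The text does not say how duplicate taxa were resolved. One plausible rule, assumed here purely for illustration, is to keep the least gappy aligned sequence per species; the gap percentage is then the share of "-" cells in the final matrix. Function names are hypothetical.</p>

```python
from typing import Dict, Iterable, Tuple


def informative_length(seq: str) -> int:
    """Number of non-gap, non-missing characters in an aligned sequence."""
    return sum(ch not in "-?" for ch in seq)


def dedupe(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep one aligned sequence per species (here: the least gappy one)."""
    best: Dict[str, str] = {}
    for species, seq in records:
        if species not in best or informative_length(seq) > informative_length(best[species]):
            best[species] = seq
    return best


def gap_percent(matrix: Dict[str, str]) -> float:
    """Percentage of matrix cells that are alignment gaps ('-')."""
    total = sum(len(s) for s in matrix.values())
    return 100.0 * sum(s.count("-") for s in matrix.values()) / total
```

<p>Other tie-breaking rules (e.g. preferring a voucher-verified accession) would slot into dedupe without changing gap_percent.</p>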
<p>A recent analysis by Qiu et al. [<xref ref-type="bibr" rid="B55">55</xref>
] compiled the most comprehensive dataset to date, using six genes and 193 species, to resolve relationships among the four major land plant lineages: liverworts, hornworts, mosses, and vascular plants. They recovered, with strong support, a series of successive sister clades: liverworts first, then mosses, then hornworts as sister to vascular plants. Our maximum-likelihood tree of more than 13,000 species recovers the same relationships using <italic>rbcL </italic>
alone (Figure <xref ref-type="fig" rid="F3">3</xref>
). Within vascular plants, however, our tree differs in the placement of lycophytes. In the Qiu et al. [<xref ref-type="bibr" rid="B56">56</xref>
] tree, monilophytes are more closely related to seed plants than are lycophytes, and these relationships are well established by other evidence (e.g., morphology [<xref ref-type="bibr" rid="B57">57</xref>
], gene order and gene losses [<xref ref-type="bibr" rid="B58">58</xref>
,<xref ref-type="bibr" rid="B59">59</xref>
]). We instead find lycophytes to be more closely related to seed plants, a result that is likely mistaken and probably reflects an artifact of <italic>rbcL</italic> evolution.</p>
<fig position="float" id="F3"><label>Figure 3</label>
<caption><p><bold>Maximum-likelihood phylogeny for 13,533 species of green plants based on <italic>rbcL </italic>
DNA sequences</bold>
. The data matrix was constructed using the mega-phylogeny method; major clades are labeled and denoted with a star.</p>
</caption>
<graphic xlink:href="1471-2148-9-37-3"></graphic>
</fig>
<p>Our much larger phylogeny resolves some relationships by including more data in the form of more species rather than more genes. That adding taxa can improve resolution has been documented previously, but it has rarely been tested on such a large scale [<xref ref-type="bibr" rid="B60">60</xref>
,<xref ref-type="bibr" rid="B61">61</xref>
]. With the inclusion of more taxa, other broad evolutionary patterns emerge [cf. [<xref ref-type="bibr" rid="B27">27</xref>
]]. In this case, for example, the ferns appear to have faster rates of evolution than the other vascular plants. Further study is required to quantify this pattern and its importance, as the timing and rate of fern evolution have been interpreted in light of angiosperm evolution [<xref ref-type="bibr" rid="B62">62</xref>
]. With more taxa sampled, rate heterogeneity can become more apparent, raising an important issue about the possible effects of clade-specific rates on divergence-time estimates [<xref ref-type="bibr" rid="B27">27</xref>
]. Unfortunately, accurately estimating divergence times for tens of thousands of species remains impractical.</p>
<p>Another important result is that <italic>rbcL </italic>
appears to be saturated across green plants. That is, despite the conservative nature of this coding region, at very broad scales there are likely to be multiple substitutions at sites throughout the gene, which can either reduce the accuracy of multiple sequence alignments or cause clustering methods to break the matrix into smaller sections. Broad analyses of green plants will need to take this into account. Our analysis also demonstrates the limitations of conventional computers for analyzing large phylogenies: the matrix manipulation, tree construction, and tree rerooting required at least 8 GB of memory and were conducted on an 8-CPU machine. Building even larger matrices will require more memory and faster machines.</p>
</sec>
</sec>
<sec><title>Discussion and conclusion</title>
<p>The examples presented here demonstrate the utility of our strategy for building large phylogenetic trees. The mega-phylogeny method can produce large and somewhat denser phylogenetic matrices, given human input in the selection of gene regions. These matrices can be partitioned multi-locus datasets, as in the Asterales example, or single-locus datasets of tens of thousands of terminals, as in the green plants. Their size is limited only by computing power. Our examples also illustrate how well-sampled regions (such as ITS) that may be evolving too fast for traditional multiple sequence alignment can be included in broad phylogenetic analyses. Our mega-phylogeny approach also demonstrates that adding many more taxa can help resolve relationships where, traditionally, more genes would be required. A direct comparison of our mega-phylogeny method with supermatrix methods is difficult because the two approaches have somewhat different goals. Supermatrix methods, as implemented by McMahon and Sanderson [[<xref ref-type="bibr" rid="B4">4</xref>
], also see [<xref ref-type="bibr" rid="B2">2</xref>
,<xref ref-type="bibr" rid="B3">3</xref>
]], attempt to make the largest matrix from a database of sequences without specifically identifying particular regions of interest. The mega-phylogeny approach attempts to make the matrix with the largest number of taxa for any clade, given the gene regions pre-identified as being of interest. This allows somewhat denser matrices to be created faster. Because the mega-phylogeny approach generally uses fewer gene regions, partitioned likelihood analyses can be employed more easily and with shorter run times. At present, there is no standard software for supermatrix methods that could be benchmarked against the mega-phylogeny approach.</p>
<p>Our mega-phylogeny method will perhaps be most useful for comparative biology. In recent years there has been growing interest in compiling broad-scale datasets to identify general patterns and to test specific hypotheses in a phylogenetic framework. For example, new and interesting patterns have emerged in topics ranging from molecular rates [<xref ref-type="bibr" rid="B27">27</xref>
] to ecophysiology [<xref ref-type="bibr" rid="B63">63</xref>
-<xref ref-type="bibr" rid="B65">65</xref>
] to biodiversity [<xref ref-type="bibr" rid="B66">66</xref>
] and ecosystem processes [<xref ref-type="bibr" rid="B67">67</xref>
]. However, many of these studies have been limited by the resolution of the underlying phylogeny, which is often assembled from multiple literature-based trees (e.g. Phylomatic; [<xref ref-type="bibr" rid="B68">68</xref>
]). Our method provides a means to construct large phylogenies from primary data, which we hope will facilitate more sophisticated and robust comparative analyses, as has already been demonstrated for rate heterogeneity in plants [<xref ref-type="bibr" rid="B27">27</xref>
]. The limiting factor may soon become the ability of software for various comparative analyses to handle mega-phylogenies.</p>
<sec><title>Modularity</title>
<p>The mega-phylogeny method is inherently modular, making each step easily extensible. For example, another test could be used in place of BLAST comparisons for sequence orthology; in fact, a modified clustering method, as is typically used in supermatrix construction, could be substituted. Similarly, other measures of saturation could be devised in place of MAD, and any hierarchically nested taxonomic database can be used to subdivide groups when saturation is detected. We have relied on the NCBI hierarchical taxonomy, but greater precision might eventually be obtained with a system containing many additional levels. This modularity should allow the approach to remain useful as better methods become available for the specific procedures underlying mega-phylogeny matrix creation.</p>
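<p>One way to express this modularity in code is to pass each step in as a function, so that swapping BLAST for a clustering method, or MAD for another saturation measure, does not touch the rest of the pipeline. The sketch below is hypothetical: Components, screen_group and max_p_distance are illustrative names, and max_p_distance is a toy score included only to show a drop-in replacement, not a recommended saturation measure.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Sequences = Dict[str, str]


@dataclass(frozen=True)
class Components:
    """Interchangeable steps: any callables with these signatures can be plugged in."""
    is_orthologous: Callable[[str], bool]            # e.g. a BLAST-based test
    saturation_score: Callable[[Sequences], float]   # e.g. MAD
    threshold: float


def screen_group(seqs: Sequences, parts: Components) -> Tuple[Sequences, bool]:
    """Keep putative orthologs, then report whether the group looks saturated."""
    kept = {name: s for name, s in seqs.items() if parts.is_orthologous(s)}
    return kept, parts.saturation_score(kept) > parts.threshold


def max_p_distance(seqs: Sequences) -> float:
    """A deliberately simple alternative saturation score (hypothetical)."""
    values = list(seqs.values())
    worst = 0.0
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            n = min(len(a), len(b))
            if n:
                worst = max(worst, sum(x != y for x, y in zip(a, b)) / n)
    return worst
```

<p>Replacing max_p_distance with a MAD-based function, or the length filter with a real orthology test, changes only the Components instance.</p>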
<p>Modularity is especially important for the guide trees used in profile alignment, where the results from different guide trees (or no guide tree) can be compared. For example, there may be a published study of broadly sampled taxa within the clade of interest. A profile alignment guided by this tree could easily be compared with one guided by a tree consisting only of a basal polytomy, in which the groups are profiled in random order. The use of guide trees for this step highlights the need for well-resolved, fully bifurcating trees, especially broadly sampled ones. From this perspective, compiling large-scale trees from published phylogenies (e.g., those available on TreeBASE, <ext-link ext-link-type="uri" xlink:href="http://www.treebase.org">http://www.treebase.org</ext-link>
) becomes a highly relevant endeavor, not only as a source of initial guide trees but also as a basis for comparing results. Important differences could then highlight areas in special need of attention; for example, the signal in <italic>rbcL </italic>
that places lycophytes with seed plants requires further study.</p>
</sec>
<sec><title>Potential pitfalls</title>
<p>A key element of our mega-phylogeny method is its reliance on prior knowledge of phylogenetic relationships when performing profile-to-profile alignments. We assume that each group being aligned is monophyletic, which is potentially a problem once saturation is detected and less inclusive multiple sequence alignments are employed. However, our saturation analysis using the MAD statistic is not unduly influenced by outliers and can still detect extreme variation within a group, for example when that group is not monophyletic, as demonstrated with <italic>Campanula </italic>
within our Asterales matrix. In this case, the MAD score suggested that the assumption of monophyly for <italic>Campanula </italic>
was violated, and the genus emerged as paraphyletic in the final tree. So although the assumption of monophyly is a potential problem, it is not always detrimental. Further work is needed to explore the sensitivity of the results to such assumptions. In the meantime, the approach highlights the need for taxonomic databases to reflect current best knowledge of phylogenetic relationships as accurately as possible.</p>
<p>Various problems identified in supermatrix construction may also pertain to the mega-phylogeny method. For example, some problems with database-enabled phylogenetics are hard or impossible to avoid, such as misidentification or mislabeling in GenBank [<xref ref-type="bibr" rid="B4">4</xref>
]. Sequence orthology tests can help identify such problems; however, outliers are still likely to cause difficulties in some matrices. Additionally, "rogue taxa" can lower resolution and support throughout the tree. Rogue taxa are also a problem for supermatrices, so solutions developed for either approach will likely benefit both.</p>
</sec>
<sec><title>Extensions</title>
<p>It may be possible to incorporate diversity estimates for each taxonomic group, so that large clades represented by only one (or a few) species for a particular gene could be excluded. This would likely reduce problems associated with rogue taxa. Although this information is not currently readily available, its inclusion could greatly increase the efficacy of the mega-phylogeny approach.</p>
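<p>Such an exclusion rule could look like the following sketch. It is entirely hypothetical: the inputs (sequenced and described species counts per clade) and both cut-offs are illustrative assumptions, not values proposed in the text.</p>

```python
from typing import Dict, List


def clades_to_exclude(sampled: Dict[str, int],
                      described: Dict[str, int],
                      min_fraction: float = 0.01,
                      max_sampled: int = 2) -> List[str]:
    """Flag clades that are species-rich but represented by very few sequences.

    sampled: species with sequences for the gene; described: estimated diversity.
    Both thresholds are illustrative defaults, not values from the text.
    """
    flagged = []
    for clade, n_sampled in sampled.items():
        total = described.get(clade)
        if not total:
            continue  # no diversity estimate available for this clade
        if max_sampled >= n_sampled and min_fraction > n_sampled / total:
            flagged.append(clade)
    return sorted(flagged)
```

<p>A clade with one sequenced species out of 500 described would be flagged, whereas a small clade with one of ten would not.</p>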
<p>Our method can also be extended to deal with the problem of sequence outliers. The size of the matrices that can be constructed makes checking for outliers by hand impractical, but saturation statistics could be extended to identify such outliers in the individual gene regions. Although the orthology tests and reverse-complement procedures identify the vast majority of problematic sequences, the MAD statistic has the potential to clean the datasets further, allowing almost complete automation of large tree construction.</p>
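<p>One way to extend the statistic to individual sequences, sketched here as an assumption rather than a method from the text, is a median/MAD rule of the kind often called a Hampel identifier. Each sequence is scored by its median distance to all others, and sequences lying more than k scaled MADs above the median score are flagged. The cutoff k = 3 is a conventional choice, not a value from this study, and the function names are hypothetical.</p>

```python
import statistics
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns where either has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites) if sites else 0.0


def flag_outliers(alignment: Dict[str, str], k: float = 3.0) -> List[str]:
    """Flag sequences whose median distance to all others is more than k
    scaled MADs above the median of those per-sequence scores."""
    names = list(alignment)
    if len(names) in (0, 1):
        return []
    score = {
        n: statistics.median(p_distance(alignment[n], alignment[m])
                             for m in names if m != n)
        for n in names
    }
    med = statistics.median(score.values())
    mad = 1.4826 * statistics.median(abs(v - med) for v in score.values())
    if mad == 0:
        return []  # no spread at all: nothing can be flagged robustly
    return sorted(n for n in names if (score[n] - med) / mad > k)
```

<p>Because both the centre and the spread are medians, one aberrant sequence cannot inflate the yardstick used to judge it, which is the same robustness property that motivates MAD for saturation.</p>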
<p>Finally, the mega-phylogeny procedure can be parallelized. Many of the procedures related to sequence-to-sequence comparisons (e.g., orthology tests, reverse complements) can easily be distributed across multiple CPUs or computers, as can some of the multiple sequence alignment calculations. Parallelizing these procedures would make mega-phylogeny analyses even faster.</p>
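<p>Because these per-sequence steps are independent of one another, they map naturally onto a worker pool. A minimal sketch using Python's standard concurrent.futures module; reverse_complement and parallel_map are illustrative names, and the chunk size of 64 is an arbitrary assumption.</p>

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# IUPAC nucleotide complements; characters not listed (gaps, S, W, ...) map to themselves.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn",
                            "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parallel_map(func: Callable[[T], R], items: Iterable[T],
                 max_workers: Optional[int] = None,
                 use_processes: bool = True) -> List[R]:
    """Apply func to every item across several workers, preserving order.

    Processes give true multi-CPU parallelism for CPU-bound work; func must then
    be a picklable, module-level function and the call should sit under an
    'if __name__ == "__main__":' guard.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, items, chunksize=64))
```

<p>For example, parallel_map(reverse_complement, sequences) spreads reverse-complement checks over all available CPUs; distributing across several computers would need a cluster scheduler or a distributed task framework instead.</p>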
</sec>
</sec>
<sec><title>Authors' contributions</title>
<p>SAS developed the approach and conducted the analyses. JMB developed the saturation tests. SAS, JMB, and MJD wrote and edited the manuscript.</p>
</sec>
<sec sec-type="supplementary-material"><title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1"><caption><title>Additional file 1</title>
<p><bold>Figure S1.</bold>
 The final assembled "guide" tree used to direct the profile alignments across Asterales. Each "terminal" represents one of the 180 alignment files output from our saturation analysis. We assembled a "guide" tree representing the relationships among the alignment files by compiling and grafting together published phylogenies. This guide tree was then traversed in post-order, performing each profile-to-profile alignment and working recursively back to the root.</p>
</caption>
<media xlink:href="1471-2148-9-37-S1.pdf" mimetype="application" mime-subtype="pdf"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2"><caption><title>Additional file 2</title>
<p><bold>The "guide" tree used to direct the profile alignments across green plants (Viridiplantae).</bold>
 As in Figure S1, each terminal in the tree represents a separate alignment file output from our saturation analysis. Because of the large uncertainty at several nodes, the members of each polytomy were profile aligned so as to find the best order.</p>
</caption>
<media xlink:href="1471-2148-9-37-S2.pdf" mimetype="application" mime-subtype="pdf"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back><ack><sec><title>Acknowledgements</title>
<p>Valuable feedback was obtained from Jeff Oliver and David Tank. We are especially grateful for initial discussions on large tree methodology with Brian Moore. SAS was partially supported by the Cyberinfrastructure for Phylogenetic Research (CIPRES) program (NSF #EF-0331654) and by the National Evolutionary Synthesis Center (NESCent; NSF #EF-0423641). MJD and JMB have been supported through an NSF "Tree of Life" (ATOL) award.</p>
</sec>
</ack>
<ref-list><ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bininda-Emonds</surname>
<given-names>ORP</given-names>
</name>
<name><surname>Cardillo</surname>
<given-names>M</given-names>
</name>
<name><surname>Jones</surname>
<given-names>KE</given-names>
</name>
<name><surname>MacPhee</surname>
<given-names>RDE</given-names>
</name>
<name><surname>Beck</surname>
<given-names>RMD</given-names>
</name>
<name><surname>Grenyer</surname>
<given-names>R</given-names>
</name>
<name><surname>Price</surname>
<given-names>SA</given-names>
</name>
<name><surname>Vos</surname>
<given-names>RA</given-names>
</name>
<name><surname>Gittleman</surname>
<given-names>JL</given-names>
</name>
<name><surname>Purvis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The delayed rise of present-day mammals</article-title>
<source>Nature</source>
<year>2007</year>
<volume>446</volume>
<fpage>507</fpage>
<lpage>512</lpage>
<pub-id pub-id-type="pmid">17392779</pub-id>
</citation>
</ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Driskell</surname>
<given-names>AC</given-names>
</name>
<name><surname>Ané</surname>
<given-names>C</given-names>
</name>
<name><surname>Burleigh</surname>
<given-names>JG</given-names>
</name>
<name><surname>McMahon</surname>
<given-names>MM</given-names>
</name>
<name><surname>O'Meara</surname>
<given-names>BC</given-names>
</name>
<name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Prospects for building the tree of life from large sequence databases</article-title>
<source>Science</source>
<year>2004</year>
<volume>306</volume>
<fpage>1172</fpage>
<lpage>1174</lpage>
<pub-id pub-id-type="pmid">15539599</pub-id>
</citation>
</ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ciccarelli</surname>
<given-names>FD</given-names>
</name>
<name><surname>Doerks</surname>
<given-names>T</given-names>
</name>
<name><surname>Mering</surname>
<given-names>C</given-names>
</name>
<name><surname>Creevey</surname>
<given-names>CJ</given-names>
</name>
<name><surname>Snell</surname>
<given-names>B</given-names>
</name>
<name><surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Toward automatic reconstruction of a highly resolved tree of life</article-title>
<source>Science</source>
<year>2006</year>
<volume>311</volume>
<fpage>1283</fpage>
<lpage>1287</lpage>
<pub-id pub-id-type="pmid">16513982</pub-id>
</citation>
</ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McMahon</surname>
<given-names>MM</given-names>
</name>
<name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes</article-title>
<source>Systematic Biology</source>
<year>2006</year>
<volume>55</volume>
<fpage>818</fpage>
<lpage>836</lpage>
<pub-id pub-id-type="pmid">17060202</pub-id>
</citation>
</ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chase</surname>
<given-names>MW</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name><surname>Olmstead</surname>
<given-names>RG</given-names>
</name>
<name><surname>Morgan</surname>
<given-names>D</given-names>
</name>
<name><surname>Les</surname>
<given-names>DH</given-names>
</name>
<name><surname>Mishler</surname>
<given-names>BD</given-names>
</name>
<name><surname>Duvall</surname>
<given-names>MR</given-names>
</name>
<name><surname>Price</surname>
<given-names>RA</given-names>
</name>
<name><surname>Hills</surname>
<given-names>HG</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>YL</given-names>
</name>
<name><surname>Kron</surname>
<given-names>KA</given-names>
</name>
<name><surname>Rettig</surname>
<given-names>JH</given-names>
</name>
<name><surname>Conti</surname>
<given-names>E</given-names>
</name>
<name><surname>Palmer</surname>
<given-names>JD</given-names>
</name>
<name><surname>Manhart</surname>
<given-names>JR</given-names>
</name>
<name><surname>Sytsma</surname>
<given-names>KJ</given-names>
</name>
<name><surname>Michaels</surname>
<given-names>HJ</given-names>
</name>
<name><surname>Kress</surname>
<given-names>WJ</given-names>
</name>
<name><surname>Karol</surname>
<given-names>KG</given-names>
</name>
<name><surname>Clark</surname>
<given-names>WD</given-names>
</name>
<name><surname>Hedren</surname>
<given-names>M</given-names>
</name>
<name><surname>Gaut</surname>
<given-names>BS</given-names>
</name>
<name><surname>Jansen</surname>
<given-names>RK</given-names>
</name>
<name><surname>Kim</surname>
<given-names>KJ</given-names>
</name>
<name><surname>Wimpee</surname>
<given-names>CF</given-names>
</name>
<name><surname>Smith</surname>
<given-names>JF</given-names>
</name>
<name><surname>Furnier</surname>
<given-names>GR</given-names>
</name>
<name><surname>Strauss</surname>
<given-names>SH</given-names>
</name>
<name><surname>Xiang</surname>
<given-names>QY</given-names>
</name>
<name><surname>Plunkett</surname>
<given-names>GM</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>PS</given-names>
</name>
<name><surname>Swensen</surname>
<given-names>SM</given-names>
</name>
<name><surname>Williams</surname>
<given-names>SE</given-names>
</name>
<name><surname>Gadek</surname>
<given-names>PA</given-names>
</name>
<name><surname>Quinn</surname>
<given-names>CJ</given-names>
</name>
<name><surname>Eguiarte</surname>
<given-names>LE</given-names>
</name>
<name><surname>Golenberg</surname>
<given-names>E</given-names>
</name>
<name><surname>Learn</surname>
<given-names>GH</given-names>
<suffix>Jr</suffix>
</name>
<name><surname>Graham</surname>
<given-names>SW</given-names>
</name>
<name><surname>Barrett</surname>
<given-names>SCH</given-names>
</name>
<name><surname>Dayanandan</surname>
<given-names>S</given-names>
</name>
<name><surname>Albert</surname>
<given-names>VA</given-names>
</name>
</person-group>
<article-title>Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL</article-title>
<source>Annals of the Missouri Botanical Garden</source>
<year>1993</year>
<volume>80</volume>
<fpage>528</fpage>
<lpage>580</lpage>
</citation>
</ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hilu</surname>
<given-names>KW</given-names>
</name>
<name><surname>Borsch</surname>
<given-names>T</given-names>
</name>
<name><surname>Muller</surname>
<given-names>K</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>PS</given-names>
</name>
<name><surname>Savolainen</surname>
<given-names>V</given-names>
</name>
<name><surname>Chase</surname>
<given-names>MW</given-names>
</name>
<name><surname>Powell</surname>
<given-names>MP</given-names>
</name>
<name><surname>Alice</surname>
<given-names>LA</given-names>
</name>
<name><surname>Evans</surname>
<given-names>R</given-names>
</name>
<name><surname>Sauquet</surname>
<given-names>H</given-names>
</name>
<name><surname>Neinhuis</surname>
<given-names>C</given-names>
</name>
<name><surname>Slotta</surname>
<given-names>TAB</given-names>
</name>
<name><surname>Rohwer</surname>
<given-names>JG</given-names>
</name>
<name><surname>Campbell</surname>
<given-names>CS</given-names>
</name>
<name><surname>Chatrou</surname>
<given-names>LW</given-names>
</name>
</person-group>
<article-title>Angiosperm phylogeny based on <italic>matK</italic> sequence information</article-title>
<source>American Journal of Botany</source>
<year>2003</year>
<volume>90</volume>
<fpage>1758</fpage>
<lpage>1776</lpage>
</citation>
</ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huson</surname>
<given-names>D</given-names>
</name>
<name><surname>Nettles</surname>
<given-names>S</given-names>
</name>
<name><surname>Warnow</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Disk-covering, a fast-converging method for phylogenetic tree reconstruction</article-title>
<source>Journal of Computational Biology</source>
<year>1999</year>
<volume>6</volume>
<fpage>369</fpage>
<lpage>386</lpage>
<pub-id pub-id-type="pmid">10582573</pub-id>
</citation>
</ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stamatakis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>2688</fpage>
<lpage>2690</lpage>
<pub-id pub-id-type="pmid">16928733</pub-id>
</citation>
</ref>
<ref id="B9"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zwickl</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<source>Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion</source>
<year>2006</year>
<publisher-name>Ph.D. dissertation, The University of Texas at Austin</publisher-name>
</citation>
</ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
<name><surname>Purvis</surname>
<given-names>A</given-names>
</name>
<name><surname>Henze</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Phylogenetic supertrees: assembling the trees of life</article-title>
<source>Trends in Ecology and Evolution</source>
<year>1998</year>
<volume>13</volume>
<fpage>105</fpage>
<lpage>109</lpage>
</citation>
</ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bininda-Emonds</surname>
<given-names>ORP</given-names>
</name>
</person-group>
<article-title>The evolution of supertrees</article-title>
<source>Trends Ecol Evol</source>
<year>2004</year>
<volume>19</volume>
<fpage>315</fpage>
<lpage>322</lpage>
<pub-id pub-id-type="pmid">16701277</pub-id>
</citation>
</ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baum</surname>
<given-names>BR</given-names>
</name>
</person-group>
<article-title>Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees</article-title>
<source>Taxon</source>
<year>1992</year>
<volume>41</volume>
<fpage>3</fpage>
<lpage>10</lpage>
</citation>
</ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ragan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Phylogenetic inference based on matrix representation of trees</article-title>
<source>Mol Phylogenet Evol</source>
<year>1992</year>
<volume>1</volume>
<fpage>53</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">1342924</pub-id>
</citation>
</ref>
<ref id="B14"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>D</given-names>
</name>
<name><surname>Eulenstein</surname>
<given-names>O</given-names>
</name>
<name><surname>Fernandez-Baca</surname>
<given-names>D</given-names>
</name>
<name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Supertrees by flipping</article-title>
<source>Proceedings of COCOON 2002</source>
<year>2002</year>
<publisher-name>Springer-Verlag LNCS</publisher-name>
</citation>
</ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Semple</surname>
<given-names>C</given-names>
</name>
<name><surname>Steel</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>A supertree method for rooted trees</article-title>
<source>Discrete Appl Math</source>
<year>2000</year>
<volume>105</volume>
<fpage>147</fpage>
<lpage>158</lpage>
</citation>
</ref>
<ref id="B16"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Page</surname>
<given-names>RDM</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Guigo</surname>
<given-names>R</given-names>
</name>
<name><surname>Gusfield</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Modified MinCut supertrees</article-title>
<source>WABI 2002</source>
<year>2002</year>
<publisher-name>LNCS</publisher-name>
<fpage>537</fpage>
<lpage>551</lpage>
</citation>
</ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname>
<given-names>BR</given-names>
</name>
<name><surname>Smith</surname>
<given-names>SA</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Increasing data transparency and estimating phylogenetic uncertainty in supertrees: approaches using nonparametric bootstrapping</article-title>
<source>Syst Biol</source>
<year>2006</year>
<volume>55</volume>
<fpage>662</fpage>
<lpage>676</lpage>
<pub-id pub-id-type="pmid">16969942</pub-id>
</citation>
</ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname>
<given-names>KE</given-names>
</name>
<name><surname>Purvis</surname>
<given-names>A</given-names>
</name>
<name><surname>MacLarnon</surname>
<given-names>A</given-names>
</name>
<name><surname>Bininda-Emonds</surname>
<given-names>ORP</given-names>
</name>
<name><surname>Simmons</surname>
<given-names>NB</given-names>
</name>
</person-group>
<article-title>A phylogenetic supertree of the bats (Mammalia: Chiroptera)</article-title>
<source>Biological Reviews</source>
<year>2002</year>
<volume>77</volume>
<fpage>223</fpage>
<lpage>259</lpage>
<pub-id pub-id-type="pmid">12056748</pub-id>
</citation>
</ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruta</surname>
<given-names>M</given-names>
</name>
<name><surname>Jeffery</surname>
<given-names>JE</given-names>
</name>
<name><surname>Coates</surname>
<given-names>MI</given-names>
</name>
</person-group>
<article-title>A supertree of early tetrapods</article-title>
<source>Proc Biol Sci</source>
<year>2003</year>
<volume>270</volume>
<fpage>2507</fpage>
<lpage>2516</lpage>
<pub-id pub-id-type="pmid">14667343</pub-id>
</citation>
</ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Queiroz</surname>
<given-names>A</given-names>
</name>
<name><surname>Gatesy</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>The supermatrix approach to systematics</article-title>
<source>Trends Ecol Evol</source>
<year>2007</year>
<volume>22</volume>
<fpage>34</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="pmid">17046100</pub-id>
</citation>
</ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Enright</surname>
<given-names>AJ</given-names>
</name>
<name><surname>van Dongen</surname>
<given-names>S</given-names>
</name>
<name><surname>Ouzounis</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>An efficient algorithm for large-scale detection of protein families</article-title>
<source>Nucleic Acids Research</source>
<year>2002</year>
<volume>30</volume>
<fpage>1575</fpage>
<lpage>1584</lpage>
<pub-id pub-id-type="pmid">11917018</pub-id>
</citation>
</ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
<name><surname>Driskell</surname>
<given-names>AC</given-names>
</name>
<name><surname>Ree</surname>
<given-names>RH</given-names>
</name>
<name><surname>Eulenstein</surname>
<given-names>O</given-names>
</name>
<name><surname>Langley</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Obtaining maximal concatenated phylogenetic data sets from large sequence databases</article-title>
<source>Mol Biol Evol</source>
<year>2003</year>
<volume>20</volume>
<fpage>1036</fpage>
<lpage>1042</lpage>
<pub-id pub-id-type="pmid">12777519</pub-id>
</citation>
</ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname>
<given-names>C</given-names>
</name>
<name><surname>Burleigh</surname>
<given-names>JG</given-names>
</name>
<name><surname>Eulenstein</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Identifying optimal incomplete phylogenetic data sets from sequence databases</article-title>
<source>Mol Phylogenet Evol</source>
<year>2005</year>
<volume>35</volume>
<fpage>528</fpage>
<lpage>535</lpage>
<pub-id pub-id-type="pmid">15878123</pub-id>
</citation>
</ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanderson</surname>
<given-names>MJ</given-names>
</name>
<name><surname>Boss</surname>
<given-names>D</given-names>
</name>
<name><surname>Chen</surname>
<given-names>D</given-names>
</name>
<name><surname>Cranston</surname>
<given-names>KA</given-names>
</name>
<name><surname>Wehe</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The PhyLoTA browser: processing GenBank for molecular phylogenetics research</article-title>
<source>Systematic Biology</source>
<year>2008</year>
<volume>57</volume>
<fpage>335</fpage>
<lpage>346</lpage>
<pub-id pub-id-type="pmid">18570030</pub-id>
</citation>
</ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roure</surname>
<given-names>B</given-names>
</name>
<name><surname>Rodriguez-Ezpeleta</surname>
<given-names>N</given-names>
</name>
<name><surname>Philippe</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics</article-title>
<source>BMC Evolutionary Biology</source>
<year>2007</year>
<volume>7</volume>
<fpage>S2</fpage>
<pub-id pub-id-type="pmid">17288575</pub-id>
</citation>
</ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ren</surname>
<given-names>F</given-names>
</name>
<name><surname>Tanaka</surname>
<given-names>H</given-names>
</name>
<name><surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>A likelihood look at the supermatrix-supertree controversy</article-title>
<source>Gene</source>
<pub-id pub-id-type="pmid">18502054</pub-id>
</citation>
</ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname>
<given-names>SA</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Rates of molecular evolution linked to life history in flowering plants</article-title>
<source>Science</source>
<year>2008</year>
<volume>322</volume>
<fpage>86</fpage>
<lpage>89</lpage>
<pub-id pub-id-type="pmid">18832643</pub-id>
</citation>
</ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickson</surname>
<given-names>RE</given-names>
</name>
<name><surname>Simon</surname>
<given-names>C</given-names>
</name>
<name><surname>Perrey</surname>
<given-names>SW</given-names>
</name>
</person-group>
<article-title>The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence</article-title>
<source>Mol Biol Evol</source>
<year>2000</year>
<volume>17</volume>
<fpage>530</fpage>
<lpage>539</lpage>
<pub-id pub-id-type="pmid">10742045</pub-id>
</citation>
</ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname>
<given-names>JR</given-names>
</name>
<name><surname>Douady</surname>
<given-names>CJ</given-names>
</name>
<name><surname>Italia</surname>
<given-names>MJ</given-names>
</name>
<name><surname>Marshall</surname>
<given-names>WE</given-names>
</name>
<name><surname>Stanhope</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Universal trees based on large combined protein sequence data sets</article-title>
<source>Nature Genetics</source>
<year>2001</year>
<volume>28</volume>
<fpage>281</fpage>
<lpage>285</lpage>
<pub-id pub-id-type="pmid">11431701</pub-id>
</citation>
</ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edgar</surname>
<given-names>RC</given-names>
</name>
<name><surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Multiple sequence alignment</article-title>
<source>Current Opinion in Structural Biology</source>
<year>2006</year>
<volume>16</volume>
<fpage>368</fpage>
<lpage>373</lpage>
<pub-id pub-id-type="pmid">16679011</pub-id>
</citation>
</ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cammarano</surname>
<given-names>P</given-names>
</name>
<name><surname>Creti</surname>
<given-names>R</given-names>
</name>
<name><surname>Sanangelantoni</surname>
<given-names>AM</given-names>
</name>
<name><surname>Palm</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>The archaea monophyly issue: a phylogeny of translational elongation factor G(2) sequences inferred from an optimized selection of alignment positions</article-title>
<source>J Mol Evol</source>
<year>1999</year>
<volume>49</volume>
<fpage>524</fpage>
<lpage>537</lpage>
<pub-id pub-id-type="pmid">10486009</pub-id>
</citation>
</ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mugridge</surname>
<given-names>NB</given-names>
</name>
<name><surname>Morrison</surname>
<given-names>DA</given-names>
</name>
<name><surname>Jäkel</surname>
<given-names>T</given-names>
</name>
<name><surname>Heckeroth</surname>
<given-names>AR</given-names>
</name>
<name><surname>Tenter</surname>
<given-names>AM</given-names>
</name>
<name><surname>Johnson</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Effects of sequence alignment and structural domains of ribosomal DNA on phylogeny reconstruction for the protozoan family Sarcocystidae</article-title>
<source>Mol Biol Evol</source>
<year>2000</year>
<volume>17</volume>
<fpage>1842</fpage>
<lpage>1853</lpage>
<pub-id pub-id-type="pmid">11110900</pub-id>
</citation>
</ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ogden</surname>
<given-names>TH</given-names>
</name>
<name><surname>Rosenberg</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>Multiple sequence alignment accuracy and phylogenetic inference</article-title>
<source>Systematic Biology</source>
<year>2006</year>
<volume>55</volume>
<fpage>314</fpage>
<lpage>328</lpage>
<pub-id pub-id-type="pmid">16611602</pub-id>
</citation>
</ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dunn</surname>
<given-names>CW</given-names>
</name>
<name><surname>Hejnol</surname>
<given-names>A</given-names>
</name>
<name><surname>Matus</surname>
<given-names>DQ</given-names>
</name>
<name><surname>Pang</surname>
<given-names>K</given-names>
</name>
<name><surname>Browne</surname>
<given-names>WE</given-names>
</name>
<name><surname>Smith</surname>
<given-names>SA</given-names>
</name>
<name><surname>Seaver</surname>
<given-names>E</given-names>
</name>
<name><surname>Rouse</surname>
<given-names>GW</given-names>
</name>
<name><surname>Obst</surname>
<given-names>M</given-names>
</name>
<name><surname>Edgecombe</surname>
<given-names>GD</given-names>
</name>
<name><surname>Sorensen</surname>
<given-names>MV</given-names>
</name>
<name><surname>Haddock</surname>
<given-names>SHD</given-names>
</name>
<name><surname>Schmidt-Rhaesa</surname>
<given-names>A</given-names>
</name>
<name><surname>Okusu</surname>
<given-names>A</given-names>
</name>
<name><surname>Kristensen</surname>
<given-names>RM</given-names>
</name>
<name><surname>Wheeler</surname>
<given-names>WC</given-names>
</name>
<name><surname>Martindale</surname>
<given-names>MQ</given-names>
</name>
<name><surname>Giribet</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Broad phylogenomic sampling improves the resolution of the animal tree of life</article-title>
<source>Nature</source>
<year>2008</year>
<volume>452</volume>
<fpage>745</fpage>
<lpage>749</lpage>
<pub-id pub-id-type="pmid">18322464</pub-id>
</citation>
</ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>von Ohsen</surname>
<given-names>N</given-names>
</name>
<name><surname>Sommer</surname>
<given-names>I</given-names>
</name>
<name><surname>Zimmer</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Profile-profile alignment: a powerful tool for protein structure prediction</article-title>
<source>Pac Symp Biocomput</source>
<year>2003</year>
<fpage>252</fpage>
<lpage>263</lpage>
<pub-id pub-id-type="pmid">12603033</pub-id>
</citation>
</ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edgar</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>MUSCLE: multiple sequence alignment with high accuracy and high throughput</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>1792</fpage>
<lpage>1797</lpage>
<pub-id pub-id-type="pmid">15034147</pub-id>
</citation>
</ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katoh</surname>
<given-names>K</given-names>
</name>
<name><surname>Kuma</surname>
<given-names>K</given-names>
</name>
<name><surname>Toh</surname>
<given-names>H</given-names>
</name>
<name><surname>Miyata</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>MAFFT version 5: improvement in accuracy of multiple sequence alignment</article-title>
<source>Nucleic Acids Research</source>
<year>2005</year>
<volume>33</volume>
<fpage>511</fpage>
<lpage>518</lpage>
<pub-id pub-id-type="pmid">15661851</pub-id>
</citation>
</ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cantino</surname>
<given-names>PD</given-names>
</name>
<name><surname>Doyle</surname>
<given-names>JA</given-names>
</name>
<name><surname>Graham</surname>
<given-names>SW</given-names>
</name>
<name><surname>Judd</surname>
<given-names>WS</given-names>
</name>
<name><surname>Olmstead</surname>
<given-names>RG</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>PS</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Towards a phylogenetic nomenclature of Tracheophyta</article-title>
<source>Taxon</source>
<year>2007</year>
<volume>56</volume>
<fpage>822</fpage>
<lpage>846</lpage>
</citation>
</ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rousseeuw</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Croux</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Alternatives to the median absolute deviation</article-title>
<source>Journal of the American Statistical Association</source>
<year>1993</year>
<volume>88</volume>
<fpage>1273</fpage>
<lpage>1283</lpage>
</citation>
</ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rambaut</surname>
<given-names>A</given-names>
</name>
<name><surname>Grassly</surname>
<given-names>NC</given-names>
</name>
</person-group>
<article-title>Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees</article-title>
<source>Comput Appl Biosci</source>
<year>1997</year>
<volume>13</volume>
<fpage>235</fpage>
<lpage>238</lpage>
<pub-id pub-id-type="pmid">9183526</pub-id>
</citation>
</ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Winkworth</surname>
<given-names>RC</given-names>
</name>
<name><surname>Lundberg</surname>
<given-names>J</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Resolving campanulid phylogeny: some preliminary insights</article-title>
<source>Taxon</source>
<year>2008</year>
<volume>57</volume>
<fpage>53</fpage>
<lpage>65</lpage>
</citation>
</ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karehed</surname>
<given-names>J</given-names>
</name>
<name><surname>Lundberg</surname>
<given-names>J</given-names>
</name>
<name><surname>Bremer</surname>
<given-names>B</given-names>
</name>
<name><surname>Bremer</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Evolution of the Australasian families Alseuosmiaceae, Argophyllaceae, and Phellinaceae</article-title>
<source>Systematic Botany</source>
<year>1999</year>
<volume>24</volume>
<fpage>660</fpage>
<lpage>682</lpage>
</citation>
</ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olmstead</surname>
<given-names>RG</given-names>
</name>
<name><surname>Kim</surname>
<given-names>KJ</given-names>
</name>
<name><surname>Jansen</surname>
<given-names>RK</given-names>
</name>
<name><surname>Wagstaff</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>The phylogeny of the Asteridae <italic>sensu lato</italic> based on chloroplast ndhF gene sequences</article-title>
<source>Mol Phylogenet Evol</source>
<year>2000</year>
<volume>16</volume>
<fpage>96</fpage>
<lpage>112</lpage>
<pub-id pub-id-type="pmid">10877943</pub-id>
</citation>
</ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lundberg</surname>
<given-names>J</given-names>
</name>
<name><surname>Bremer</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>A phylogenetic study of the order Asterales using one morphological and three molecular data sets</article-title>
<source>International Journal of Plant Sciences</source>
<year>2003</year>
<volume>164</volume>
<fpage>553</fpage>
<lpage>578</lpage>
</citation>
</ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soltis</surname>
<given-names>DE</given-names>
</name>
<name><surname>Gitzendanner</surname>
<given-names>MA</given-names>
</name>
<name><surname>Soltis</surname>
<given-names>PS</given-names>
</name>
</person-group>
<article-title>A 567-taxon data set for angiosperms: the challenges posed by Bayesian analyses of large data sets</article-title>
<source>International Journal of Plant Sciences</source>
<year>2007</year>
<volume>168</volume>
<fpage>137</fpage>
<lpage>157</lpage>
</citation>
</ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Panero</surname>
<given-names>JL</given-names>
</name>
<name><surname>Funk</surname>
<given-names>VA</given-names>
</name>
</person-group>
<article-title>The value of sampling anomalous taxa in phylogenetic studies: major clades of the Asteraceae revealed</article-title>
<source>Mol Phylogenet Evol</source>
<year>2008</year>
<volume>47</volume>
<fpage>757</fpage>
<lpage>782</lpage>
<pub-id pub-id-type="pmid">18375151</pub-id>
</citation>
</ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weiblen</surname>
<given-names>GD</given-names>
</name>
<name><surname>Oyama</surname>
<given-names>RK</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Phylogenetic analysis of dioecy in monocotyledons</article-title>
<source>The American Naturalist</source>
<year>2000</year>
<volume>155</volume>
<fpage>46</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">10657176</pub-id>
</citation>
</ref>
<ref id="B48"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kadereit</surname>
<given-names>J</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Kadereit</surname>
<given-names>JW</given-names>
</name>
<name><surname>Jeffrey</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Asterales: introduction and conspectus</article-title>
<source>The families and genera of vascular plants: flowering plants, eudicots, Asterales</source>
<year>2007</year>
<publisher-name>Springer</publisher-name>
<fpage>1</fpage>
<lpage>6</lpage>
</citation>
</ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname>
<given-names>JM</given-names>
</name>
<name><surname>Kovacic</surname>
<given-names>S</given-names>
</name>
<name><surname>Liber</surname>
<given-names>Z</given-names>
</name>
<name><surname>Eddie</surname>
<given-names>WMM</given-names>
</name>
<name><surname>Schneeweiss</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Phylogeny and biogeography of isophyllous species of <italic>Campanula</italic> (Campanulaceae) in the Mediterranean area</article-title>
<source>Systematic Botany</source>
<year>2006</year>
<volume>31</volume>
<fpage>862</fpage>
<lpage>880</lpage>
</citation>
</ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roquet</surname>
<given-names>C</given-names>
</name>
<name><surname>Saez</surname>
<given-names>L</given-names>
</name>
<name><surname>Aldasoro</surname>
<given-names>JJ</given-names>
</name>
<name><surname>Susanna</surname>
<given-names>A</given-names>
</name>
<name><surname>Alarcon</surname>
<given-names>ML</given-names>
</name>
<name><surname>Garcia-Jacas</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Natural delineation, molecular phylogeny and floral evolution in <italic>Campanula</italic></article-title>
<source>Systematic Botany</source>
<year>2008</year>
<volume>33</volume>
<fpage>203</fpage>
<lpage>217</lpage>
</citation>
</ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pelser</surname>
<given-names>PB</given-names>
</name>
<name><surname>Nordenstam</surname>
<given-names>B</given-names>
</name>
<name><surname>Kadereit</surname>
<given-names>J</given-names>
</name>
<name><surname>Watson</surname>
<given-names>LE</given-names>
</name>
</person-group>
<article-title>An ITS phylogeny of tribe Senecioneae (Asteraceae) and a new delimitation of <italic>Senecio</italic> L.</article-title>
<source>Taxon</source>
<year>2007</year>
<volume>56</volume>
<fpage>1077</fpage>
<lpage>1114</lpage>
</citation>
</ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lemieux</surname>
<given-names>C</given-names>
</name>
<name><surname>Otis</surname>
<given-names>C</given-names>
</name>
<name><surname>Turmel</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Ancestral chloroplast genome in <italic>Mesostigma viride</italic> reveals an early branch of green plant evolution</article-title>
<source>Nature</source>
<year>2000</year>
<volume>403</volume>
<fpage>649</fpage>
<lpage>652</lpage>
<pub-id pub-id-type="pmid">10688199</pub-id>
</citation>
</ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karol</surname>
<given-names>KG</given-names>
</name>
<name><surname>McCourt</surname>
<given-names>RM</given-names>
</name>
<name><surname>Cimino</surname>
<given-names>MT</given-names>
</name>
<name><surname>Delwiche</surname>
<given-names>CF</given-names>
</name>
</person-group>
<article-title>The closest living relatives of land plants</article-title>
<source>Science</source>
<year>2001</year>
<volume>294</volume>
<fpage>2351</fpage>
<lpage>2353</lpage>
<pub-id pub-id-type="pmid">11743201</pub-id>
</citation>
</ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pryer</surname>
<given-names>KM</given-names>
</name>
<name><surname>Schneider</surname>
<given-names>H</given-names>
</name>
<name><surname>Smith</surname>
<given-names>AR</given-names>
</name>
<name><surname>Cranfill</surname>
<given-names>R</given-names>
</name>
<name><surname>Wolf</surname>
<given-names>PG</given-names>
</name>
<name><surname>Hunt</surname>
<given-names>JS</given-names>
</name>
<name><surname>Sipes</surname>
<given-names>SD</given-names>
</name>
</person-group>
<article-title>Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants</article-title>
<source>Nature</source>
<year>2001</year>
<volume>409</volume>
<fpage>618</fpage>
<lpage>621</lpage>
<pub-id pub-id-type="pmid">11214320</pub-id>
</citation>
</ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiu</surname>
<given-names>YL</given-names>
</name>
<name><surname>Li</surname>
<given-names>LB</given-names>
</name>
<name><surname>Wang</surname>
<given-names>B</given-names>
</name>
<name><surname>Chen</surname>
<given-names>ZD</given-names>
</name>
<name><surname>Knoop</surname>
<given-names>V</given-names>
</name>
<name><surname>Groth-Malonek</surname>
<given-names>M</given-names>
</name>
<name><surname>Dombrovska</surname>
<given-names>O</given-names>
</name>
<name><surname>Lee</surname>
<given-names>J</given-names>
</name>
<name><surname>Kent</surname>
<given-names>L</given-names>
</name>
<name><surname>Rest</surname>
<given-names>J</given-names>
</name>
<name><surname>Estabrook</surname>
<given-names>GF</given-names>
</name>
<name><surname>Hendry</surname>
<given-names>TA</given-names>
</name>
<name><surname>Taylor</surname>
<given-names>DW</given-names>
</name>
<name><surname>Testa</surname>
<given-names>CM</given-names>
</name>
<name><surname>Ambros</surname>
<given-names>M</given-names>
</name>
<name><surname>Crandall-Stotler</surname>
<given-names>B</given-names>
</name>
<name><surname>Duff</surname>
<given-names>RJ</given-names>
</name>
<name><surname>Stech</surname>
<given-names>M</given-names>
</name>
<name><surname>Frey</surname>
<given-names>W</given-names>
</name>
<name><surname>Quandt</surname>
<given-names>D</given-names>
</name>
<name><surname>Davis</surname>
<given-names>CC</given-names>
</name>
</person-group>
<article-title>The deepest divergences in land plants inferred from phylogenomic evidence</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<year>2006</year>
<volume>103</volume>
<fpage>15511</fpage>
<lpage>15516</lpage>
<pub-id pub-id-type="pmid">17030812</pub-id>
</citation>
</ref>
<ref id="B56"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Cracraft</surname>
<given-names>J</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Immeasurable progress on the Tree of Life</article-title>
<source>Assembling the Tree of Life</source>
<year>2004</year>
<publisher-name>Oxford University Press</publisher-name>
<fpage>548</fpage>
<lpage>552</lpage>
</citation>
</ref>
<ref id="B57"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kenrick</surname>
<given-names>P</given-names>
</name>
<name><surname>Crane</surname>
<given-names>PR</given-names>
</name>
</person-group>
<source>The origin and early diversification of land plants: a cladistic study</source>
<year>1997</year>
<publisher-name>Smithsonian Institution Press</publisher-name>
</citation>
</ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolf</surname>
<given-names>PG</given-names>
</name>
<name><surname>Karol</surname>
<given-names>KG</given-names>
</name>
<name><surname>Mandoli</surname>
<given-names>DF</given-names>
</name>
<name><surname>Kuehl</surname>
<given-names>J</given-names>
</name>
<name><surname>Arumuganathan</surname>
<given-names>K</given-names>
</name>
<name><surname>Ellis</surname>
<given-names>MW</given-names>
</name>
<name><surname>Mishler</surname>
<given-names>BD</given-names>
</name>
<name><surname>Kelch</surname>
<given-names>DG</given-names>
</name>
<name><surname>Olmstead</surname>
<given-names>RG</given-names>
</name>
<name><surname>Boore</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>The first complete chloroplast genome sequence of a lycophyte, <italic>Huperzia lucidula </italic>
(Lycopodiaceae)</article-title>
<source>Gene</source>
<year>2005</year>
<volume>350</volume>
<fpage>117</fpage>
<lpage>128</lpage>
<pub-id pub-id-type="pmid">15788152</pub-id>
</citation>
</ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsuji</surname>
<given-names>S</given-names>
</name>
<name><surname>Ueda</surname>
<given-names>K</given-names>
</name>
<name><surname>Nishiyama</surname>
<given-names>T</given-names>
</name>
<name><surname>Hasebe</surname>
<given-names>M</given-names>
</name>
<name><surname>Yoshikawa</surname>
<given-names>S</given-names>
</name>
<name><surname>Konagaya</surname>
<given-names>A</given-names>
</name>
<name><surname>Nishiuchi</surname>
<given-names>T</given-names>
</name>
<name><surname>Yamaguchi</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>The chloroplast genome from a lycophyte (microphyllophyte), <italic>Selaginella uncinata</italic>, has a unique inversion, transpositions and many gene losses</article-title>
<source>Journal of Plant Research</source>
<year>2007</year>
<volume>120</volume>
<fpage>281</fpage>
<lpage>290</lpage>
<pub-id pub-id-type="pmid">17297557</pub-id>
</citation>
</ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hillis</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Taxonomic sampling, phylogenetic accuracy, and investigator bias</article-title>
<source>Systematic Biology</source>
<year>1998</year>
<volume>47</volume>
<fpage>3</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">12064238</pub-id>
</citation>
</ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwickl</surname>
<given-names>DJ</given-names>
</name>
<name><surname>Hillis</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Increased taxon sampling greatly reduces phylogenetic error</article-title>
<source>Systematic Biology</source>
<year>2002</year>
<volume>51</volume>
<fpage>588</fpage>
<lpage>598</lpage>
<pub-id pub-id-type="pmid">12228001</pub-id>
</citation>
</ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schneider</surname>
<given-names>H</given-names>
</name>
<name><surname>Schuettpelz</surname>
<given-names>E</given-names>
</name>
<name><surname>Pryer</surname>
<given-names>KM</given-names>
</name>
<name><surname>Cranfill</surname>
<given-names>R</given-names>
</name>
<name><surname>Magallon</surname>
<given-names>S</given-names>
</name>
<name><surname>Lupia</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Ferns diversified in the shadows of angiosperms</article-title>
<source>Nature</source>
<year>2004</year>
<volume>428</volume>
<fpage>553</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="pmid">15058303</pub-id>
</citation>
</ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moles</surname>
<given-names>AT</given-names>
</name>
<name><surname>Ackerly</surname>
<given-names>DD</given-names>
</name>
<name><surname>Webb</surname>
<given-names>CO</given-names>
</name>
<name><surname>Tweddle</surname>
<given-names>JC</given-names>
</name>
<name><surname>Dickie</surname>
<given-names>JB</given-names>
</name>
<name><surname>Westoby</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>A brief history of seed size</article-title>
<source>Science</source>
<year>2005</year>
<volume>307</volume>
<fpage>576</fpage>
<lpage>580</lpage>
<pub-id pub-id-type="pmid">15681384</pub-id>
</citation>
</ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beaulieu</surname>
<given-names>JM</given-names>
</name>
<name><surname>Moles</surname>
<given-names>AT</given-names>
</name>
<name><surname>Leitch</surname>
<given-names>IJ</given-names>
</name>
<name><surname>Bennett</surname>
<given-names>MD</given-names>
</name>
<name><surname>Dickie</surname>
<given-names>JB</given-names>
</name>
<name><surname>Knight</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Correlated evolution of genome size and seed mass</article-title>
<source>New Phytologist</source>
<year>2007</year>
<volume>173</volume>
<fpage>422</fpage>
<lpage>437</lpage>
<pub-id pub-id-type="pmid">17204088</pub-id>
</citation>
</ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname>
<given-names>IJ</given-names>
</name>
<name><surname>Ackerly</surname>
<given-names>DD</given-names>
</name>
<name><surname>Bongers</surname>
<given-names>F</given-names>
</name>
<name><surname>Harms</surname>
<given-names>KE</given-names>
</name>
<name><surname>Ibarra-Manriquez</surname>
<given-names>G</given-names>
</name>
<name><surname>Martinez-Ramos</surname>
<given-names>M</given-names>
</name>
<name><surname>Mazer</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Muller-Landau</surname>
<given-names>HC</given-names>
</name>
<name><surname>Paz</surname>
<given-names>H</given-names>
</name>
<name><surname>Pitman</surname>
<given-names>NCA</given-names>
</name>
<name><surname>Poorter</surname>
<given-names>L</given-names>
</name>
<name><surname>Silman</surname>
<given-names>MR</given-names>
</name>
<name><surname>Vriesendorp</surname>
<given-names>CF</given-names>
</name>
<name><surname>Webb</surname>
<given-names>CO</given-names>
</name>
<name><surname>Westoby</surname>
<given-names>M</given-names>
</name>
<name><surname>Wright</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Relationships among ecologically important dimensions of plant trait variation in seven neotropical forests</article-title>
<source>Annals of Botany</source>
<year>2007</year>
<volume>99</volume>
<fpage>1003</fpage>
<lpage>1015</lpage>
<pub-id pub-id-type="pmid">16595553</pub-id>
</citation>
</ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Forest</surname>
<given-names>F</given-names>
</name>
<name><surname>Grenyer</surname>
<given-names>R</given-names>
</name>
<name><surname>Rouget</surname>
<given-names>M</given-names>
</name>
<name><surname>Davies</surname>
<given-names>TJ</given-names>
</name>
<name><surname>Cowling</surname>
<given-names>RM</given-names>
</name>
<name><surname>Faith</surname>
<given-names>DP</given-names>
</name>
<name><surname>Balmford</surname>
<given-names>A</given-names>
</name>
<name><surname>Manning</surname>
<given-names>JC</given-names>
</name>
<name><surname>Procheş</surname>
<given-names>S</given-names>
</name>
<name><surname>Bank</surname>
<given-names>M</given-names>
</name>
<name><surname>Reeves</surname>
<given-names>G</given-names>
</name>
<name><surname>Hedderson</surname>
<given-names>TAJ</given-names>
</name>
<name><surname>Savolainen</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Preserving the evolutionary potential of floras in biodiversity hotspots</article-title>
<source>Nature</source>
<year>2007</year>
<volume>445</volume>
<fpage>757</fpage>
<lpage>760</lpage>
<pub-id pub-id-type="pmid">17301791</pub-id>
</citation>
</ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edwards</surname>
<given-names>EJ</given-names>
</name>
<name><surname>Still</surname>
<given-names>CJ</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>The relevance of phylogeny to studies of global change</article-title>
<source>Trends in Ecology and Evolution</source>
<year>2007</year>
<volume>22</volume>
<fpage>243</fpage>
<lpage>249</lpage>
<pub-id pub-id-type="pmid">17296242</pub-id>
</citation>
</ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname>
<given-names>CO</given-names>
</name>
<name><surname>Donoghue</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Phylomatic: tree assembly for applied phylogenetics</article-title>
<source>Molecular Ecology Notes</source>
<year>2005</year>
<volume>5</volume>
<fpage>181</fpage>
<lpage>183</lpage>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000215 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000215 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2645364
   |texte=   Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:19210768" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024

Cyberinfrastructure exploration server
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.