Is multiple-sequence alignment required for accurate inference of phylogeny?
Internal identifier: 002188 (PubMed/Corpus); previous: 002187; next: 002189
Authors: Michael Höhl; Mark A. Ragan
Source:
- Systematic biology [1063-5157]; 2007.
Abstract
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
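The word-based methods mentioned in the abstract can be illustrated with a minimal sketch. It assumes one classical measure, the squared Euclidean distance between k-mer count vectors, which is not necessarily implemented this way in decaf+py; the function names and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words (k-mers) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_euclidean(a: Counter, b: Counter) -> int:
    """Squared Euclidean distance between two k-mer count vectors."""
    # A Counter returns 0 for words it has not seen, so the union of keys suffices.
    return sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys())


def distance_matrix(seqs: dict, k: int) -> dict:
    """Pairwise distances keyed by (name1, name2); no alignment is computed."""
    counts = {name: kmer_counts(s, k) for name, s in seqs.items()}
    return {
        (x, y): squared_euclidean(counts[x], counts[y])
        for x, y in combinations(sorted(counts), 2)
    }


# Hypothetical toy sequences, for illustration only.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG"}
dists = distance_matrix(toy, k=2)
```

A matrix like `dists` could then be passed to a distance-based tree-building method such as neighbor-joining. The word length `k` is the parameter whose optimal range the abstract reports as stable across data sets; the value 2 used here is arbitrary.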
DOI: 10.1080/10635150701294741
PubMed: 17454975
Links to Exploration step
pubmed:17454975
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Is multiple-sequence alignment required for accurate inference of phylogeny?</title>
<author><name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation><nlm:affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2007">2007</date>
<idno type="RBID">pubmed:17454975</idno>
<idno type="pmid">17454975</idno>
<idno type="doi">10.1080/10635150701294741</idno>
<idno type="wicri:Area/PubMed/Corpus">002188</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002188</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Is multiple-sequence alignment required for accurate inference of phylogeny?</title>
<author><name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation><nlm:affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
</author>
</analytic>
<series><title level="j">Systematic biology</title>
<idno type="ISSN">1063-5157</idno>
<imprint><date when="2007" type="published">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Bayes Theorem</term>
<term>Computational Biology (methods)</term>
<term>Likelihood Functions</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Sequence Analysis</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Bayes Theorem</term>
<term>Likelihood Functions</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">17454975</PMID>
<DateCompleted><Year>2007</Year>
<Month>07</Month>
<Day>12</Day>
</DateCompleted>
<DateRevised><Year>2020</Year>
<Month>04</Month>
<Day>03</Day>
</DateRevised>
<Article PubModel="Print"><Journal><ISSN IssnType="Print">1063-5157</ISSN>
<JournalIssue CitedMedium="Print"><Volume>56</Volume>
<Issue>2</Issue>
<PubDate><Year>2007</Year>
<Month>Apr</Month>
</PubDate>
</JournalIssue>
<Title>Systematic biology</Title>
<ISOAbbreviation>Syst. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Is multiple-sequence alignment required for accurate inference of phylogeny?</ArticleTitle>
<Pagination><MedlinePgn>206-21</MedlinePgn>
</Pagination>
<Abstract><AbstractText>The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Höhl</LastName>
<ForeName>Michael</ForeName>
<Initials>M</Initials>
<AffiliationInfo><Affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Ragan</LastName>
<ForeName>Mark A</ForeName>
<Initials>MA</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D023362">Evaluation Study</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Syst Biol</MedlineTA>
<NlmUniqueID>9302532</NlmUniqueID>
<ISSNLinking>1063-5157</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D001499" MajorTopicYN="N">Bayes Theorem</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016013" MajorTopicYN="N">Likelihood Functions</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D008957" MajorTopicYN="N">Models, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D010802" MajorTopicYN="Y">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016415" MajorTopicYN="Y">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017421" MajorTopicYN="N">Sequence Analysis</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="pubmed"><Year>2007</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2007</Year>
<Month>7</Month>
<Day>13</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2007</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">17454975</ArticleId>
<ArticleId IdType="pii">776485575</ArticleId>
<ArticleId IdType="doi">10.1080/10635150701294741</ArticleId>
<ArticleId IdType="pmc">PMC7107264</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Bioinformatics. 2004 Jan 22;20(2):206-15</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14734312</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Proc Natl Acad Sci U S A. 2005 Oct 4;102(40):14332-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16176988</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2003 Nov 1;19(16):2122-30</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14594718</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Mol Evol. 2004 Jan;58(1):1-11</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14743310</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Bioinform Comput Biol. 2003 Oct;1(3):475-93</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15290766</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2001 Feb;17(2):149-54</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11238070</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Proc Natl Acad Sci U S A. 1986 Jul;83(14):5155-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">3460087</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Mol Biol Evol. 2002 Apr;19(4):554-62</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11919297</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Comput Appl Biosci. 1997 Jun;13(3):235-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9183526</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 1998;14(1):55-67</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9520502</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Comput Biol. 2005 Oct;12(8):1103-16</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16241900</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Comput Appl Biosci. 1992 Jun;8(3):275-82</Citation>
<ArticleIdList><ArticleId IdType="pubmed">1633570</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Proc Natl Acad Sci U S A. 1992 Nov 15;89(22):10915-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">1438297</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Mol Biol Evol. 2000 Apr;17(4):540-52</Citation>
<ArticleIdList><ArticleId IdType="pubmed">10742046</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>BMC Bioinformatics. 2004 Dec 17;5:204</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15606920</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2002 Jan;18(1):100-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11836217</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Comput Biol. 1994 Winter;1(4):337-48</Citation>
<ArticleIdList><ArticleId IdType="pubmed">8790475</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2003 Mar 1;19(4):513-23</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12611807</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>BMC Bioinformatics. 2004 Apr 29;5:45</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15115543</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2004 Feb 12;20(3):399-406</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14764560</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Syst Biol. 2006 Aug;55(4):553-65</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16857650</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Mol Biol Evol. 1987 Jul;4(4):406-25</Citation>
<ArticleIdList><ArticleId IdType="pubmed">3447015</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Syst Biol. 2001 Nov-Dec;50(6):913-25</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12116640</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2001 Aug;17(8):754-5</Citation>
<ArticleIdList><ArticleId IdType="pubmed">11524383</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Evolution. 1992 Feb;46(1):159-173</Citation>
<ArticleIdList><ArticleId IdType="pubmed">28564959</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Evol Bioinform Online. 2007 Feb 25;2:359-75</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19455227</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Comput Biol. 2006 Mar;13(2):336-50</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16597244</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Mol Biol Evol. 2005 Mar;22(3):792-802</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15590907</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2003 Aug 12;19(12):1572-4</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12912839</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Nucleic Acids Res. 2004 Jan 16;32(1):380-5</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14729922</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2005 May 15;21(10):2230-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15728118</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Bioinform Comput Biol. 2004 Mar;2(1):1-19</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15272430</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Philos Trans R Soc Lond B Biol Sci. 1994 May 28;344(1309):305-11</Citation>
<ArticleIdList><ArticleId IdType="pubmed">7938201</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>J Theor Biol. 1993 Sep 7;164(1):65-83</Citation>
<ArticleIdList><ArticleId IdType="pubmed">8264244</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Biometrics. 1997 Dec;53(4):1431-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">9423258</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Mol Biol Evol. 2004 Jan;21(1):200-6</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14595102</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Syst Biol. 2006 Apr;55(2):314-28</Citation>
<ArticleIdList><ArticleId IdType="pubmed">16611602</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Nucleic Acids Res. 2004 Mar 19;32(5):1792-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15034147</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002188 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 002188 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:17454975 |texte= Is multiple-sequence alignment required for accurate inference of phylogeny? }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:17454975" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.