MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Is multiple-sequence alignment required for accurate inference of phylogeny?

Internal identifier: 002188 (PubMed/Corpus); previous: 002187; next: 002189

Authors: Michael Höhl; Mark A. Ragan

Source: Systematic Biology 56(2): 206-21 (April 2007)

RBID : pubmed:17454975

English descriptors

Abstract

The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
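
The word-based (k-mer) family of alignment-free methods mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' decaf+py code or its API; it assumes one simple, generic measure (squared Euclidean distance between k-mer count vectors), and the function names and toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_distance(a, b, k):
    """Squared Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for words it has not seen, so the union is safe.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def distance_matrix(seqs, k):
    """Pairwise distances for a dict of name -> sequence, with no alignment step."""
    names = sorted(seqs)
    dist = {(n, n): 0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = word_distance(seqs[x], seqs[y], k)
    return dist


# Toy sequences, invented for illustration only.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
matrix = distance_matrix(toy, k=2)
```

A matrix of this kind would typically be handed to a distance-based tree-building step. The word length `k` is the parameter the abstract reports as stable across data sets; the value 2 used here is arbitrary and suits only the very short toy sequences.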

DOI: 10.1080/10635150701294741
PubMed: 17454975

Links to Exploration step

pubmed:17454975

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Is multiple-sequence alignment required for accurate inference of phylogeny?</title>
<author>
<name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation>
<nlm:affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2007">2007</date>
<idno type="RBID">pubmed:17454975</idno>
<idno type="pmid">17454975</idno>
<idno type="doi">10.1080/10635150701294741</idno>
<idno type="wicri:Area/PubMed/Corpus">002188</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002188</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Is multiple-sequence alignment required for accurate inference of phylogeny?</title>
<author>
<name sortKey="Hohl, Michael" sort="Hohl, Michael" uniqKey="Hohl M" first="Michael" last="Höhl">Michael Höhl</name>
<affiliation>
<nlm:affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
</author>
</analytic>
<series>
<title level="j">Systematic biology</title>
<idno type="ISSN">1063-5157</idno>
<imprint>
<date when="2007" type="published">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Bayes Theorem</term>
<term>Computational Biology (methods)</term>
<term>Likelihood Functions</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Bayes Theorem</term>
<term>Likelihood Functions</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">17454975</PMID>
<DateCompleted>
<Year>2007</Year>
<Month>07</Month>
<Day>12</Day>
</DateCompleted>
<DateRevised>
<Year>2020</Year>
<Month>04</Month>
<Day>03</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">1063-5157</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>56</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2007</Year>
<Month>Apr</Month>
</PubDate>
</JournalIssue>
<Title>Systematic biology</Title>
<ISOAbbreviation>Syst. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Is multiple-sequence alignment required for accurate inference of phylogeny?</ArticleTitle>
<Pagination>
<MedlinePgn>206-21</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Höhl</LastName>
<ForeName>Michael</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Ragan</LastName>
<ForeName>Mark A</ForeName>
<Initials>MA</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D023362">Evaluation Study</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Syst Biol</MedlineTA>
<NlmUniqueID>9302532</NlmUniqueID>
<ISSNLinking>1063-5157</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D001499" MajorTopicYN="N">Bayes Theorem</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016013" MajorTopicYN="N">Likelihood Functions</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008957" MajorTopicYN="N">Models, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="Y">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="Y">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017421" MajorTopicYN="N">Sequence Analysis</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2007</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2007</Year>
<Month>7</Month>
<Day>13</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2007</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">17454975</ArticleId>
<ArticleId IdType="pii">776485575</ArticleId>
<ArticleId IdType="doi">10.1080/10635150701294741</ArticleId>
<ArticleId IdType="pmc">PMC7107264</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2004 Jan 22;20(2):206-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14734312</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2005 Oct 4;102(40):14332-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16176988</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2003 Nov 1;19(16):2122-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14594718</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Mol Evol. 2004 Jan;58(1):1-11</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14743310</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Bioinform Comput Biol. 2003 Oct;1(3):475-93</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15290766</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2001 Feb;17(2):149-54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11238070</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1986 Jul;83(14):5155-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3460087</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Mol Biol Evol. 2002 Apr;19(4):554-62</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11919297</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Comput Appl Biosci. 1997 Jun;13(3):235-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9183526</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 1998;14(1):55-67</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9520502</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 2005 Oct;12(8):1103-16</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16241900</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Comput Appl Biosci. 1992 Jun;8(3):275-82</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1633570</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1992 Nov 15;89(22):10915-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1438297</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Mol Biol Evol. 2000 Apr;17(4):540-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10742046</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>BMC Bioinformatics. 2004 Dec 17;5:204</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15606920</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2002 Jan;18(1):100-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11836217</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 1994 Winter;1(4):337-48</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">8790475</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2003 Mar 1;19(4):513-23</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12611807</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>BMC Bioinformatics. 2004 Apr 29;5:45</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15115543</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2004 Feb 12;20(3):399-406</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14764560</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Syst Biol. 2006 Aug;55(4):553-65</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16857650</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Mol Biol Evol. 1987 Jul;4(4):406-25</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3447015</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Syst Biol. 2001 Nov-Dec;50(6):913-25</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12116640</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2001 Aug;17(8):754-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11524383</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Evolution. 1992 Feb;46(1):159-173</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28564959</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Evol Bioinform Online. 2007 Feb 25;2:359-75</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19455227</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 2006 Mar;13(2):336-50</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16597244</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Mol Biol Evol. 2005 Mar;22(3):792-802</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15590907</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2003 Aug 12;19(12):1572-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12912839</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Nucleic Acids Res. 2004 Jan 16;32(1):380-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14729922</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2005 May 15;21(10):2230-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15728118</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Bioinform Comput Biol. 2004 Mar;2(1):1-19</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15272430</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Philos Trans R Soc Lond B Biol Sci. 1994 May 28;344(1309):305-11</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7938201</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Theor Biol. 1993 Sep 7;164(1):65-83</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">8264244</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Biometrics. 1997 Dec;53(4):1431-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9423258</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Mol Biol Evol. 2004 Jan;21(1):200-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14595102</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Syst Biol. 2006 Apr;55(2):314-28</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16611602</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Nucleic Acids Res. 2004 Mar 19;32(5):1792-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15034147</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
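
The record above can also be read programmatically. The sketch below is one possible way to do so with Python's standard library and is not part of Dilib. It works on an abridged copy of the `<pubmed>` element only, because the `<TEI>` part uses the `nlm:` and `wicri:` prefixes without declaring them in this listing, which a strict XML parser would reject. The function name `summarize` and the shape of the returned dictionary are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element shown above; only a few fields are kept.
PUBMED_XML = """
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">17454975</PMID>
<Article PubModel="Print">
<ArticleTitle>Is multiple-sequence alignment required for accurate inference of phylogeny?</ArticleTitle>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">17454975</ArticleId>
<ArticleId IdType="doi">10.1080/10635150701294741</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2004 Jan 22;20(2):206-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14734312</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
"""


def summarize(pubmed_xml):
    """Extract the PMID, title, DOI and cited PMIDs from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml.strip())
    data = root.find("PubmedData")
    # The ArticleIdList that is a direct child of PubmedData holds the
    # article's own identifiers; those nested in Reference are citations.
    own_ids = {a.get("IdType"): a.text
               for a in data.findall("ArticleIdList/ArticleId")}
    cited = [a.text for a in data.findall(
        "ReferenceList/Reference/ArticleIdList/ArticleId[@IdType='pubmed']")]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": root.findtext("MedlineCitation/Article/ArticleTitle"),
        "doi": own_ids.get("doi"),
        "cited_pmids": cited,
    }


summary = summarize(PUBMED_XML)
```

Because every reference in the record sits in its own `<ReferenceList>`, the path collects cited PMIDs across all of them rather than assuming a single list.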

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002188 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 002188 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:17454975
   |texte=   Is multiple-sequence alignment required for accurate inference of phylogeny?
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:17454975" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021