MERS exploration server

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Recapitulating phylogenies using k-mers: from trees to networks.

Internal identifier: 000E08 (PubMed/Corpus); previous: 000E07; next: 000E09

Authors: Guillaume Bernard; Mark A. Ragan; Cheong Xin Chan

Source: F1000Research, 2016

RBID : pubmed:28105314

Abstract

Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
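The idea of relating genomes by their shared k-mers can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract only states that relatedness is based on the number of shared k-mers, so the choices here (sets of distinct k-mers, no reverse-complement handling, a hypothetical `min_shared` threshold for drawing an edge) are assumptions, and the toy sequences are invented for illustration.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(genomes, k, min_shared=1):
    """Return edges (a, b, n_shared) for every pair of genomes that
    shares at least min_shared distinct k-mers.

    genomes: dict mapping a genome name to its sequence string.
    min_shared: hypothetical threshold, not taken from the source.
    """
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        n_shared = len(profiles[a] & profiles[b])
        if n_shared >= min_shared:
            edges.append((a, b, n_shared))
    return edges


# Toy sequences, invented for illustration only.
genomes = {
    "g1": "ACGTACGTGA",
    "g2": "TTACGTGACC",
    "g3": "GGGGCCCCAA",
}
edges = shared_kmer_network(genomes, k=4)
print(edges)
```

Here `g1` and `g2` are connected because they have 4-mers in common, while `g3` shares none and stays isolated. Raising `min_shared` prunes weak edges, which is one simple way to explore such a network at different levels of stringency.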

DOI: 10.12688/f1000research.10225.2
PubMed: 28105314

Links to Exploration step

pubmed:28105314

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Recapitulating phylogenies using
<i>k</i>
-mers: from trees to networks.</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:28105314</idno>
<idno type="pmid">28105314</idno>
<idno type="doi">10.12688/f1000research.10225.2</idno>
<idno type="wicri:Area/PubMed/Corpus">000E08</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000E08</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Recapitulating phylogenies using
<i>k</i>
-mers: from trees to networks.</title>
<author>
<name sortKey="Bernard, Guillaume" sort="Bernard, Guillaume" uniqKey="Bernard G" first="Guillaume" last="Bernard">Guillaume Bernard</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ragan, Mark A" sort="Ragan, Mark A" uniqKey="Ragan M" first="Mark A" last="Ragan">Mark A. Ragan</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Chan, Cheong Xin" sort="Chan, Cheong Xin" uniqKey="Chan C" first="Cheong Xin" last="Chan">Cheong Xin Chan</name>
<affiliation>
<nlm:affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">F1000Research</title>
<idno type="ISSN">2046-1402</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared
<i>k</i>
-mers (subsequences at fixed length
<i>k</i>
). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using
<i>k</i>
-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" VersionID="2" VersionDate="2016/12/23" Owner="NLM">
<PMID Version="2">28105314</PMID>
<DateRevised>
<Year>2019</Year>
<Month>11</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">2046-1402</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>5</Volume>
<PubDate>
<Year>2016</Year>
</PubDate>
</JournalIssue>
<Title>F1000Research</Title>
<ISOAbbreviation>F1000Res</ISOAbbreviation>
</Journal>
<ArticleTitle>Recapitulating phylogenies using
<i>k</i>
-mers: from trees to networks.</ArticleTitle>
<Pagination>
<MedlinePgn>2789</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.12688/f1000research.10225.2</ELocationID>
<Abstract>
<AbstractText>Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared
<i>k</i>
-mers (subsequences at fixed length
<i>k</i>
). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using
<i>k</i>
-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Bernard</LastName>
<ForeName>Guillaume</ForeName>
<Initials>G</Initials>
<AffiliationInfo>
<Affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Ragan</LastName>
<ForeName>Mark A</ForeName>
<Initials>MA</Initials>
<AffiliationInfo>
<Affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Chan</LastName>
<ForeName>Cheong Xin</ForeName>
<Initials>CX</Initials>
<Identifier Source="ORCID">0000-0002-3729-8176</Identifier>
<AffiliationInfo>
<Affiliation>Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>11</Month>
<Day>29</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>F1000Res</MedlineTA>
<NlmUniqueID>101594320</NlmUniqueID>
<ISSNLinking>2046-1402</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">k-mers</Keyword>
<Keyword MajorTopicYN="N">phylogenetic networks</Keyword>
<Keyword MajorTopicYN="N">phylogenetic trees</Keyword>
<Keyword MajorTopicYN="N">phylogenies</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>12</Month>
<Day>20</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2017</Year>
<Month>1</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2017</Year>
<Month>1</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2017</Year>
<Month>1</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">28105314</ArticleId>
<ArticleId IdType="doi">10.12688/f1000research.10225.2</ArticleId>
<ArticleId IdType="pmc">PMC5224691</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>PLoS Comput Biol. 2007 Aug;3(8):e123</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17784778</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Rep. 2016 Jul 01;6:28970</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27363362</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2015 May 14;521(7551):173-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25945739</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Brief Bioinform. 2014 Nov;15(6):890-905</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23904502</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Eukaryot Microbiol. 2012 Sep;59(5):429-93</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23020233</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2009 Dec;16(12):1615-34</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20001252</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2005 Oct 4;102(40):14332-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16176988</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 1999 Jun 25;284(5423):2124-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10381871</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>RNA Biol. 2014;11(3):176-85</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24572375</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jun 1;27(11):1466-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21471011</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Rep. 2014 Sep 30;4:6504</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25266120</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Biol. 2014 Aug 21;12:66</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25141959</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2005 Jul;15(7):954-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15965028</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>F1000Res. 2016 Jul 25;5:null</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27508073</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Microbiol. 2011 Oct;19(10):483-91</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21820313</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Brief Bioinform. 2014 May;15(3):407-18</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24291823</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2006;7(10):118</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17081279</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Rep. 2016 Jul 25;6:30308</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27453035</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Pharmacogenomics. 2002 Jan;3(1):131-44</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11966409</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Syst Biol. 2003 Aug;52(4):515-27</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12857642</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genetics. 2013 Aug;194(4):793-805</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23908372</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Microbiol. 2016 Mar;24(3):224-37</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26774999</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Biol Direct. 2013 Jan 22;8:3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23339707</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2010 Nov;17(11):1467-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20973742</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 2006 Feb;23(2):254-67</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16221896</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Genet. 2008 Jul 18;4(7):e1000128</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18650965</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol Evol. 2011;3:23-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21081312</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2003 Apr 29;100(9):5455-60</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12704232</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2002 Nov;32(3):402-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12219091</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Genet. 2000 May;16(5):227-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10782117</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
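As a rough illustration of how the record above might be read programmatically, the following Python sketch pulls the PMID, author names, and cited PMIDs out of the `<pubmed>` element using the standard library. The record as shown uses the `nlm:` and `wicri:` prefixes without declaring them, so a namespace-aware parser may reject the full `<record>`. The sketch therefore slices out the `<pubmed>` block before parsing, which is an assumption about how to work around this rather than a documented Dilib procedure. The function names and the abridged sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_pubmed_block(record_text):
    """Parse only the <pubmed>...</pubmed> element of a record string."""
    start = record_text.index("<pubmed>")
    end = record_text.index("</pubmed>") + len("</pubmed>")
    return ET.fromstring(record_text[start:end])


def summarize(pubmed):
    """Collect the PMID, author names, and cited PMIDs from <pubmed>."""
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in pubmed.iterfind("MedlineCitation/Article/AuthorList/Author")
    ]
    cited = [
        e.text
        for e in pubmed.iterfind(
            "PubmedData/ReferenceList/Reference/ArticleIdList/ArticleId"
        )
        if e.get("IdType") == "pubmed"
    ]
    return {
        "pmid": pubmed.findtext("MedlineCitation/PMID"),
        "authors": authors,
        "cited_pmids": cited,
    }


# Abridged sample that mirrors the element names of the record above.
sample = """<record>
<TEI><teiHeader>
<idno type="wicri:explorRef" wicri:stream="PubMed">000E08</idno>
</teiHeader></TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE">
<PMID Version="2">28105314</PMID>
<Article>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Bernard</LastName><ForeName>Guillaume</ForeName></Author>
<Author ValidYN="Y"><LastName>Ragan</LastName><ForeName>Mark A</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ReferenceList>
<Reference>
<Citation>PLoS Comput Biol. 2007 Aug;3(8):e123</Citation>
<ArticleIdList><ArticleId IdType="pubmed">17784778</ArticleId></ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>"""

info = summarize(parse_pubmed_block(sample))
print(info)
```

Restricting the reference lookup to the `ReferenceList/Reference` path keeps the article's own identifiers, which sit in a separate `ArticleIdList` directly under `PubmedData`, out of the list of cited PMIDs.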

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E08 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000E08 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:28105314
   |texte=   Recapitulating phylogenies using k-mers: from trees to networks.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:28105314" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021