MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.

Internal identifier: 001F80 (PubMed/Corpus); previous: 001F79; next: 001F81

Authors: Se-Ran Jun; Gregory E. Sims; Guohong A. Wu; Sung-Hou Kim

Source: Proceedings of the National Academy of Sciences of the United States of America, 2010; 107(1):133-8

RBID : pubmed:20018669

Abstract

We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny. Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic class or phylum with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously "unclassified" genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.
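
The profile construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ffp` and `js_divergence` are hypothetical, the toy sequences are made up, and the Jensen-Shannon divergence is an assumed choice of distance, since the abstract does not say how profiles are compared or how the optimal feature length is selected.

```python
from collections import Counter
from math import log2


def ffp(proteins, l):
    """Relative frequency of every l-mer seen in a list of protein sequences."""
    counts = Counter()
    for seq in proteins:
        # Slide a window of width l along each protein; windows never span two proteins.
        for i in range(len(seq) - l + 1):
            counts[seq[i:i + l]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse profiles.

    Assumption: this distance is chosen for illustration only.
    """
    div = 0.0
    for kmer in p.keys() | q.keys():
        pk, qk = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mk = 0.5 * (pk + qk)  # mixture of the two profiles at this feature
        if pk:
            div += 0.5 * pk * log2(pk / mk)
        if qk:
            div += 0.5 * qk * log2(qk / mk)
    return div


if __name__ == "__main__":
    # Toy "proteomes" (made-up sequences, for illustration only).
    proteome_a = ["MKVLAAGIVG", "MKVLSAGIVA"]
    proteome_b = ["MTTQAPLWRK", "MKVLAAGTVG"]
    for l in (2, 3, 4):
        d = js_divergence(ffp(proteome_a, l), ffp(proteome_b, l))
        print(f"l={l}: divergence={d:.3f}")
```

Running the sketch for several values of `l` shows why feature length matters: short features are shared by almost every proteome, while long features are rarely shared at all, so the resolving power of the distance depends on `l`. A pairwise distance matrix computed this way could then be passed to any distance-based tree-building method.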

DOI: 10.1073/pnas.0913033107
PubMed: 20018669

Links to Exploration step

pubmed:20018669

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.</title>
<author>
<name sortKey="Jun, Se Ran" sort="Jun, Se Ran" uniqKey="Jun S" first="Se-Ran" last="Jun">Se-Ran Jun</name>
<affiliation>
<nlm:affiliation>Department of Chemistry, University of California, Berkeley, CA 94720, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Sims, Gregory E" sort="Sims, Gregory E" uniqKey="Sims G" first="Gregory E" last="Sims">Gregory E. Sims</name>
</author>
<author>
<name sortKey="Wu, Guohong A" sort="Wu, Guohong A" uniqKey="Wu G" first="Guohong A" last="Wu">Guohong A. Wu</name>
</author>
<author>
<name sortKey="Kim, Sung Hou" sort="Kim, Sung Hou" uniqKey="Kim S" first="Sung-Hou" last="Kim">Sung-Hou Kim</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20018669</idno>
<idno type="pmid">20018669</idno>
<idno type="doi">10.1073/pnas.0913033107</idno>
<idno type="wicri:Area/PubMed/Corpus">001F80</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F80</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.</title>
<author>
<name sortKey="Jun, Se Ran" sort="Jun, Se Ran" uniqKey="Jun S" first="Se-Ran" last="Jun">Se-Ran Jun</name>
<affiliation>
<nlm:affiliation>Department of Chemistry, University of California, Berkeley, CA 94720, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Sims, Gregory E" sort="Sims, Gregory E" uniqKey="Sims G" first="Gregory E" last="Sims">Gregory E. Sims</name>
</author>
<author>
<name sortKey="Wu, Guohong A" sort="Wu, Guohong A" uniqKey="Wu G" first="Guohong A" last="Wu">Guohong A. Wu</name>
</author>
<author>
<name sortKey="Kim, Sung Hou" sort="Kim, Sung Hou" uniqKey="Kim S" first="Sung-Hou" last="Kim">Sung-Hou Kim</name>
</author>
</analytic>
<series>
<title level="j">Proceedings of the National Academy of Sciences of the United States of America</title>
<idno type="eISSN">1091-6490</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Genome</term>
<term>Phylogeny</term>
<term>Prokaryotic Cells (classification)</term>
<term>Prokaryotic Cells (physiology)</term>
<term>Proteome (genetics)</term>
<term>Proteomics (methods)</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Analysis, Protein (methods)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>Proteome</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>Prokaryotic Cells</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Proteomics</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="MESH" qualifier="physiology" xml:lang="en">
<term>Prokaryotic Cells</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Genome</term>
<term>Phylogeny</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny. Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic class or phylum with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously "unclassified" genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20018669</PMID>
<DateCompleted>
<Year>2010</Year>
<Month>03</Month>
<Day>09</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1091-6490</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>107</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2010</Year>
<Month>Jan</Month>
<Day>05</Day>
</PubDate>
</JournalIssue>
<Title>Proceedings of the National Academy of Sciences of the United States of America</Title>
<ISOAbbreviation>Proc. Natl. Acad. Sci. U.S.A.</ISOAbbreviation>
</Journal>
<ArticleTitle>Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.</ArticleTitle>
<Pagination>
<MedlinePgn>133-8</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1073/pnas.0913033107</ELocationID>
<Abstract>
<AbstractText>We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny. Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic class or phylum with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously "unclassified" genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Jun</LastName>
<ForeName>Se-Ran</ForeName>
<Initials>SR</Initials>
<AffiliationInfo>
<Affiliation>Department of Chemistry, University of California, Berkeley, CA 94720, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Sims</LastName>
<ForeName>Gregory E</ForeName>
<Initials>GE</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Wu</LastName>
<ForeName>Guohong A</ForeName>
<Initials>GA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Kim</LastName>
<ForeName>Sung-Hou</ForeName>
<Initials>SH</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>P50 GM062412</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>GM62412</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2009</Year>
<Month>12</Month>
<Day>14</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Proc Natl Acad Sci U S A</MedlineTA>
<NlmUniqueID>7505876</NlmUniqueID>
<ISSNLinking>0027-8424</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D020543">Proteome</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D016678" MajorTopicYN="N">Genome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="Y">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011387" MajorTopicYN="Y">Prokaryotic Cells</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000502" MajorTopicYN="N">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020543" MajorTopicYN="N">Proteome</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D040901" MajorTopicYN="N">Proteomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020539" MajorTopicYN="N">Sequence Analysis, Protein</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2009</Year>
<Month>12</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2009</Year>
<Month>12</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2010</Year>
<Month>3</Month>
<Day>10</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">20018669</ArticleId>
<ArticleId IdType="pii">0913033107</ArticleId>
<ArticleId IdType="doi">10.1073/pnas.0913033107</ArticleId>
<ArticleId IdType="pmc">PMC2806744</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>J Mol Evol. 2003 Aug;57(2):140-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14562958</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 1987 Jul;4(4):406-25</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3447015</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Genet. 2009 Jan;5(1):e1000344</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19165319</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Annu Rev Microbiol. 2005;59:191-209</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16153168</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2009 Feb 24;106(8):2677-82</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19188606</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Feb;13(2):145-58</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12566393</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Ecol Evol. 2004 Jun;19(6):315-22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16701277</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Biol Phys. 2003 Mar;29(1):23-38</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23345817</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Annu Rev Microbiol. 2005;59:299-328</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15910279</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1977 Nov;74(11):5088-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">270744</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008;9:322</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18662388</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2005 Jan 11;102(2):373-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15630082</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Evol. 2004 Jan;58(1):1-11</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14743310</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2002 May 16;417(6886):244</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12015592</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 1997 Jul;14(7):685-95</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9254330</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Genet. 2002 Sep;18(9):472-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12175808</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Genet. 2005 May;6(5):361-75</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15861208</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2006 Mar 3;311(5765):1283-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16513982</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 2007 May;24(5):1181-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17331957</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Sep 1;20(13):2044-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15044248</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2006 Mar;13(2):336-50</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16597244</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1989 Dec;86(23):9355-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2531898</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Ecol Evol. 2008 May;23(5):276-81</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18367290</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2000 Sep 12;97(19):10567-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10954745</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2004;32(16):4937-44</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15383646</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2007 Jan;35(Database issue):D169-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17090583</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Biol Direct. 2008;3:54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19105819</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 May 15;21(10):2329-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15166018</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2004;5(2):R12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14759262</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1995 Apr 7;247(4):536-40</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7723011</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2004 Jan 1;32(Database issue):D138-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14681378</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 2004 Mar;21(3):612-24</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14739253</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2003 Nov 1;19(16):2122-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14594718</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2006 May 19;312(5776):1011-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16709776</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Biochimie. 2007 Dec;89(12):1454-63</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17949885</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Trends Genet. 2002 Mar;18(3):158-62</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11858840</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F80 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F80 | SxmlIndent | more
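
Outside Dilib, the `<pubmed>` part of the record can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged copy of that element. It assumes the `<pubmed>` element has been extracted on its own: as displayed on this page, the enclosing `<record>` uses `nlm:` and `wicri:` prefixes without visible namespace declarations, so a strict parser may reject it. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element of the record (two authors and two
# references kept for brevity; assumption: the element was extracted on its own).
PUBMED_XML = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20018669</PMID>
<Article PubModel="Print-Electronic">
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Jun</LastName><ForeName>Se-Ran</ForeName></Author>
<Author ValidYN="Y"><LastName>Kim</LastName><ForeName>Sung-Hou</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">20018669</ArticleId>
<ArticleId IdType="doi">10.1073/pnas.0913033107</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>J Mol Evol. 2003 Aug;57(2):140-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14562958</ArticleId></ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 1987 Jul;4(4):406-25</Citation>
<ArticleIdList><ArticleId IdType="pubmed">3447015</ArticleId></ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>"""


def summarize(xml_text):
    """Extract identifiers, author names, and cited PMIDs from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        # The article's own identifiers sit directly under PubmedData/ArticleIdList.
        "doi": root.findtext("PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
        "authors": [
            f"{a.findtext('ForeName')} {a.findtext('LastName')}"
            for a in root.iterfind("MedlineCitation/Article/AuthorList/Author")
        ],
        # Each cited work carries its own nested ArticleIdList.
        "cited_pmids": [
            e.text
            for e in root.iterfind(
                "PubmedData/ReferenceList/Reference/ArticleIdList/ArticleId[@IdType='pubmed']"
            )
        ],
    }


if __name__ == "__main__":
    print(summarize(PUBMED_XML))
```

The same paths should apply to the full `<pubmed>` element, for instance to collect the PMIDs of all cited references.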

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:20018669
   |texte=   Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:20018669" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021