MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Physicochemical property distributions for accurate and rapid pairwise protein homology detection.

Internal identifier: 001F60 (PubMed/Corpus); previous: 001F59; next: 001F61

Authors: Bobbie-Jo M. Webb-Robertson; Kyle G. Ratuiste; Christopher S. Oehmen

Source: BMC bioinformatics (eISSN 1471-2105), 2010

RBID : pubmed:20302613

English descriptors

Abstract

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.

DOI: 10.1186/1471-2105-11-145
PubMed: 20302613
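
The RESULTS part of the abstract (given in full in the XML record on this page) describes scoring each 4-mer of a sequence by a physicochemical property and normalizing against the null distribution of that property over all possible 4-mers. The Python sketch below illustrates that idea only; it is not the authors' implementation. The choice of Kyte-Doolittle hydropathy as the property, the averaging of residue values within a 4-mer, the z-score normalization, the bin edges, and the function names are all assumptions made for illustration.

```python
from functools import lru_cache
from itertools import product

# Kyte-Doolittle hydropathy index (assumed property; the abstract does not
# say which physicochemical properties are used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4  # k-mer length named in the abstract


def kmer_scores(seq, k=K):
    """Mean property value of every overlapping k-mer in seq.

    Residues outside the 20 standard amino acids raise KeyError.
    """
    return [
        sum(HYDROPATHY[a] for a in seq[i:i + k]) / k
        for i in range(len(seq) - k + 1)
    ]


@lru_cache(maxsize=None)
def null_stats(k=K):
    """Mean and standard deviation of the score over all 20**k k-mers."""
    scores = [
        sum(HYDROPATHY[a] for a in kmer) / k
        for kmer in product(HYDROPATHY, repeat=k)
    ]
    mu = sum(scores) / len(scores)
    var = sum((s - mu) ** 2 for s in scores) / len(scores)
    return mu, var ** 0.5


def feature_vector(seq, edges=(-1.0, 0.0, 1.0), k=K):
    """Fraction of k-mers whose z-score falls in each bin.

    The bins are (-inf, e0], (e0, e1], ..., (e_last, +inf); the edges are
    an arbitrary illustrative choice.
    """
    scores = kmer_scores(seq, k)
    if not scores:
        raise ValueError("sequence is shorter than k")
    mu, sigma = null_stats(k)
    counts = [0] * (len(edges) + 1)
    for s in scores:
        z = (s - mu) / sigma
        counts[sum(z > e for e in edges)] += 1
    return [c / len(scores) for c in counts]


if __name__ == "__main__":
    print(feature_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```

The transformation costs one pass over the sequence plus a one-time enumeration of the 160,000 possible 4-mers, which fits the abstract's remark that mapping a protein into feature space is computationally cheap. How the vectors of two proteins are combined for the pairwise SVM is not stated in the abstract and is left out here.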

Links to Exploration step

pubmed:20302613

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Physicochemical property distributions for accurate and rapid pairwise protein homology detection.</title>
<author>
<name sortKey="Webb Robertson, Bobbie Jo M" sort="Webb Robertson, Bobbie Jo M" uniqKey="Webb Robertson B" first="Bobbie-Jo M" last="Webb-Robertson">Bobbie-Jo M. Webb-Robertson</name>
<affiliation>
<nlm:affiliation>Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA. bj@pnl.gov</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ratuiste, Kyle G" sort="Ratuiste, Kyle G" uniqKey="Ratuiste K" first="Kyle G" last="Ratuiste">Kyle G. Ratuiste</name>
</author>
<author>
<name sortKey="Oehmen, Christopher S" sort="Oehmen, Christopher S" uniqKey="Oehmen C" first="Christopher S" last="Oehmen">Christopher S. Oehmen</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20302613</idno>
<idno type="pmid">20302613</idno>
<idno type="doi">10.1186/1471-2105-11-145</idno>
<idno type="wicri:Area/PubMed/Corpus">001F60</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F60</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Physicochemical property distributions for accurate and rapid pairwise protein homology detection.</title>
<author>
<name sortKey="Webb Robertson, Bobbie Jo M" sort="Webb Robertson, Bobbie Jo M" uniqKey="Webb Robertson B" first="Bobbie-Jo M" last="Webb-Robertson">Bobbie-Jo M. Webb-Robertson</name>
<affiliation>
<nlm:affiliation>Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA. bj@pnl.gov</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ratuiste, Kyle G" sort="Ratuiste, Kyle G" uniqKey="Ratuiste K" first="Kyle G" last="Ratuiste">Kyle G. Ratuiste</name>
</author>
<author>
<name sortKey="Oehmen, Christopher S" sort="Oehmen, Christopher S" uniqKey="Oehmen C" first="Christopher S" last="Oehmen">Christopher S. Oehmen</name>
</author>
</analytic>
<series>
<title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Computational Biology (methods)</term>
<term>Databases, Protein</term>
<term>Pattern Recognition, Automated</term>
<term>Proteins (chemistry)</term>
<term>Sequence Homology, Amino Acid</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Databases, Protein</term>
<term>Pattern Recognition, Automated</term>
<term>Sequence Homology, Amino Acid</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20302613</PMID>
<DateCompleted>
<Year>2010</Year>
<Month>05</Month>
<Day>11</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1471-2105</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>11</Volume>
<PubDate>
<Year>2010</Year>
<Month>Mar</Month>
<Day>19</Day>
</PubDate>
</JournalIssue>
<Title>BMC bioinformatics</Title>
<ISOAbbreviation>BMC Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Physicochemical property distributions for accurate and rapid pairwise protein homology detection.</ArticleTitle>
<Pagination>
<MedlinePgn>145</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-11-145</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores are assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Webb-Robertson</LastName>
<ForeName>Bobbie-Jo M</ForeName>
<Initials>BJ</Initials>
<AffiliationInfo>
<Affiliation>Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA. bj@pnl.gov</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Ratuiste</LastName>
<ForeName>Kyle G</ForeName>
<Initials>KG</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Oehmen</LastName>
<ForeName>Christopher S</ForeName>
<Initials>CS</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>03</Month>
<Day>19</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>BMC Bioinformatics</MedlineTA>
<NlmUniqueID>100965194</NlmUniqueID>
<ISSNLinking>1471-2105</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D011506">Proteins</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030562" MajorTopicYN="N">Databases, Protein</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010363" MajorTopicYN="N">Pattern Recognition, Automated</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011506" MajorTopicYN="N">Proteins</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017386" MajorTopicYN="Y">Sequence Homology, Amino Acid</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2009</Year>
<Month>10</Month>
<Day>24</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2010</Year>
<Month>03</Month>
<Day>19</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>3</Month>
<Day>23</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>3</Month>
<Day>23</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2010</Year>
<Month>5</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">20302613</ArticleId>
<ArticleId IdType="pii">1471-2105-11-145</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-11-145</ArticleId>
<ArticleId IdType="pmc">PMC2851606</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2009 Jan 1;25(1):121-2</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18990723</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2009 Jul 1;25(13):1602-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19389731</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Conf Proc IEEE Eng Med Biol Soc. 2005;7:7738-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17282075</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2006;7 Suppl 1:S10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16723003</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Mar 1;20(4):467-76</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14990442</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Oct 1;21(19):3711-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16076885</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2006 Feb 1;22(3):285-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16317074</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008;9:510</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19046430</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2008 May 15;24(10):1264-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18378524</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2008 May;4(5):e1000077</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18464927</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Comput Biol Chem. 2005 Dec;29(6):440-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16290168</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2003;10(6):857-68</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14980014</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2003;19 Suppl 1:i26-33</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12855434</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2007 Jul 15;23(14):1728-36</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17488755</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008;9:389</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18808707</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Jul 22;20(11):1682-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14988126</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>FEBS J. 2005 Oct;272(20):5119-28</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16218946</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Dec 1;21(23):4239-47</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16188929</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1990 Oct 5;215(3):403-10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2231712</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2008 Jan;36(Database issue):D202-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17998252</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2003 Nov 22;19(17):2294-301</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14630658</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1995 Apr 7;247(4):536-40</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7723011</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2008 Mar 15;24(6):783-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18245127</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Pac Symp Biocomput. 2002;:564-75</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11928508</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2009 Mar 15;25(6):729-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19164303</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2006 Sep 15;22(18):2224-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16837522</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proteins. 2004 Nov 15;57(3):518-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15382242</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Biochem Biophys Res Commun. 1992 Apr 30;184(2):1008-14</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1575719</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1981 Mar 25;147(1):195-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7265238</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Theor Biol. 2008 May 7;252(1):145-54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18342336</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 1997 Sep 1;25(17):3389-402</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9254694</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Comput Biol Chem. 2008 Dec;32(6):458-61</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18722814</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
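
The `<pubmed>` part of the record above carries the citation fields without namespace prefixes, so a standard XML parser can read it directly. The following is a minimal Python sketch, assuming that subtree has been extracted as a string; the `SAMPLE` fragment is a shortened copy of the record and `summarize` is a hypothetical helper name. Note that the `<TEI>` part uses `nlm:` and `wicri:` prefixes whose declarations do not appear in the listing, so a namespace-aware parser would probably reject the full `<record>` as shown unless those declarations are supplied.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> subtree shown in the record above.
SAMPLE = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20302613</PMID>
<Article PubModel="Electronic">
<Journal>
<Title>BMC bioinformatics</Title>
</Journal>
<ArticleTitle>Physicochemical property distributions for accurate and rapid pairwise protein homology detection.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Webb-Robertson</LastName><ForeName>Bobbie-Jo M</ForeName></Author>
<Author ValidYN="Y"><LastName>Ratuiste</LastName><ForeName>Kyle G</ForeName></Author>
<Author ValidYN="Y"><LastName>Oehmen</LastName><ForeName>Christopher S</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">20302613</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-11-145</ArticleId>
<ArticleId IdType="pmc">PMC2851606</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>"""


def summarize(xml_text):
    """Return the main citation fields of one <pubmed> element as a dict."""
    root = ET.fromstring(xml_text)
    # Use the explicit path so that the ArticleId entries nested inside
    # <ReferenceList> (the cited papers) are not picked up.
    ids = {
        node.get("IdType"): node.text
        for node in root.findall("./PubmedData/ArticleIdList/ArticleId")
    }
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in root.findall("./MedlineCitation/Article/AuthorList/Author")
    ]
    return {
        "pmid": root.findtext("./MedlineCitation/PMID"),
        "title": root.findtext("./MedlineCitation/Article/ArticleTitle"),
        "journal": root.findtext("./MedlineCitation/Article/Journal/Title"),
        "doi": ids.get("doi"),
        "pmc": ids.get("pmc"),
        "authors": authors,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```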

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F60 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F60 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:20302613
   |texte=   Physicochemical property distributions for accurate and rapid pairwise protein homology detection.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:20302613" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021