MERS exploration server

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.

Internal identifier: 002341 (PubMed/Curation); previous: 002340; next: 002342

Authors: Zheng Rong Yang [United Kingdom]

Source: Bioinformatics (Oxford, England) [ISSN 1367-4803]; 2005; 21(11): 2644-50.

RBID: pubmed:15797903

Abstract

Although the outbreak of severe acute respiratory syndrome (SARS) is currently over, it is expected to return. A critical challenge for scientists from various disciplines worldwide is to study the specificity of the cleavage activity of the SARS-related coronavirus (SARS-CoV) and to use the knowledge obtained for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. If a sub-sequence is denoted as P2-P1-P1'-P2', a conventional inductive programming method may produce a rule such as 'if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved'. If the site P1 is not orthogonal to the others (for instance, P2, P1' and P2'), the predictive power of this kind of rule may be limited. This study therefore aims to develop a novel method for constructing non-orthogonal decision trees for mining protease data.

DOI: 10.1093/bioinformatics/bti404
PubMed: 15797903
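
The single-site rule quoted in the abstract can be sketched in Python as follows. This is only an illustration, not the paper's method: the function name `single_site_rule` and the example 4-mers are hypothetical and are not taken from the paper's data set.

```python
# Hypothetical illustration of the rule quoted in the abstract.
# A 4-mer is written P2-P1-P1'-P2', so index 1 holds the P1 residue.

def single_site_rule(kmer: str) -> bool:
    """Predict 'cleaved' when P1 is glutamine (Q), ignoring all other sites."""
    if len(kmer) != 4:
        raise ValueError("expected a 4-residue sub-sequence P2-P1-P1'-P2'")
    return kmer[1] == "Q"

# Made-up sub-sequences for illustration only.
examples = ["LQSG", "LQAG", "AKSG"]
predictions = {kmer: single_site_rule(kmer) for kmer in examples}
print(predictions)
```

Because the rule inspects P1 alone, any two sub-sequences with Q at P1 receive the same prediction whatever occupies P2, P1' and P2'. That is the limitation the abstract attributes to rules that treat sites as orthogonal.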

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.</title>
<author>
<name sortKey="Yang, Zheng Rong" sort="Yang, Zheng Rong" uniqKey="Yang Z" first="Zheng Rong" last="Yang">Zheng Rong Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Exeter University, United Kingdom. z.r.yang@exeter.ac.uk</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Computer Science, Exeter University</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:15797903</idno>
<idno type="pmid">15797903</idno>
<idno type="doi">10.1093/bioinformatics/bti404</idno>
<idno type="wicri:Area/PubMed/Corpus">002341</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002341</idno>
<idno type="wicri:Area/PubMed/Curation">002341</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002341</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.</title>
<author>
<name sortKey="Yang, Zheng Rong" sort="Yang, Zheng Rong" uniqKey="Yang Z" first="Zheng Rong" last="Yang">Zheng Rong Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Exeter University, United Kingdom. z.r.yang@exeter.ac.uk</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Computer Science, Exeter University</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="ISSN">1367-4803</idno>
<imprint>
<date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Binding Sites</term>
<term>Computer Simulation</term>
<term>Cysteine Endopeptidases</term>
<term>Databases, Protein</term>
<term>Decision Support Techniques</term>
<term>Endopeptidases (analysis)</term>
<term>Endopeptidases (chemistry)</term>
<term>Models, Chemical</term>
<term>Models, Molecular</term>
<term>Protein Binding</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Analysis, Protein (methods)</term>
<term>Sequence Homology, Amino Acid</term>
<term>Viral Proteins (analysis)</term>
<term>Viral Proteins (chemistry)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences ()</term>
<term>Analyse de séquence de protéine ()</term>
<term>Bases de données de protéines</term>
<term>Cysteine endopeptidases</term>
<term>Endopeptidases ()</term>
<term>Endopeptidases (analyse)</term>
<term>Intelligence artificielle</term>
<term>Liaison aux protéines</term>
<term>Modèles chimiques</term>
<term>Modèles moléculaires</term>
<term>Protéines virales ()</term>
<term>Protéines virales (analyse)</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Simulation numérique</term>
<term>Sites de fixation</term>
<term>Techniques d'aide à la décision</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en">
<term>Endopeptidases</term>
<term>Viral Proteins</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>Endopeptidases</term>
<term>Viral Proteins</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en">
<term>Cysteine Endopeptidases</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr">
<term>Endopeptidases</term>
<term>Protéines virales</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Binding Sites</term>
<term>Computer Simulation</term>
<term>Databases, Protein</term>
<term>Decision Support Techniques</term>
<term>Models, Chemical</term>
<term>Models, Molecular</term>
<term>Protein Binding</term>
<term>Sequence Homology, Amino Acid</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence de protéine</term>
<term>Bases de données de protéines</term>
<term>Cysteine endopeptidases</term>
<term>Endopeptidases</term>
<term>Intelligence artificielle</term>
<term>Liaison aux protéines</term>
<term>Modèles chimiques</term>
<term>Modèles moléculaires</term>
<term>Protéines virales</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Simulation numérique</term>
<term>Sites de fixation</term>
<term>Techniques d'aide à la décision</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Although the outbreak of the severe acute respiratory syndrome (SARS) is currently over, it is expected that it will return to attack human beings. A critical challenge to scientists from various disciplines worldwide is to study the specificity of cleavage activity of SARS-related coronavirus (SARS-CoV) and use the knowledge obtained from the study for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. Suppose a sub-sequence is denoted as P2-P1-P1'-P2', the conventional inductive programming method may result in a rule like 'if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved'. If the site P1 is not orthogonal to the others (for instance, P2, P1' and P2'), the prediction power of these kind of rules may be limited. Therefore this study is aimed at developing a novel method for constructing non-orthogonal decision trees for mining protease data.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">15797903</PMID>
<DateCompleted>
<Year>2005</Year>
<Month>09</Month>
<Day>16</Day>
</DateCompleted>
<DateRevised>
<Year>2020</Year>
<Month>03</Month>
<Day>25</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Print">1367-4803</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>21</Volume>
<Issue>11</Issue>
<PubDate>
<Year>2005</Year>
<Month>Jun</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.</ArticleTitle>
<Pagination>
<MedlinePgn>2644-50</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Although the outbreak of the severe acute respiratory syndrome (SARS) is currently over, it is expected that it will return to attack human beings. A critical challenge to scientists from various disciplines worldwide is to study the specificity of cleavage activity of SARS-related coronavirus (SARS-CoV) and use the knowledge obtained from the study for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. Suppose a sub-sequence is denoted as P2-P1-P1'-P2', the conventional inductive programming method may result in a rule like 'if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved'. If the site P1 is not orthogonal to the others (for instance, P2, P1' and P2'), the prediction power of these kind of rules may be limited. Therefore this study is aimed at developing a novel method for constructing non-orthogonal decision trees for mining protease data.</AbstractText>
<AbstractText Label="RESULT" NlmCategory="RESULTS">Eighteen sequences of coronavirus polyprotein were downloaded from NCBI (http://www.ncbi.nlm.nih.gov). Among these sequences, 252 cleavage sites were experimentally determined. These sequences were scanned using a sliding window with size k to generate about 50,000 k-mer sub-sequences (for short, k-mers). The value of k varies from 4 to 12 with a gap of two. The bio-basis function proposed by Thomson et al. is used to transform the k-mers to a high-dimensional numerical space on which an inductive programming method is applied for the purpose of deriving a decision tree for decision-making. The process of this transform is referred to as a bio-mapping. The constructed decision trees select about 10 out of 50,000 k-mers. This small set of selected k-mers is regarded as a set of decisive templates. By doing so, non-orthogonal decision trees are constructed using the selected templates and the prediction accuracy is significantly improved.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Yang</LastName>
<ForeName>Zheng Rong</ForeName>
<Initials>ZR</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Exeter University, United Kingdom. z.r.yang@exeter.ac.uk</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D003160">Comparative Study</PublicationType>
<PublicationType UI="D023362">Evaluation Study</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2005</Year>
<Month>03</Month>
<Day>29</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D014764">Viral Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>EC 3.4.-</RegistryNumber>
<NameOfSubstance UI="D010450">Endopeptidases</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>EC 3.4.22.-</RegistryNumber>
<NameOfSubstance UI="C099456">3C-like proteinase, Coronavirus</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>EC 3.4.22.-</RegistryNumber>
<NameOfSubstance UI="D003546">Cysteine Endopeptidases</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001185" MajorTopicYN="Y">Artificial Intelligence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001665" MajorTopicYN="N">Binding Sites</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003198" MajorTopicYN="N">Computer Simulation</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003546" MajorTopicYN="N">Cysteine Endopeptidases</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030562" MajorTopicYN="N">Databases, Protein</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003661" MajorTopicYN="Y">Decision Support Techniques</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010450" MajorTopicYN="N">Endopeptidases</DescriptorName>
<QualifierName UI="Q000032" MajorTopicYN="N">analysis</QualifierName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008956" MajorTopicYN="Y">Models, Chemical</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008958" MajorTopicYN="N">Models, Molecular</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011485" MajorTopicYN="N">Protein Binding</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020539" MajorTopicYN="N">Sequence Analysis, Protein</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017386" MajorTopicYN="N">Sequence Homology, Amino Acid</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D014764" MajorTopicYN="N">Viral Proteins</DescriptorName>
<QualifierName UI="Q000032" MajorTopicYN="N">analysis</QualifierName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2005</Year>
<Month>3</Month>
<Day>31</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2005</Year>
<Month>9</Month>
<Day>17</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2005</Year>
<Month>3</Month>
<Day>31</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">15797903</ArticleId>
<ArticleId IdType="pii">bti404</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/bti404</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
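
The RESULT abstract in the record above describes scanning sequences with a sliding window of size k and mapping each k-mer into a numerical space through a bio-basis function. The following Python sketch illustrates that pipeline under stated assumptions. The record does not give the formula, so the exponential form used in `bio_basis` is an assumed form of the bio-basis function, and `toy_similarity` is a hypothetical stand-in for a real amino-acid substitution matrix. The sequence, templates and function names are likewise hypothetical, and the window sizes 4, 6, 8, 10, 12 are one reading of "from 4 to 12 with a gap of two".

```python
import math
from typing import Iterator, List

def sliding_kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every length-k window of `sequence`, moving one residue at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]

def toy_similarity(a: str, b: str) -> float:
    """Assumed stand-in for a substitution-matrix score: +2 per match, -1 per mismatch."""
    if len(a) != len(b):
        raise ValueError("sub-sequences must have equal length")
    return float(sum(2 if x == y else -1 for x, y in zip(a, b)))

def bio_basis(kmer: str, template: str, gamma: float = 1.0) -> float:
    """Assumed form: exp(gamma * (s(kmer, template) - s(template, template)) / s(template, template))."""
    if not template:
        raise ValueError("template must be non-empty")
    self_score = toy_similarity(template, template)
    return math.exp(gamma * (toy_similarity(kmer, template) - self_score) / self_score)

def bio_map(kmer: str, templates: List[str], gamma: float = 1.0) -> List[float]:
    """Map one k-mer to a numeric vector with one coordinate per template."""
    return [bio_basis(kmer, t, gamma) for t in templates]

# Toy sequence and templates for illustration only; not real polyprotein data.
toy_sequence = "AVLQSGFRKMAF"
for k in range(4, 13, 2):
    print(k, len(list(sliding_kmers(toy_sequence, k))))

templates = ["AVLQ", "LQSG"]
for kmer in sliding_kmers(toy_sequence, 4):
    print(kmer, [round(v, 3) for v in bio_map(kmer, templates)])
```

In this sketch a k-mer identical to a template maps to 1.0 on that coordinate, and less similar k-mers map to smaller positive values. A decision tree trained on such vectors splits on similarity to whole templates rather than on individual residue positions.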

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002341 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 002341 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:15797903
   |texte=   Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:15797903" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021