MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Machine learning for regulatory analysis and transcription factor target prediction in yeast

Internal identifier: 001420 (Pmc/Checkpoint); previous: 001419; next: 001421

Authors: Dustin T. Holloway [United States]; Mark Kon [United States]; Charles DeLisi [United States]

Source: Systems and Synthetic Biology, 2006

RBID: PMC:2533145

Abstract

High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps—the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. 
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.
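
The abstract compares the two classifiers by sensitivity at matched specificity and positive predictive value. As a reminder of how these three quantities are defined, here is a minimal Python sketch. The function name `classification_metrics` and the confusion-matrix counts are illustrative assumptions, not values taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and positive predictive value.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives for one classifier.
    """
    sensitivity = tp / (tp + fn)   # fraction of real targets that are recovered
    specificity = tn / (tn + fp)   # fraction of non-targets correctly rejected
    ppv = tp / (tp + fp)           # fraction of predicted targets that are real
    return sensitivity, specificity, ppv


# Hypothetical counts for a single transcription factor (not from the paper).
sens, spec, ppv = classification_metrics(tp=8, fp=2, tn=88, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

Holding specificity and PPV fixed, as the abstract does, means any difference between two classifiers shows up in sensitivity alone.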

Electronic Supplementary Material

Supplementary material is available in the online version of this article at http://dx.doi.org/10.1007/s11693-006-9003-3 and is accessible for authorized users.


Url:
DOI: 10.1007/s11693-006-9003-3
PubMed: 19003435
PubMed Central: 2533145



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Machine learning for regulatory analysis and transcription factor target prediction in yeast</title>
<author>
<name sortKey="Holloway, Dustin T" sort="Holloway, Dustin T" uniqKey="Holloway D" first="Dustin T." last="Holloway">Dustin T. Holloway</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Molecular Biology Cell Biology and Biochemistry, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kon, Mark" sort="Kon, Mark" uniqKey="Kon M" first="Mark" last="Kon">Mark Kon</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">Department of Mathematics and Statistics, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Department of Mathematics and Statistics, Boston University, Boston</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="Aff3">Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Bioinformatics and Systems Biology, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Delisi, Charles" sort="Delisi, Charles" uniqKey="Delisi C" first="Charles" last="Delisi">Charles Delisi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff3">Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Bioinformatics and Systems Biology, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19003435</idno>
<idno type="pmc">2533145</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533145</idno>
<idno type="RBID">PMC:2533145</idno>
<idno type="doi">10.1007/s11693-006-9003-3</idno>
<date when="2006">2006</date>
<idno type="wicri:Area/Pmc/Corpus">000566</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000566</idno>
<idno type="wicri:Area/Pmc/Curation">000566</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000566</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001420</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001420</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Machine learning for regulatory analysis and transcription factor target prediction in yeast</title>
<author>
<name sortKey="Holloway, Dustin T" sort="Holloway, Dustin T" uniqKey="Holloway D" first="Dustin T." last="Holloway">Dustin T. Holloway</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Molecular Biology Cell Biology and Biochemistry, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kon, Mark" sort="Kon, Mark" uniqKey="Kon M" first="Mark" last="Kon">Mark Kon</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">Department of Mathematics and Statistics, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Department of Mathematics and Statistics, Boston University, Boston</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="Aff3">Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Bioinformatics and Systems Biology, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Delisi, Charles" sort="Delisi, Charles" uniqKey="Delisi C" first="Charles" last="Delisi">Charles Delisi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff3">Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:cityArea>Bioinformatics and Systems Biology, Boston University, Boston</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Systems and Synthetic Biology</title>
<idno type="ISSN">1872-5325</idno>
<idno type="eISSN">1872-5333</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps—the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104
<italic>Saccharomyces cerevisiae</italic>
regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying
<italic>k</italic>
-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.</p>
<sec>
<title>Electronic Supplementary Material</title>
<p>Supplementary material is available in the online version of this article at
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s11693-006-9003-3">http://dx.doi.org/10.1007/s11693-006-9003-3</ext-link>
and is accessible for authorized users.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Syst Synth Biol</journal-id>
<journal-title>Systems and Synthetic Biology</journal-title>
<issn pub-type="ppub">1872-5325</issn>
<issn pub-type="epub">1872-5333</issn>
<publisher>
<publisher-name>Kluwer Academic Publishers</publisher-name>
<publisher-loc>Dordrecht</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19003435</article-id>
<article-id pub-id-type="pmc">2533145</article-id>
<article-id pub-id-type="publisher-id">9003</article-id>
<article-id pub-id-type="doi">10.1007/s11693-006-9003-3</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Machine learning for regulatory analysis and transcription factor target prediction in yeast</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name name-style="western">
<surname>Holloway</surname>
<given-names>Dustin T.</given-names>
</name>
<address>
<email>dth128@bu.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name name-style="western">
<surname>Kon</surname>
<given-names>Mark</given-names>
</name>
<address>
<email>mkon@bu.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name name-style="western">
<surname>DeLisi</surname>
<given-names>Charles</given-names>
</name>
<address>
<email>delisi@bu.edu</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215 USA</aff>
<aff id="Aff2">
<label>2</label>
Department of Mathematics and Statistics, Boston University, Boston, MA 02215 USA</aff>
<aff id="Aff3">
<label>3</label>
Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>31</day>
<month>10</month>
<year>2006</year>
</pub-date>
<pub-date pub-type="ppub">
<month>3</month>
<year>2007</year>
</pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>25</fpage>
<lpage>46</lpage>
<permissions>
<copyright-statement>© Springer Science + Business Media B.V. 2006</copyright-statement>
</permissions>
<abstract>
<p>High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps—the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104
<italic>Saccharomyces cerevisiae</italic>
regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying
<italic>k</italic>
-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.</p>
<sec>
<title>Electronic Supplementary Material</title>
<p>Supplementary material is available in the online version of this article at
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s11693-006-9003-3">http://dx.doi.org/10.1007/s11693-006-9003-3</ext-link>
and is accessible for authorized users.</p>
</sec>
</abstract>
<kwd-group>
<title>Keywords</title>
<kwd>Transcription factor</kwd>
<kwd>SVM</kwd>
<kwd>Machine learning</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© Springer Science + Business Media B.V. 2007</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Massachusetts</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Holloway, Dustin T" sort="Holloway, Dustin T" uniqKey="Holloway D" first="Dustin T." last="Holloway">Dustin T. Holloway</name>
</region>
<name sortKey="Delisi, Charles" sort="Delisi, Charles" uniqKey="Delisi C" first="Charles" last="Delisi">Charles Delisi</name>
<name sortKey="Kon, Mark" sort="Kon, Mark" uniqKey="Kon M" first="Mark" last="Kon">Mark Kon</name>
<name sortKey="Kon, Mark" sort="Kon, Mark" uniqKey="Kon M" first="Mark" last="Kon">Mark Kon</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001420 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 001420 | SxmlIndent | more
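
Once a record has been printed with one of the commands above, its identifiers can be read programmatically. The following is a minimal Python sketch that works on a small fragment copied from the `<article-meta>` element of the record shown earlier. The helper name `article_ids` is an assumption introduced for illustration. The full record also uses prefixes such as `wicri:` and `xlink:` whose namespace declarations do not appear on this page, so a strict XML parser would probably need them declared before it could read the whole record.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the <article-meta> element of the record above.
FRAGMENT = """
<article-meta>
<article-id pub-id-type="pmid">19003435</article-id>
<article-id pub-id-type="pmc">2533145</article-id>
<article-id pub-id-type="publisher-id">9003</article-id>
<article-id pub-id-type="doi">10.1007/s11693-006-9003-3</article-id>
</article-meta>
"""


def article_ids(xml_text):
    """Map each pub-id-type attribute to the text of its <article-id>."""
    root = ET.fromstring(xml_text.strip())
    return {
        el.get("pub-id-type"): (el.text or "").strip()
        for el in root.iter("article-id")
    }


ids = article_ids(FRAGMENT)
print(ids["pmid"], ids["doi"])
```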

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:2533145
   |texte=   Machine learning for regulatory analysis and transcription factor target prediction in yeast
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:19003435" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021