MERS exploration server

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.

Internal identifier: 000287 (PubMed/Curation); previous: 000286; next: 000288

Authors: Zongyu Wang [People's Republic of China]; Wenying He [People's Republic of China]; Jijun Tang [People's Republic of China]; Fei Guo [People's Republic of China]

RBID : pubmed:31944107

Abstract

Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been extensively modeled and studied using traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that accurately identify effective TF binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. Then, we applied five feature representation methods, and we used four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass problem. In our experiments, the method achieved the best performance, with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We identified similarity relationships among the categories from different TF families and obtained schematic sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.
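The sample space described above, all contiguous 8-mers filtered by an E-score threshold, can be sketched in Python. This is a minimal illustration, not the authors' code from the linked repository. The following are assumptions: the 0.45 threshold, any E-score values passed in, and the one-hot encoding, which stands in for one of the five unnamed feature representations. Collapsing each 8-mer with its reverse complement follows the usual convention for double-stranded binding data.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int = 8) -> list[str]:
    """Enumerate every contiguous k-mer over A, C, G, T (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def canonical_kmers(k: int = 8) -> set[str]:
    """Merge each k-mer with its reverse complement, keeping the
    lexicographically smaller string as the representative."""
    return {min(s, reverse_complement(s)) for s in all_kmers(k)}


def select_by_escore(escores: dict[str, float], threshold: float = 0.45) -> list[str]:
    """Keep k-mers whose E-score is at or above the threshold.
    ASSUMPTION: 0.45 is an illustrative default, not the paper's value."""
    return sorted(kmer for kmer, e in escores.items() if e >= threshold)


def one_hot(seq: str) -> list[int]:
    """Hypothetical feature map: a 4 * len(seq) binary vector, one
    A/C/G/T indicator group per position."""
    return [1 if base == b else 0 for base in seq for b in BASES]


if __name__ == "__main__":
    print(len(all_kmers(8)))        # 65536
    print(len(canonical_kmers(8)))  # 32896
    print(one_hot("AC"))            # [1, 0, 0, 0, 0, 1, 0, 0]
```

Of the 65,536 possible 8-mers, 256 are their own reverse complement, so collapsing orientations leaves 32,896 distinct sequences.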
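A one-vs-rest scheme trains one binary model per category, with that category as positive and all others as negative, and predicts the category whose model scores highest. The sketch below assumes this argmax decision rule. To stay runnable without extra packages, it swaps XGBoost for a simple hypothetical centroid-distance learner (`centroid_trainer`). If the `xgboost` package is installed, a trainer that fits `xgboost.XGBClassifier` and returns the positive-class column of `predict_proba` could be passed in its place.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def centroid_trainer(X: list[Vector], y: list[int]) -> Scorer:
    """Stand-in binary learner (NOT XGBoost): the score is the squared
    distance to the negative centroid minus that to the positive one,
    so larger scores mean "more likely positive"."""
    def centroid(rows: list[Vector]) -> list[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def train_one_vs_rest(X: list[Vector], labels: list[str],
                      trainer: Callable[[list[Vector], list[int]], Scorer] = centroid_trainer
                      ) -> dict[str, Scorer]:
    """Fit one binary model per class: that class -> 1, every other class -> 0."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if lab == c else 0 for lab in labels]
        models[c] = trainer(X, y)
    return models


def predict_one_vs_rest(models: dict[str, Scorer], x: Vector) -> str:
    """Return the class whose binary model gives the highest score."""
    return max(models, key=lambda c: models[c](x))


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
    labels = ["a", "a", "b", "b", "c", "c"]
    models = train_one_vs_rest(X, labels)
    print(predict_one_vs_rest(models, [5, 5.5]))  # b
```

Keeping the binary learner pluggable matches the structure described in the abstract: XGBoost is the base classifier, and the one-vs-rest wrapper turns 16 binary models into a single multiclass predictor.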
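The two reported metrics, overall accuracy and MCC, can be computed from true and predicted labels as follows. The abstract does not say which multiclass form of the Matthews correlation coefficient it reports. This sketch assumes the common K-class generalization (the form used by scikit-learn's `matthews_corrcoef`). Here s is the number of samples, c the number predicted correctly, and t_k and p_k the true and predicted counts of class k: MCC = (c·s − Σ t_k p_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)). Averaging 16 per-class binary MCCs would be another plausible reading.

```python
from math import sqrt


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true: list[str], y_pred: list[str]) -> float:
    """K-class Matthews correlation coefficient computed from label counts."""
    classes = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                     # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))     # correct predictions
    t_k = [sum(1 for t in y_true if t == k) for k in classes]  # true counts
    p_k = [sum(1 for p in y_pred if p == k) for k in classes]  # predicted counts
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    # Undefined when every prediction (or every true label) is one class; report 0.
    return num / den if den else 0.0
```

For `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, accuracy is 0.75 and the MCC is 1/√3 ≈ 0.577. This is the same value as the binary MCC for this two-class case.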

DOI: 10.1021/acs.jcim.9b01012
PubMed: 31944107


Links to Exploration step

pubmed:31944107

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author>
<name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31944107</idno>
<idno type="pmid">31944107</idno>
<idno type="doi">10.1021/acs.jcim.9b01012</idno>
<idno type="wicri:Area/PubMed/Corpus">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000287</idno>
<idno type="wicri:Area/PubMed/Curation">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000287</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author>
<name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of chemical information and modeling</title>
<idno type="eISSN">1549-960X</idno>
<imprint>
<date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been extensively modeled and studied using traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that accurately identify effective TF binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. Then, we applied five feature representation methods, and we used four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass problem. In our experiments, the method achieved the best performance, with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We identified similarity relationships among the categories from different TF families and obtained schematic sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="In-Data-Review" Owner="NLM">
<PMID Version="1">31944107</PMID>
<DateRevised>
<Year>2020</Year>
<Month>03</Month>
<Day>23</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1549-960X</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>60</Volume>
<Issue>3</Issue>
<PubDate>
<Year>2020</Year>
<Month>Mar</Month>
<Day>23</Day>
</PubDate>
</JournalIssue>
<Title>Journal of chemical information and modeling</Title>
<ISOAbbreviation>J Chem Inf Model</ISOAbbreviation>
</Journal>
<ArticleTitle>Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</ArticleTitle>
<Pagination>
<MedlinePgn>1876-1883</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1021/acs.jcim.9b01012</ELocationID>
<Abstract>
<AbstractText>Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been extensively modeled and studied using traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that accurately identify effective TF binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. Then, we applied five feature representation methods, and we used four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass problem. In our experiments, the method achieved the best performance, with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We identified similarity relationships among the categories from different TF families and obtained schematic sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Wang</LastName>
<ForeName>Zongyu</ForeName>
<Initials>Z</Initials>
<AffiliationInfo>
<Affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>He</LastName>
<ForeName>Wenying</ForeName>
<Initials>W</Initials>
<AffiliationInfo>
<Affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Tang</LastName>
<ForeName>Jijun</ForeName>
<Initials>J</Initials>
<AffiliationInfo>
<Affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, P. R. China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, United States.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Guo</LastName>
<ForeName>Fei</ForeName>
<Initials>F</Initials>
<Identifier Source="ORCID">http://orcid.org/0000-0003-2911-7643</Identifier>
<AffiliationInfo>
<Affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2020</Year>
<Month>01</Month>
<Day>28</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Chem Inf Model</MedlineTA>
<NlmUniqueID>101230060</NlmUniqueID>
<ISSNLinking>1549-9596</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2020</Year>
<Month>1</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2020</Year>
<Month>1</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2020</Year>
<Month>1</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">31944107</ArticleId>
<ArticleId IdType="doi">10.1021/acs.jcim.9b01012</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000287 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 000287 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:31944107
   |texte=   Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:31944107" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021