Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.
Internal identifier: 000183 (Main/Curation); previous: 000182; next: 000184
Authors: Zongyu Wang [People's Republic of China]; Wenying He [People's Republic of China]; Jijun Tang [People's Republic of China]; Fei Guo [People's Republic of China]
Source:
- Journal of chemical information and modeling [1549-960X]; 2020.
Abstract
Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been extensively modeled and studied using traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that accurately identify high-affinity TF binding sites among the large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs, covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. We then encoded the samples with five feature representation methods and applied four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass problem. In our experiments, the method achieved its best performance with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We also identified similarity relationships among the categories from different TF families and obtained sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.
DOI: 10.1021/acs.jcim.9b01012
PubMed: 31944107
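The abstract describes the pipeline only at a high level. The following Python sketch illustrates a few generic building blocks it mentions. It is not the authors' code, which is in the repository linked above. All function names are illustrative, and several assumptions apply:

* The 0.45 E-score cutoff is hypothetical, because the abstract does not state the threshold.
* Dinucleotide composition stands in for the five unnamed feature encodings.
* The per-class scores are assumed to come from 16 binary XGBoost models.
* The MCC shown is Gorodkin's multiclass generalization, which may differ from how the paper aggregates it.

If each 8-mer and its reverse complement are counted once, 32,896 distinct contiguous 8-mers remain out of 4^8 = 65,536.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 8) -> list[str]:
    """All contiguous k-mers, keeping one representative per reverse-complement pair."""
    kept = []
    for letters in product(BASES, repeat=k):
        kmer = "".join(letters)
        # Keep the lexicographically smaller strand; palindromes are kept once.
        if kmer <= reverse_complement(kmer):
            kept.append(kmer)
    return kept


def select_high_affinity(escores: dict[str, float], threshold: float = 0.45) -> list[str]:
    """Keep k-mers whose E-score meets a cutoff (0.45 is an assumed value)."""
    return [kmer for kmer, e in escores.items() if e >= threshold]


def dinucleotide_composition(seq: str) -> list[float]:
    """Illustrative encoding: normalized counts of the 16 dinucleotides (ACGT only)."""
    pairs = ["".join(p) for p in product(BASES, repeat=2)]
    counts = {p: 0 for p in pairs}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in pairs]


def one_vs_rest_predict(scores_per_class: list[list[float]]) -> list[int]:
    """For each sample, pick the class whose 'class vs. rest' model scored highest."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores_per_class]


def multiclass_mcc(y_true: list[int], y_pred: list[int], n_classes: int) -> float:
    """Gorodkin's K-category Matthews correlation coefficient."""
    s = len(y_true)                                      # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correctly classified
    t_k = [y_true.count(k) for k in range(n_classes)]    # true count per class
    p_k = [y_pred.count(k) for k in range(n_classes)]    # predicted count per class
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    return num / den if den else 0.0


if __name__ == "__main__":
    print(len(canonical_kmers(8)))
    print(multiclass_mcc([0, 0, 1, 1], [0, 1, 1, 1], 2))
```

For example, `len(canonical_kmers(8))` gives 32896. With `n_classes=2`, `multiclass_mcc` reduces to the familiar binary MCC. A real pipeline would more likely delegate the one-vs-rest step to a library wrapper than combine scores by hand. The explicit argmax is shown only to make the decision rule visible.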
Links to previous steps (curation, corpus, ...)
- to stream PubMed, step Corpus: record 000287
- to stream PubMed, step Curation: record 000287
- to stream PubMed, step Checkpoint: record 000182
- to stream Ncbi, step Merge: record 002481
- to stream Ncbi, step Curation: record 002481
- to stream Ncbi, step Checkpoint: record 002481
- to stream Main, step Merge: record 000186
Links to the Exploration step
pubmed:31944107
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author><name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31944107</idno>
<idno type="pmid">31944107</idno>
<idno type="doi">10.1021/acs.jcim.9b01012</idno>
<idno type="wicri:Area/PubMed/Corpus">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000287</idno>
<idno type="wicri:Area/PubMed/Curation">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000287</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000182</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000182</idno>
<idno type="wicri:Area/Ncbi/Merge">002481</idno>
<idno type="wicri:Area/Ncbi/Curation">002481</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002481</idno>
<idno type="wicri:Area/Main/Merge">000186</idno>
<idno type="wicri:Area/Main/Curation">000183</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author><name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1"><nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName><settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of chemical information and modeling</title>
<idno type="eISSN">1549-960X</idno>
<imprint><date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been extensively modeled and studied using traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that accurately identify high-affinity TF binding sites among the large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs, covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. We then encoded the samples with five feature representation methods and applied four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass problem. In our experiments, the method achieved its best performance with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We also identified similarity relationships among the categories from different TF families and obtained sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.</div>
</front>
</TEI>
</record>
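As reproduced here, the TEI record uses the `wicri:` and `nlm:` prefixes without declaring them. A namespace-aware parser such as Python's `xml.etree.ElementTree` therefore rejects it with an "unbound prefix" error. The following sketch reads author names and identifiers from such a record under the following assumptions:

* The record is wrapped in a root element that binds both prefixes.
* The namespace URIs used for that binding (`urn:example:wicri`, `urn:example:nlm`) are hypothetical placeholders, not Wicri's real ones.
* The helper names are illustrative.
* `SAMPLE` is a trimmed excerpt of the record above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the record as shown does not declare the
# wicri: and nlm: prefixes, so we bind them to placeholders ourselves.
WRAPPER_OPEN = '<wrapper xmlns:wicri="urn:example:wicri" xmlns:nlm="urn:example:nlm">'
WRAPPER_CLOSE = "</wrapper>"


def parse_record(record_xml: str) -> ET.Element:
    """Parse a record whose prefixes are undeclared by wrapping it in a declaring root."""
    wrapper = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    return wrapper.find("record")


def author_names(record: ET.Element) -> list[str]:
    """Return the sortKey ('Last, First') of each author name in the titleStmt."""
    title_stmt = record.find("TEI/teiHeader/fileDesc/titleStmt")
    return [name.get("sortKey") for name in title_stmt.iter("name")]


def identifiers(record: ET.Element) -> dict[str, str]:
    """Map idno/@type to its text in the publicationStmt, skipping repeated explorRef entries."""
    pub = record.find("TEI/teiHeader/fileDesc/publicationStmt")
    return {
        idno.get("type"): idno.text
        for idno in pub.iter("idno")
        if idno.get("type") != "wicri:explorRef"
    }


# Trimmed excerpt of the record above, keeping the undeclared prefixes.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author><name sortKey="Wang, Zongyu" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>Tianjin University</nlm:affiliation></affiliation>
</author>
<author><name sortKey="Guo, Fei" first="Fei" last="Guo">Fei Guo</name></author>
</titleStmt>
<publicationStmt><idno type="pmid">31944107</idno>
<idno type="doi">10.1021/acs.jcim.9b01012</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus">000287</idno>
</publicationStmt></fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(author_names(rec))
    print(identifiers(rec))
```

Wrapping the record is a workaround that leaves the record text untouched. Declaring the prefixes on `<record>` itself would also work if the record can be edited.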
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000183 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Curation/biblio.hfd -nk 000183 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Curation |type= RBID |clé= pubmed:31944107 |texte= Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Curation/RBID.i -Sk "pubmed:31944107" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Curation/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.