MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Internal identifier: 000948 (Pmc/Curation); previous: 000947; next: 000949

Authors: Juhani Kähärä [Finland]; Harri Lähdesmäki [Finland]

Source :

RBID : PMC:3750486

Abstract

Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
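
The linear k-mer idea in the abstract can be illustrated with a minimal sketch: a sequence is represented by the counts of its overlapping k-mers, and its binding score is a weighted sum of those counts. Everything specific below is an assumption made for illustration only: the function names (`kmer_counts`, `linear_score`, `fit_perceptron`), the toy sequences, the choice k = 4, and the perceptron update used to set the weights. It is not the fitting or k-mer selection procedure of the study, which the abstract does not spell out.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_score(seq, weights, bias, k):
    """Linear k-mer score: bias plus the sum of weight * count over k-mers."""
    counts = kmer_counts(seq, k)
    return bias + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def fit_perceptron(bound, unbound, k, epochs=20, lr=0.1):
    """Toy weight fitting (an assumption, not the study's method).

    Bound sequences are labelled +1 and unbound sequences -1; weights are
    nudged whenever a sequence falls on the wrong side of zero.
    """
    weights, bias = {}, 0.0
    data = [(s, 1) for s in bound] + [(s, -1) for s in unbound]
    for _ in range(epochs):
        for seq, label in data:
            if label * linear_score(seq, weights, bias, k) <= 0:
                for kmer, n in kmer_counts(seq, k).items():
                    weights[kmer] = weights.get(kmer, 0.0) + lr * label * n
                bias += lr * label
    return weights, bias


# Hypothetical toy data: the "bound" sequences share the 4-mer GATA.
bound = ["AGATAA", "CGATAG", "TGATAC"]
unbound = ["ACCTGA", "CTTCAG", "TTGCCA"]

weights, bias = fit_perceptron(bound, unbound, k=4)
for seq in bound + unbound:
    # True means the sketch classifies the sequence as bound.
    print(seq, linear_score(seq, weights, bias, k=4) > 0)
```

Keeping the weights in a dictionary means only k-mers that actually occur in the training sequences are stored, which is one simple way to keep such a model small; dropping low-weight entries would be a crude stand-in for the cross-validated k-mer reduction the abstract mentions.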


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750486
DOI: 10.1186/1471-2105-14-S10-S2
PubMed: 24267147
PubMed Central: 3750486


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Evaluating a linear
<bold>
<italic>k</italic>
</bold>
-mer model for protein-DNA interactions using high-throughput SELEX data</title>
<author>
<name sortKey="K H R, Juhani" sort="K H R, Juhani" uniqKey="K H R J" first="Juhani" last="K H R">Juhani Kähärä</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="L Hdesm Ki, Harri" sort="L Hdesm Ki, Harri" uniqKey="L Hdesm Ki H" first="Harri" last="L Hdesm Ki">Harri Lähdesmäki</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Turku Centre for Biotechnology, Turku University, Turku, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Turku Centre for Biotechnology, Turku University, Turku</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24267147</idno>
<idno type="pmc">3750486</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750486</idno>
<idno type="RBID">PMC:3750486</idno>
<idno type="doi">10.1186/1471-2105-14-S10-S2</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000948</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000948</idno>
<idno type="wicri:Area/Pmc/Curation">000948</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000948</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Evaluating a linear
<bold>
<italic>k</italic>
</bold>
-mer model for protein-DNA interactions using high-throughput SELEX data</title>
<author>
<name sortKey="K H R, Juhani" sort="K H R, Juhani" uniqKey="K H R J" first="Juhani" last="K H R">Juhani Kähärä</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="L Hdesm Ki, Harri" sort="L Hdesm Ki, Harri" uniqKey="L Hdesm Ki H" first="Harri" last="L Hdesm Ki">Harri Lähdesmäki</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Turku Centre for Biotechnology, Turku University, Turku, Finland</nlm:aff>
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Turku Centre for Biotechnology, Turku University, Turku</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear
<italic>k</italic>
-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative
<italic>k</italic>
-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of
<italic>k</italic>
-mers in the model and observed that the number of
<italic>k</italic>
-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of
<italic>k</italic>
-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the
<italic>k</italic>
-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the
<italic>k</italic>
-mer and PWM models, respectively. Finally, the
<italic>k</italic>
-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.</p>
</div>
</front>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24267147</article-id>
<article-id pub-id-type="pmc">3750486</article-id>
<article-id pub-id-type="publisher-id">1471-2105-14-S10-S2</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-14-S10-S2</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Evaluating a linear
<bold>
<italic>k</italic>
</bold>
-mer model for protein-DNA interactions using high-throughput SELEX data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Kähärä</surname>
<given-names>Juhani</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>juhani.kahara@aalto.fi</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Lähdesmäki</surname>
<given-names>Harri</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>harri.lahdesmaki@aalto.fi</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Information and Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland</aff>
<aff id="I2">
<label>2</label>
Turku Centre for Biotechnology, Turku University, Turku, Finland</aff>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>8</month>
<year>2013</year>
</pub-date>
<volume>14</volume>
<issue>Suppl 10</issue>
<supplement>
<named-content content-type="supplement-title">Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics</named-content>
<named-content content-type="supplement-editor">Reija Autio, Ilya Shmulevich, Korbinian Strimmer and Carsten Wiuf</named-content>
<named-content content-type="supplement-sponsor">Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. Articles have undergone the journal's standard peer-review process for supplements. The Supplement Editors declare that they have no competing interests.</named-content>
</supplement>
<fpage>S2</fpage>
<lpage>S2</lpage>
<permissions>
<copyright-statement>Copyright © 2013 Kähärä and Lähdesmäki; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Kähärä and Lähdesmäki; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/14/S10/S2"></self-uri>
<abstract>
<p>Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear
<italic>k</italic>
-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative
<italic>k</italic>
-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of
<italic>k</italic>
-mers in the model and observed that the number of
<italic>k</italic>
-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of
<italic>k</italic>
-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the
<italic>k</italic>
-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the
<italic>k</italic>
-mer and PWM models, respectively. Finally, the
<italic>k</italic>
-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.</p>
</abstract>
<conference>
<conf-date>10-12 June 2013</conf-date>
<conf-name>10th International Workshop on Computational Systems Biology</conf-name>
<conf-loc>Tampere, Finland</conf-loc>
</conference>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000948 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000948 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:3750486
   |texte=   Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:24267147" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021