MERS exploration server


A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data

Internal identifier: 000F54 (Pmc/Corpus); previous: 000F53; next: 000F55

Authors: Yaron Orenstein; Ron Shamir

Source:

RBID : PMC:4005680

Abstract

Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma et al. reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two in vitro technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict in vivo binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict in vivo binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.


Url:
DOI: 10.1093/nar/gku117
PubMed: 24500199
PubMed Central: 4005680

Links to Exploration step

PMC:4005680

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24500199</idno>
<idno type="pmc">4005680</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005680</idno>
<idno type="RBID">PMC:4005680</idno>
<idno type="doi">10.1093/nar/gku117</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000F54</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F54</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma
<italic>et al.</italic>
reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two
<italic>in vitro</italic>
technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict
<italic>in vivo</italic>
binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict
<italic>in vivo</italic>
binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Aparicio, O" uniqKey="Aparicio O">O Aparicio</name>
</author>
<author>
<name sortKey="Geisberg, Jv" uniqKey="Geisberg J">JV Geisberg</name>
</author>
<author>
<name sortKey="Struhl, K" uniqKey="Struhl K">K Struhl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rhee, Hs" uniqKey="Rhee H">HS Rhee</name>
</author>
<author>
<name sortKey="Pugh, Bf" uniqKey="Pugh B">BF Pugh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Qureshi, Am" uniqKey="Qureshi A">AM Qureshi</name>
</author>
<author>
<name sortKey="He, Fs" uniqKey="He F">FS He</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fordyce, Pm" uniqKey="Fordyce P">PM Fordyce</name>
</author>
<author>
<name sortKey="Gerber, D" uniqKey="Gerber D">D Gerber</name>
</author>
<author>
<name sortKey="Tran, D" uniqKey="Tran D">D Tran</name>
</author>
<author>
<name sortKey="Zheng, J" uniqKey="Zheng J">J Zheng</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Derisi, Jl" uniqKey="Derisi J">JL DeRisi</name>
</author>
<author>
<name sortKey="Quake, Sr" uniqKey="Quake S">SR Quake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jolma, A" uniqKey="Jolma A">A Jolma</name>
</author>
<author>
<name sortKey="Kivioja, T" uniqKey="Kivioja T">T Kivioja</name>
</author>
<author>
<name sortKey="Toivonen, J" uniqKey="Toivonen J">J Toivonen</name>
</author>
<author>
<name sortKey="Cheng, L" uniqKey="Cheng L">L Cheng</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
<author>
<name sortKey="Enge, M" uniqKey="Enge M">M Enge</name>
</author>
<author>
<name sortKey="Taipale, M" uniqKey="Taipale M">M Taipale</name>
</author>
<author>
<name sortKey="Vaquerizas, Jm" uniqKey="Vaquerizas J">JM Vaquerizas</name>
</author>
<author>
<name sortKey="Yan, J" uniqKey="Yan J">J Yan</name>
</author>
<author>
<name sortKey="Sillanpaa, Mj" uniqKey="Sillanpaa M">MJ Sillanpaa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Slattery, M" uniqKey="Slattery M">M Slattery</name>
</author>
<author>
<name sortKey="Riley, T" uniqKey="Riley T">T Riley</name>
</author>
<author>
<name sortKey="Liu, P" uniqKey="Liu P">P Liu</name>
</author>
<author>
<name sortKey="Abe, N" uniqKey="Abe N">N Abe</name>
</author>
<author>
<name sortKey="Gomez Alcala, P" uniqKey="Gomez Alcala P">P Gomez-Alcala</name>
</author>
<author>
<name sortKey="Dror, I" uniqKey="Dror I">I Dror</name>
</author>
<author>
<name sortKey="Zhou, T" uniqKey="Zhou T">T Zhou</name>
</author>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="Honig, B" uniqKey="Honig B">B Honig</name>
</author>
<author>
<name sortKey="Bussemaker, Hj" uniqKey="Bussemaker H">HJ Bussemaker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Granas, D" uniqKey="Granas D">D Granas</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robasky, K" uniqKey="Robasky K">K Robasky</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y Orenstein</name>
</author>
<author>
<name sortKey="Linhart, C" uniqKey="Linhart C">C Linhart</name>
</author>
<author>
<name sortKey="Shamir, R" uniqKey="Shamir R">R Shamir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weirauch, Mt" uniqKey="Weirauch M">MT Weirauch</name>
</author>
<author>
<name sortKey="Cote, A" uniqKey="Cote A">A Cote</name>
</author>
<author>
<name sortKey="Norel, R" uniqKey="Norel R">R Norel</name>
</author>
<author>
<name sortKey="Annala, M" uniqKey="Annala M">M Annala</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Riley, Tr" uniqKey="Riley T">TR Riley</name>
</author>
<author>
<name sortKey="Saez Rodriguez, J" uniqKey="Saez Rodriguez J">J Saez-Rodriguez</name>
</author>
<author>
<name sortKey="Cokelaer, T" uniqKey="Cokelaer T">T Cokelaer</name>
</author>
<author>
<name sortKey="Vedenko, A" uniqKey="Vedenko A">A Vedenko</name>
</author>
<author>
<name sortKey="Talukder, S" uniqKey="Talukder S">S Talukder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jolma, A" uniqKey="Jolma A">A Jolma</name>
</author>
<author>
<name sortKey="Yan, J" uniqKey="Yan J">J Yan</name>
</author>
<author>
<name sortKey="Whitington, T" uniqKey="Whitington T">T Whitington</name>
</author>
<author>
<name sortKey="Toivonen, J" uniqKey="Toivonen J">J Toivonen</name>
</author>
<author>
<name sortKey="Nitta, Kr" uniqKey="Nitta K">KR Nitta</name>
</author>
<author>
<name sortKey="Rastas, P" uniqKey="Rastas P">P Rastas</name>
</author>
<author>
<name sortKey="Morgunova, E" uniqKey="Morgunova E">E Morgunova</name>
</author>
<author>
<name sortKey="Enge, M" uniqKey="Enge M">M Enge</name>
</author>
<author>
<name sortKey="Taipale, M" uniqKey="Taipale M">M Taipale</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Landt, Sg" uniqKey="Landt S">SG Landt</name>
</author>
<author>
<name sortKey="Marinov, Gk" uniqKey="Marinov G">GK Marinov</name>
</author>
<author>
<name sortKey="Kundaje, A" uniqKey="Kundaje A">A Kundaje</name>
</author>
<author>
<name sortKey="Kheradpour, P" uniqKey="Kheradpour P">P Kheradpour</name>
</author>
<author>
<name sortKey="Pauli, F" uniqKey="Pauli F">F Pauli</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author>
<name sortKey="Bernstein, Be" uniqKey="Bernstein B">BE Bernstein</name>
</author>
<author>
<name sortKey="Bickel, P" uniqKey="Bickel P">P Bickel</name>
</author>
<author>
<name sortKey="Brown, Jb" uniqKey="Brown J">JB Brown</name>
</author>
<author>
<name sortKey="Cayting, P" uniqKey="Cayting P">P Cayting</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Morris, Q" uniqKey="Morris Q">Q Morris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Talukder, S" uniqKey="Talukder S">S Talukder</name>
</author>
<author>
<name sortKey="Gehrke, Ar" uniqKey="Gehrke A">AR Gehrke</name>
</author>
<author>
<name sortKey="Jaeger, Sa" uniqKey="Jaeger S">SA Jaeger</name>
</author>
<author>
<name sortKey="Chan, Et" uniqKey="Chan E">ET Chan</name>
</author>
<author>
<name sortKey="Metzler, G" uniqKey="Metzler G">G Metzler</name>
</author>
<author>
<name sortKey="Vedenko, A" uniqKey="Vedenko A">A Vedenko</name>
</author>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y Orenstein</name>
</author>
<author>
<name sortKey="Mick, E" uniqKey="Mick E">E Mick</name>
</author>
<author>
<name sortKey="Shamir, R" uniqKey="Shamir R">R Shamir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Gehrke, Ar" uniqKey="Gehrke A">AR Gehrke</name>
</author>
<author>
<name sortKey="Talukder, S" uniqKey="Talukder S">S Talukder</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Pena Castillo, L" uniqKey="Pena Castillo L">L Pena-Castillo</name>
</author>
<author>
<name sortKey="Alleyne, Tm" uniqKey="Alleyne T">TM Alleyne</name>
</author>
<author>
<name sortKey="Mnaimneh, S" uniqKey="Mnaimneh S">S Mnaimneh</name>
</author>
<author>
<name sortKey="Botvinnik, Ob" uniqKey="Botvinnik O">OB Botvinnik</name>
</author>
<author>
<name sortKey="Chan, Et" uniqKey="Chan E">ET Chan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, Gh" uniqKey="Wei G">GH Wei</name>
</author>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Kivioja, T" uniqKey="Kivioja T">T Kivioja</name>
</author>
<author>
<name sortKey="Palin, K" uniqKey="Palin K">K Palin</name>
</author>
<author>
<name sortKey="Enge, M" uniqKey="Enge M">M Enge</name>
</author>
<author>
<name sortKey="Bonke, M" uniqKey="Bonke M">M Bonke</name>
</author>
<author>
<name sortKey="Jolma, A" uniqKey="Jolma A">A Jolma</name>
</author>
<author>
<name sortKey="Varjosalo, M" uniqKey="Varjosalo M">M Varjosalo</name>
</author>
<author>
<name sortKey="Gehrke, Ar" uniqKey="Gehrke A">AR Gehrke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="Jin, X" uniqKey="Jin X">X Jin</name>
</author>
<author>
<name sortKey="West, Sm" uniqKey="West S">SM West</name>
</author>
<author>
<name sortKey="Joshi, R" uniqKey="Joshi R">R Joshi</name>
</author>
<author>
<name sortKey="Honig, B" uniqKey="Honig B">B Honig</name>
</author>
<author>
<name sortKey="Mann, Rs" uniqKey="Mann R">RS Mann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Machanick, P" uniqKey="Machanick P">P Machanick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Park, Pj" uniqKey="Park P">PJ Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aird, D" uniqKey="Aird D">D Aird</name>
</author>
<author>
<name sortKey="Ross, Mg" uniqKey="Ross M">MG Ross</name>
</author>
<author>
<name sortKey="Chen, Ws" uniqKey="Chen W">WS Chen</name>
</author>
<author>
<name sortKey="Danielsson, M" uniqKey="Danielsson M">M Danielsson</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Russ, C" uniqKey="Russ C">C Russ</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hansen, Kd" uniqKey="Hansen K">KD Hansen</name>
</author>
<author>
<name sortKey="Brenner, Se" uniqKey="Brenner S">SE Brenner</name>
</author>
<author>
<name sortKey="Dudoit, S" uniqKey="Dudoit S">S Dudoit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiang, B" uniqKey="Jiang B">B Jiang</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afek, A" uniqKey="Afek A">A Afek</name>
</author>
<author>
<name sortKey="Lukatsky, Db" uniqKey="Lukatsky D">DB Lukatsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klug, Sj" uniqKey="Klug S">SJ Klug</name>
</author>
<author>
<name sortKey="Famulok, M" uniqKey="Famulok M">M Famulok</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gordan, R" uniqKey="Gordan R">R Gordan</name>
</author>
<author>
<name sortKey="Shen, N" uniqKey="Shen N">N Shen</name>
</author>
<author>
<name sortKey="Dror, I" uniqKey="Dror I">I Dror</name>
</author>
<author>
<name sortKey="Zhou, T" uniqKey="Zhou T">T Zhou</name>
</author>
<author>
<name sortKey="Horton, J" uniqKey="Horton J">J Horton</name>
</author>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, L" uniqKey="Yang L">L Yang</name>
</author>
<author>
<name sortKey="Zhou, T" uniqKey="Zhou T">T Zhou</name>
</author>
<author>
<name sortKey="Dror, I" uniqKey="Dror I">I Dror</name>
</author>
<author>
<name sortKey="Mathelier, A" uniqKey="Mathelier A">A Mathelier</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
<author>
<name sortKey="Gordan, R" uniqKey="Gordan R">R Gordan</name>
</author>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24500199</article-id>
<article-id pub-id-type="pmc">4005680</article-id>
<article-id pub-id-type="doi">10.1093/nar/gku117</article-id>
<article-id pub-id-type="publisher-id">gku117</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Orenstein</surname>
<given-names>Yaron</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shamir</surname>
<given-names>Ron</given-names>
</name>
<xref ref-type="corresp" rid="gku117-COR1">*</xref>
</contrib>
<aff>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel</aff>
</contrib-group>
<author-notes>
<corresp id="gku117-COR1">*To whom correspondence should be addressed. Tel:
<phone>+972 3 6405383</phone>
; Fax:
<fax>+972 3 640 5384</fax>
; Email:
<email>rshamir@tau.ac.il</email>
</corresp>
<fn>
<p>Present address: Ron Shamir, School of Computer Science, Tel-Aviv University, P.O.B. 39040, Tel-Aviv 69978, Israel.</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>4</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>5</day>
<month>2</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>5</day>
<month>2</month>
<year>2014</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>42</volume>
<issue>8</issue>
<fpage>e63</fpage>
<lpage>e63</lpage>
<history>
<date date-type="received">
<day>11</day>
<month>11</month>
<year>2013</year>
</date>
<date date-type="rev-recd">
<day>22</day>
<month>12</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>1</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2014. Published by Oxford University Press.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma
<italic>et al.</italic>
reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two
<italic>in vitro</italic>
technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict
<italic>in vivo</italic>
binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict
<italic>in vivo</italic>
binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.</p>
</abstract>
<counts>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>INTRODUCTION</title>
<p>The questions of how, when and where genes are expressed have been fundamental to cell research in recent decades. Transcription factors (TFs) are known to be the main regulators of gene transcription and have thus been the subject of extensive study. These proteins bind to specific short DNA sequences, mainly in promoter and enhancer regions, and thereby impede or promote transcription. They bind with variable affinity, depending on the sequence and on other factors, and this affinity affects transcription. Learning and modeling the binding preferences of TFs is a central goal in gene regulation research.</p>
<p>Many high-throughput technologies have been developed to study TF binding. Technologies that measure
<italic>in vivo</italic>
binding include ChIP-chip (
<xref rid="gku117-B1" ref-type="bibr">1</xref>
), ChIP-seq (
<xref rid="gku117-B2" ref-type="bibr">2</xref>
) and the recently developed ChIP-exo (
<xref rid="gku117-B3" ref-type="bibr">3</xref>
). However, measuring
<italic>in vivo</italic>
binding may not reveal the full picture. First, the accessible sites may not cover the full spectrum of possible DNA k-mers. Second,
<italic>in vivo</italic>
binding is affected by additional factors, such as chromatin structure, nucleosome positioning and co-factors. As opposed to
<italic>in vivo</italic>
binding,
<italic>in vitro</italic>
binding results purely from direct TF–DNA interaction (or cooperative binding of specific factors) and allows sampling of the full spectrum of DNA k-mers. Technologies that measure
<italic>in vitro</italic>
binding include protein-binding microarray (PBM) (
<xref rid="gku117-B4" ref-type="bibr">4</xref>
) and mechanically induced trapping of molecular interactions (
<xref rid="gku117-B5" ref-type="bibr">5</xref>
), both of which measure the binding of a specific protein to a set of oligo sequences designed to cover all k-mers. A newer technology is high-throughput SELEX (HT-SELEX), which consists of several cycles of incubating the DNA-binding protein with a mixture of DNA sequences, enrichment of the bound DNA sequences, sequencing a sample of them and feeding them to the next cycle (
<xref rid="gku117-B6" ref-type="bibr">6–8</xref>
).</p>
<p>PBMs have gained great popularity, thanks to their high-throughput and unbiased nature. The public database UniPROBE contains experiments of >400 TFs (
<xref rid="gku117-B9" ref-type="bibr">9</xref>
). Although the models derived from this technology have been used extensively, it is unclear how accurate these models are in predicting
<italic>in vivo</italic>
binding. Several studies have shown that using these positional weight matrix (PWM) models to predict
<italic>in vivo</italic>
binding leads to poorer results compared with
<italic>in vitro</italic>
binding prediction (
<xref rid="gku117-B10" ref-type="bibr">10</xref>
,
<xref rid="gku117-B11" ref-type="bibr">11</xref>
). This performance gap can be explained by several reasons related to
<italic>in vivo</italic>
binding, such as indirect binding and inaccessibility of genomic DNA. Another possible explanation is that these models include PBM-specific biases. Thus, an independent
<italic>in vitro</italic>
measurement is required to evaluate the validity of these models.</p>
<p>Recently, a study covering >500 TFs in >800 HT-SELEX experiments was conducted by the Taipale laboratory (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
). For the first time, a high number of TFs have available experimental data in two independent
<italic>in vitro</italic>
technologies: 162 TFs were tested both in HT-SELEX and PBM experiments by the Taipale and Bulyk laboratories, respectively. Jolma
<italic>et al.</italic>
(
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) compared SELEX models with PBM models by length and presented several examples where the SELEX models are more accurate than PBM models based on ChIP-seq data. However, a much broader systematic comparison of the binding models produced by each technology is required.</p>
<p>In this study, we aim to analyze and measure the similarities and differences between the two technologies. First, we ask how well HT-SELEX-derived PWM models predict PBM binding. Second, to compare the methods without depending on inferred binding models, we study how well the top k-mers of the two technologies correlate, and which technology is better at k-mer ranking. Third, we test which technology produces better models for predicting
<italic>in vivo</italic>
binding. Fourth, we uncover biases in HT-SELEX technology. We aim to highlight the advantages of each technology compared with the other. Our observations may help in developing a new method to learn binding models based on HT-SELEX data.</p>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<sec>
<title>Data</title>
<p>PBM data and PBM-derived PWM models were downloaded from the UniPROBE database (
<xref rid="gku117-B9" ref-type="bibr">9</xref>
). We used normalized PBM probe data, as available in the database (i.e. the median signal intensity values and corresponding nucleotide probe sequences). Only the 36 bp of unique sequence were used. HT-SELEX experimental data and HT-SELEX-PWM models were downloaded from (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
). Human ChIP-seq data were downloaded from ENCODE (
<xref rid="gku117-B13" ref-type="bibr">13</xref>
).</p>
</sec>
<sec>
<title>Binding prediction</title>
<p>PWMs were used to represent TF binding preferences (
<xref rid="gku117-B14" ref-type="bibr">14</xref>
). For each TF, the set of PWMs reported was used for the binding prediction. In many cases, multiple models were available. In general, we did not distinguish between mouse and human and between the full protein and the binding domain only. For each sequence (either PBM probe or a ChIP-seq peak), the maximum sum occupancy score over the set of PWMs was its predicted binding intensity. For probe sequence s and PWM Θ of length k, the sum occupancy score is
<disp-formula>
<graphic xlink:href="gku117um1.jpg" position="float"></graphic>
</disp-formula>
where Θ
<sub>i</sub>
(x) is the probability of base x in position i of the PWM. A PBM probe is defined as a positive hit for Θ if its binding intensity is greater than the median by at least 4 * (MAD/0.6745), where MAD is the median absolute deviation from the binding intensity median (MAD = 0.6745 for the normal distribution N(0,1)) (
<xref rid="gku117-B15" ref-type="bibr">15</xref>
). The positive ChIP-seq peaks are defined as the 500 peaks with the smallest reported
<italic>P</italic>
-value. We used the 250 bp around the center of the peak as the positive sequence and the 250-bp-long genomic sequence 300 bp downstream of the peak center as the negative sequence. The Spearman rank correlation coefficient, sensitivity at a 1% false-positive rate and area under the receiver operating characteristic curve were used to gauge the binding prediction (see (
<xref rid="gku117-B15" ref-type="bibr">15</xref>
) for details). For ChIP-seq data, when several experiments were available for the same TF, the average area under curve (AUC) over these experiments is reported.</p>
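<p>The scoring and thresholding steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes that the sum occupancy score is the sum, over all length-k windows of the sequence, of the product of the per-position base probabilities, and it scores the forward strand only. The function names and the representation of a PWM as a list of dictionaries are hypothetical.</p>

```python
import statistics


def sum_occupancy(seq, pwm):
    """Sum over all windows of the product of per-position base probabilities.

    pwm is a list with one dict per motif position, mapping base -> probability
    (an assumed representation, chosen for readability).
    """
    k = len(pwm)
    total = 0.0
    for t in range(len(seq) - k + 1):
        prod = 1.0
        for i in range(k):
            prod *= pwm[i][seq[t + i]]
        total += prod
    return total


def predicted_intensity(seq, pwms):
    """Maximum sum occupancy score over the set of PWMs reported for one TF."""
    return max(sum_occupancy(seq, pwm) for pwm in pwms)


def positive_probes(intensities, n_units=4.0):
    """Flag probes whose intensity exceeds the median by 4 * (MAD / 0.6745)."""
    med = statistics.median(intensities)
    mad = statistics.median([abs(x - med) for x in intensities])
    threshold = med + n_units * mad / 0.6745
    return [x > threshold for x in intensities]
```

<p>Dividing the MAD by 0.6745 rescales it to a robust estimate of the standard deviation, so the threshold corresponds to roughly four standard deviations above the median for normally distributed intensities.</p>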
</sec>
<sec>
<title>Model-independent comparison</title>
<p>For each experiment, the scores of the top 100 8-mers according to one technique were compared with their scores in the other technique. PBM 8-mers were scored by average (or median) binding intensity. For a probe
<italic>p
<sub>i</sub>
</italic>
,
let
<italic>s(p
<sub>i</sub>
)</italic>
be its intensity. The score of 8-mer
<italic>w</italic>
is the average binding intensity:
<inline-formula>
<inline-graphic xlink:href="gku117i1.jpg"></inline-graphic>
</inline-formula>
.</p>
<p>HT-SELEX 8-mers were scored by either their frequency or ratio of frequencies (frequency in cycle i divided by frequency in cycle i-1). The top 100 8-mers according to their PBM scores were selected, and Pearson correlation was calculated between the PBM scores and the HT-SELEX scores on these 8-mers. Similarly, the top 100 HT-SELEX 8-mers were chosen and their HT-SELEX scores were compared with their PBM scores using Pearson correlation.</p>
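<p>The model-independent comparison can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: a k-mer's PBM score is taken as the average intensity of the probes that contain it, HT-SELEX scores are plain frequencies within one cycle, reverse complements are not merged, and k-mers missing from the second technology receive a score of zero. All function names are hypothetical.</p>

```python
from collections import defaultdict
from math import sqrt


def pbm_kmer_scores(probes, k=8):
    """Average intensity over the probes containing each k-mer.

    probes is a list of (sequence, intensity) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, intensity in probes:
        # A set, so that a probe counts once per k-mer it contains.
        for w in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            sums[w] += intensity
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}


def selex_kmer_frequencies(reads, k=8):
    """Frequency of each k-mer among all k-mer occurrences in one cycle."""
    counts = defaultdict(int)
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
            total += 1
    return {w: c / total for w, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either vector is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def top_kmer_correlation(ref_scores, other_scores, top=100):
    """Correlate scores of the top k-mers of one technology with the other."""
    top_kmers = sorted(ref_scores, key=ref_scores.get, reverse=True)[:top]
    xs = [ref_scores[w] for w in top_kmers]
    ys = [other_scores.get(w, 0.0) for w in top_kmers]
    return pearson(xs, ys)
```

<p>Calling top_kmer_correlation with the PBM scores as the reference gives the PBM-selected comparison; swapping the two arguments gives the HT-SELEX-selected one.</p>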
</sec>
<sec>
<title>Logo drawing</title>
<p>Motif logos were plotted using
<ext-link ext-link-type="uri" xlink:href="http://demo.tinyray.com/weblogo">http://demo.tinyray.com/weblogo</ext-link>
.</p>
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<sec>
<title>HT-SELEX-derived models predict PBM binding accurately for most TFs</title>
<p>We first used the HT-SELEX-derived PWM models published in (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) to predict bound probes in PBM experiments and compared their performance with PBM-derived PWM models. We used the SCI09 data set of (
<xref rid="gku117-B16" ref-type="bibr">16</xref>
), which includes 115 paired PBM experiments of 104 mouse TFs [in paired experiments, two array designs are used to study the same TF, and so a model learned on one array can be evaluated on the other, see (
<xref rid="gku117-B15" ref-type="bibr">15</xref>
)]. For 128 PBM experiments (out of 230), an HT-SELEX-derived model was available for the same TF; this set covers 56 different TFs. For some TFs, Jolma
<italic>et al.</italic>
reported several PWMs, either because of multiple experiments or because of construction of several PWMs by their algorithm. Occasionally, for a TF analyzed by PBM, both a primary motif and a secondary motif are reported. When multiple PWMs were reported for the same TF by one technology, we assigned to each sequence the highest score obtained by such a model. We used five algorithms to generate PWMs from PBM experiments: Amadeus-PBM (
<xref rid="gku117-B10" ref-type="bibr">10</xref>
), Seed-and-Wobble (
<xref rid="gku117-B4" ref-type="bibr">4</xref>
), RankMotif++ (
<xref rid="gku117-B15" ref-type="bibr">15</xref>
), BEEML-PBM (
<xref rid="gku117-B17" ref-type="bibr">17</xref>
) and RAP (
<xref rid="gku117-B18" ref-type="bibr">18</xref>
). The performance of the models generated by each algorithm was reported in (
<xref rid="gku117-B18" ref-type="bibr">18</xref>
). For each paired experiment, these models were learned on one array and tested on the other to avoid overfitting. Testing of a model was by predicting the binding intensity for each probe in the other array and comparing it with the measured binding intensity. Scores for the comparison were the Spearman rank coefficient on the positive probes, the sensitivity (true positive ratio) at 1% false-positive and AUC of the receiver operating characteristic curve (see Methods). We report the average results in
<xref ref-type="table" rid="gku117-T1">Table 1</xref>
(for complete results see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S1</ext-link>
).
<table-wrap id="gku117-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Accuracy of HT-SELEX- and PBM-based PWM models in predicting TF binding to PBMs</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Model based on</th>
<th rowspan="1" colspan="1">HT-SELEX</th>
<th colspan="5" align="center" rowspan="1">PBM
<hr></hr>
</th>
</tr>
<tr>
<th rowspan="1" colspan="1">Algorithm</th>
<th rowspan="1" colspan="1">Jolma
<italic>et al.</italic>
</th>
<th rowspan="1" colspan="1">Amadeus-PBM</th>
<th rowspan="1" colspan="1">Seed-and-Wobble</th>
<th rowspan="1" colspan="1">RankMotif++</th>
<th rowspan="1" colspan="1">BEEML-PBM</th>
<th rowspan="1" colspan="1">RAP</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Spearman rank coefficient</td>
<td rowspan="1" colspan="1">0.282</td>
<td rowspan="1" colspan="1">0.230</td>
<td rowspan="1" colspan="1">0.272</td>
<td rowspan="1" colspan="1">0.301</td>
<td rowspan="1" colspan="1">0.335</td>
<td rowspan="1" colspan="1">0.339</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sensitivity at 1% false-positive</td>
<td rowspan="1" colspan="1">0.288</td>
<td rowspan="1" colspan="1">0.327</td>
<td rowspan="1" colspan="1">0.293</td>
<td rowspan="1" colspan="1">0.277</td>
<td rowspan="1" colspan="1">0.403</td>
<td rowspan="1" colspan="1">0.400</td>
</tr>
<tr>
<td rowspan="1" colspan="1">AUC</td>
<td rowspan="1" colspan="1">0.825</td>
<td rowspan="1" colspan="1">0.877</td>
<td rowspan="1" colspan="1">0.872</td>
<td rowspan="1" colspan="1">0.882</td>
<td rowspan="1" colspan="1">0.899</td>
<td rowspan="1" colspan="1">0.898</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gku117-TF1">
<p>
<italic>Note</italic>
. Results show average Spearman rank coefficient, sensitivity at 1% false-positive and AUC for predicting positive binding in 128 paired PBM experiments (covering 56 different TFs). PBM data were taken from (
<xref rid="gku117-B16" ref-type="bibr">16</xref>
) and HT-SELEX models were taken from (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
). Prediction results for the different PBM-based algorithms were taken from (
<xref rid="gku117-B18" ref-type="bibr">18</xref>
). For each experiment the PWM models learned by HT-SELEX or by the other PBM array were used to predict the bound probes (see Methods).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>The results show good agreement between the two technologies (
<xref ref-type="table" rid="gku117-T1">Table 1</xref>
and
<xref ref-type="fig" rid="gku117-F1">Figure 1</xref>
A). The average accuracy of HT-SELEX models is significantly lower than that obtained by PBM-derived models (e.g. AUC of 0.825 compared with 0.899 for the best PBM-derived models,
<italic>P</italic>
-value = 7.68·10
<sup>−14</sup>
Wilcoxon signed-rank test). This is expected because the evaluation is using PBM measurements. In an additional test on two other PBM data sets covering 115 human and mouse E26 transformation-specific (ETS) and homeodomain TFs tested on a single array (
<xref rid="gku117-B19" ref-type="bibr">19</xref>
,
<xref rid="gku117-B20" ref-type="bibr">20</xref>
), HT-SELEX-derived models achieved an average AUC of 0.928 (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
). These results may reflect properties of specific TF families.
<fig id="gku117-F1" position="float">
<label>Figure 1.</label>
<caption>
<p>Quality of binding prediction based on PBM data. (
<bold>A</bold>
) Accuracy in predicting PBM binding. For each PBM experiment, PBM probes are ranked according to motifs inferred by five PBM algorithms (AM = Amadeus-PBM, SW = Seed-and-Wobble, RM = RankMotif++, BE = BEEML-PBM and RAP) and by the HT-SELEX-derived models. This ranking is compared with the true ranking by calculating the AUC for predicting the bound PBM probes. Each dot is the average result of one algorithm in two or four experiments (TF names are listed at the bottom, TF family names are at the top, as given in Jolma
<italic>et al.</italic>
). (
<bold>B</bold>
) Sensitivity results in predicting PBM binding. For each PBM experiment, the bound probes were predicted using BEEML-PBM and HT-SELEX PWM models. The plot shows the sensitivity (true positive rate) at 1% false-positive rate of these predictions. Colors correspond to protein families. (
<bold>C</bold>
) Disagreement between HT-SELEX- and PBM-derived models. The logos are of the PWMs learned from HT-SELEX (top), and PBM (middle and bottom) taken from Jolma
<italic>et al.</italic>
and UniPROBE, respectively. The middle and bottom models learned from PBM for each TF are the primary and secondary models, respectively. 1, 2: examples where HT-SELEX produces motifs that are similar to the primary PBM model, but too long for PBM technology; 3, 4: cases where HT-SELEX models agree with PBM secondary model; 5: an example where the HT-SELEX model disagrees with both PBM models. (TCF3 was excluded from the analysis because each technology tested a different TF with that name: a bHLH Tcf3 was tested by HT-SELEX, whereas the HMG Tcf3 was tested by PBM).</p>
</caption>
<graphic xlink:href="gku117f1p"></graphic>
</fig>
</p>
<p>We found no significant difference between binding models based on mouse and human proteins and between models based on full proteins and binding domains; in both cases the two models performed essentially equally in predicting PBM binding that used mouse binding domains (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
). Note that sample sizes were small and broader tests are still needed.</p>
<p>For some TFs, the HT-SELEX prediction results were poorer than those achieved by PBM models. We define a set of HT-SELEX-derived models for the same TF as a failure if it achieved an AUC lower by at least 0.1 than the average of the five PBM models. HT-SELEX models failed in 20 TFs (covered by 42 experiments), including all Sox, E2F and Rfx proteins, as well as the individual TFs Hnf4a, Rara, Rxra, Smad3, Sry and Zscan4 (
<xref ref-type="fig" rid="gku117-F1">Figure 1</xref>
A and B). These failures occur in particular TF families, including the E2F, Sox, NR, Rfx, MAD and znfC2H2 families [experiments on HMG and znfC2H2 proteins had a low success rate (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
)]. The high-mobility group (HMG) super-family includes the Sox, Lef and Tcf protein families. It was suggested that for this family of proteins the DNA structure plays a larger role for binding site recognition than sequence specificity (
<xref rid="gku117-B21" ref-type="bibr">21</xref>
), which may explain the failure for this protein family. The recent observation that E2F1 and Smad1 ChIP-seq peaks do not contain the
<italic>in vitro</italic>
binding site (
<xref rid="gku117-B22" ref-type="bibr">22</xref>
) may explain the failures for E2F and Smad3.
<xref ref-type="fig" rid="gku117-F1">Figure 1</xref>
C presents the differences in the models for some of these cases.</p>
</sec>
<sec>
<title>A model-independent comparison</title>
<p>To avoid dependency on model learning, we performed a model-independent comparison. For each HT-SELEX experiment, we selected one arbitrary PBM experiment of the same TF from Cell08, SCI09 or EMBO10 studies. This resulted in 238 PBM-SELEX data sets. We chose to summarize the measurements of each method using 8-mer statistics, and focus on the top ranking 8-mers, which are expected to contain most of the information relevant for TF binding. For PBM 8-mer scores, we used average binding intensity, which is an accurate estimate of binding affinities (
<xref rid="gku117-B18" ref-type="bibr">18</xref>
). For HT-SELEX 8-mer scores, we tested two options: 8-mer frequency and 8-mer ratio (frequency in cycle i divided by frequency in cycle i-1) for all cycles (see Methods). With these scores at hand, for each data set we used the set of top 100 8-mers, according to one technology, and calculated the Pearson correlation of its scores with the scores of the same set in the other.
<xref ref-type="fig" rid="gku117-F2">Figure 2</xref>
shows the results for the different cycles, different scores and different selection of top 8-mers. Complete results are available in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S2</ext-link>
. Using the Spearman rank correlation provided similar results (data not shown).
<fig id="gku117-F2" position="float">
<label>Figure 2.</label>
<caption>
<p>Correlation between the top 8-mers as ranked by PBM and HT-SELEX data. For each HT-SELEX experiment 8-mers were scored by frequency or by the ratio of the frequency to the frequency in the previous cycle. The 8-mers of a PBM experiment on the same TF were scored by average binding intensity. For the 100 top scoring 8-mers according to PBM, the correlation between their PBM scores and their HT-SELEX frequency and ratio scores was computed. Similarly, for the 100 top scoring 8-mers according to HT-SELEX frequency (ratio), their correlation with the PBM scores was computed. (
<bold>A</bold>
) Average correlation in each cycle. Bar names indicate the technology used to determine the top 100 8-mers. The plot is based on average correlation over 238 TFs. (
<bold>B</bold>
) Distribution of the maximum correlation for different parameter combinations. The plot shows the number of times the maximum correlation is achieved by each combination of cycle, source of top 8-mers and HT-SELEX 8-mers score. (Because only 39 HT-SELEX experiments included data for a fifth cycle, we excluded it from the comparison; none of these experiments had maximum score at the fifth cycle).</p>
</caption>
<graphic xlink:href="gku117f2p"></graphic>
</fig>
</p>
<p>The results show that frequency scores give consistently better correlation with PBM scores than ratios. Hence, for the data analyzed in this study, frequency is superior to ratio, and we used it henceforth. The highest average correlation (just over 0.74) is achieved at cycle 3, when the top 8-mers are selected by PBM data, and HT-SELEX 8-mers are ranked by frequency (
<xref ref-type="fig" rid="gku117-F2">Figure 2</xref>
A). The k-mer ranking becomes more specific as the cycles progress [as was noted in (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
)]. At some point it becomes too specific, overrepresenting a small number of top k-mers and thus less accurate for medium- and low-affinity k-mers; we refer to this phenomenon as overspecification.
<xref ref-type="fig" rid="gku117-F2">Figure 2</xref>
B shows, for each combination of cycle, source of top 8-mers and HT-SELEX 8-mer score, the number of times the maximum correlation is achieved by that combination. Cycles 1, 2 and 3 have the highest numbers, supporting the idea of a trade-off between specificity and variability.</p>
<p>The results also suggest that 8-mer ranking using PBM is more reliable than using HT-SELEX. The top 100 PBM 8-mers have greater correlation than the top 100 HT-SELEX 8-mers. Identification of these 8-mers is important for learning the binding preference of the protein. At the current read coverage of HT-SELEX experiments, PBM data are more robust in identifying the top 8-mers. Sequencing a larger sample of the bound oligos may improve 8-mer scores and thus affect the binding models derived from them.</p>
<p>No significant differences were observed when comparing mouse versus human models as well as full protein versus binding domains (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
). Using median binding intensity to score PBM 8-mers instead of the average showed similar results (data not shown).</p>
</sec>
<sec>
<title>HT-SELEX models predict
<italic>in vivo</italic>
binding more accurately than PBM models</title>
<p>We compared the performance of PBM PWM models with HT-SELEX PWM models in predicting
<italic>in vivo</italic>
binding. We used human ChIP-seq data from the ENCODE project (
<xref rid="gku117-B13" ref-type="bibr">13</xref>
) for TFs that had both PBM and HT-SELEX data. In total, 15 human TFs covered by 111 ChIP-seq experiments were included in this comparison. The top 500 peaks in each experiment were used as a positive set, taking for each peak 250 bp around its center. The negative set consisted of 250-bp-long sequences taken from flanking sequences 300 bp downstream of each positive sequence. This choice aims to select negative sequences with statistical features, such as GC-content and k-mer counts, similar to those of the positive ones (
<xref rid="gku117-B23" ref-type="bibr">23</xref>
). PBM and HT-SELEX PWM models were taken from UniPROBE database (
<xref rid="gku117-B9" ref-type="bibr">9</xref>
) and Jolma
<italic>et al.</italic>
(
<xref rid="gku117-B12" ref-type="bibr">12</xref>
), respectively. When multiple models were reported by one technology, we assigned to each genomic sequence the highest score obtained by such a model. We did not distinguish between human and mouse TFs because Jolma
<italic>et al.</italic>
(
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) reported conservation of binding specificities between these species. Average AUC over the set of ChIP-seq experiments for each TF is reported. Complete results are shown in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S3</ext-link>
.</p>
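<p>The construction of the positive and negative sets and the AUC evaluation can be sketched as below. This is a hedged illustration: it assumes that the negative window starts 300 bp downstream of the peak center (the text admits other readings), that peaks are records with a "p_value" field, and that AUC is computed by direct pairwise comparison of scores. The function and field names are hypothetical.</p>

```python
def top_peaks(peaks, n=500):
    """Select the n peaks with the smallest reported P-value."""
    return sorted(peaks, key=lambda peak: peak["p_value"])[:n]


def peak_windows(center, width=250, offset=300):
    """Return (positive, negative) half-open coordinate intervals for a peak.

    The positive window is centered on the peak; the negative window is
    assumed to start `offset` bp downstream of the peak center.
    """
    half = width // 2
    positive = (center - half, center + half)
    negative = (center + offset, center + offset + width)
    return positive, negative


def auc(positive_scores, negative_scores):
    """Area under the ROC curve as the fraction of correctly ordered pairs.

    Ties between a positive and a negative score count as half.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

<p>With 500 positive and 500 negative sequences per experiment the quadratic pairwise loop remains cheap; a rank-based computation would be preferable for much larger sets.</p>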
<p>Our results show that HT-SELEX models are more accurate in predicting
<italic>in vivo</italic>
binding (average AUC of 0.756 compared with 0.715,
<italic>P</italic>
-value = 9·10
<sup>−5</sup>
Wilcoxon signed-rank test) (
<xref ref-type="fig" rid="gku117-F3">Figure 3</xref>
A). Trimming the PWM to the eight most informative positions results in average AUC of 0.732 and 0.719 (
<italic>P</italic>
-value = 0.18 Wilcoxon signed-rank test), respectively, hinting that the advantage of HT-SELEX models may be due to the addition of flanking positions. We note that the test set is too small to draw definitive conclusions, but we believe it points to an advantage of HT-SELEX models in predicting
<italic>in vivo</italic>
binding. For Tcf7, Srf, Mafk, Gata3 and Hnf4a HT-SELEX models, AUC is greater than that of PBM models by > 0.05 (
<xref ref-type="fig" rid="gku117-F1">Figure 1</xref>
C and
<xref ref-type="fig" rid="gku117-F3">3</xref>
B). When excluding secondary PBM models, for Tcf7 and Mafk the average AUC increased from 0.61 to 0.81 and from 0.87 to 0.92, respectively, suggesting that some secondary models are wrong. At the same time, for Hnf4a the AUC dropped from 0.86 to 0.65. Similar results were observed on mouse ChIP-seq experiments downloaded from the ENCODE project (data not shown). Using the upstream sequences as control gave similar results (data not shown). When using a larger set of 1000 peaks, the advantage of HT-SELEX was smaller but still significant (data not shown).
<fig id="gku117-F3" position="float">
<label>Figure 3.</label>
<caption>
<p>Predicting
<italic>in vivo</italic>
binding using HT-SELEX- and PBM-derived PWM models. The PWMs learned from HT-SELEX and PBM were taken from Jolma
<italic>et al.</italic>
and UniPROBE, respectively.
<italic>In vivo</italic>
binding was measured by the ENCODE project using ChIP-seq. (
<bold>A</bold>
) AUC results for each ChIP-seq experiment for which HT-SELEX and PBM experiments on the same TF are available. (
<bold>B</bold>
) Examples where HT-SELEX predicts
<italic>in vivo</italic>
binding better. For all these examples, the average AUC achieved by the HT-SELEX models exceeds that of the PBM models by >0.05.</p>
</caption>
<graphic xlink:href="gku117f3p"></graphic>
</fig>
</p>
<p>We checked the effect of the source organism on predicting
<italic>in vivo</italic>
binding in human. Similarly, we compared the prediction quality based on experiments with full proteins compared to experiments using only the TF binding domains. None of the comparisons showed a significant difference (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
).</p>
</sec>
<sec>
<title>HT-SELEX experiments show systematic biases</title>
<p>Binding models for HT-SELEX use the most frequent k-mer in some cycle as a seed (
<xref rid="gku117-B6" ref-type="bibr">6</xref>
). To study the performance of these models on PBM data, we selected the most frequent 8-mer from each cycle and compared it with the top PBM 8-mer (determined by average binding intensity), when PBM data for the same TF were available (see Methods). We define a positive identification as a case in which the most frequent HT-SELEX 8-mer matches the top PBM 8-mer with up to two mismatches, allowing an offset of up to two positions between the aligned sequences. The results are summarized in
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
A. Notably, in a substantial number of experiments, the most frequent HT-SELEX 8-mer in the last cycles did not match the top PBM 8-mer. Only 184 of 225 (81%) of the top HT-SELEX 8-mers in cycle 4 matched the top PBM 8-mer. Complete results are available in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S5</ext-link>
.
<fig id="gku117-F4" position="float">
<label>Figure 4.</label>
<caption>
<p>Systematic biases in HT-SELEX technology. (
<bold>A</bold>
) Properties of the most frequent 8-mer in different cycles. For each cycle, the fraction of times the most frequent 8-mer in the HT-SELEX experiment was poly(A), poly(C) or matched the most frequent 8-mer computed from PBM data is presented (see text). (
<bold>B</bold>
) The 8-mer frequency density plots for each cycle. The 8-mers were partitioned into three categories: palindromes, poly(C) and all the rest. For each category, a smoothed density plot of its 8-mer frequencies is shown. (
<bold>C</bold>
) Abundant false oligos in Atf7 HT-SELEX experiment. For cycles 3, 4 and 5, the seven most frequent oligos are shown along with their counts. The consensus sequence is highlighted in yellow (none of the top seven oligos in cycle 5 contain the consensus).</p>
</caption>
<graphic xlink:href="gku117f4p"></graphic>
</fig>
</p>
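<p>The matching criterion used for a positive identification can be sketched as follows. This is one possible reading, stated as an assumption: for each offset of up to two positions in either direction, only the overlapping positions of the two 8-mers are compared, and the pair matches if some offset yields at most two mismatches. Reverse complements are not considered here, and the function name is hypothetical.</p>

```python
def matches_with_offset(a, b, max_mismatches=2, max_offset=2):
    """True if a and b agree on their overlap, up to max_mismatches, at some offset."""
    for offset in range(-max_offset, max_offset + 1):
        if offset >= 0:
            # Shift a to the left: a[offset:] aligns with the start of b.
            x, y = a[offset:], b[:len(b) - offset]
        else:
            # Shift b to the left: b[-offset:] aligns with the start of a.
            x, y = a[:len(a) + offset], b[-offset:]
        mismatches = sum(1 for p, q in zip(x, y) if p != q)
        if max_mismatches >= mismatches:
            return True
    return False
```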
<p>Among the most frequent 8-mers in the different cycles, we observed many A-rich and C-rich 8-mers. To quantify this phenomenon, we focused on poly(A) and poly(C) 8-mers, defined as 8-mers containing at least 7 As or 7 Cs, respectively.
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
A shows an overabundance of such 8-mers as the most frequent 8-mers, especially in cycles 0–2. When comparing the distributions of poly(A), poly(C) and of other 8-mers in each cycle over all experiments, poly(A) and poly(C) 8-mers are much more abundant in the initial pool than the other 8-mers (median frequency 1.0·10
<sup>−3</sup>
and 5.66·10
<sup>−4</sup>
in cycle 0 and 9.4·10
<sup>−4</sup>
and 9.43·10
<sup>−4</sup>
in cycle 1, respectively,
<italic>P</italic>
-value < 3·10
<sup>−5</sup>
assuming a uniform null 8-mer distribution).</p>
<p>Moreover, certain 8-mers behaved differently in terms of their frequency changes between cycles. The poly(C) 8-mers were magnified from cycle to cycle much more than other 8-mers (
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
B). We also tested palindromic 8-mers (i.e. 8-mers that are identical to their reverse complement). We observed that palindromic 8-mers are less frequent initially (
<italic>P</italic>
-value = 0.002 in cycle 0 assuming a uniform null 8-mer distribution) and are less magnified than the rest of the 8-mers (
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
B,
<italic>P</italic>
-value = 2.2·10
<sup>−6</sup>
using a K–S test for comparing the rate of change between cycle 3 and cycle 4 of the palindromes with the other non-poly(A) and non-poly(C) 8-mers). Complete results are available in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S6</ext-link>
. Ratio-based statistics showed the same phenomenon (data not shown).</p>
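<p>The three 8-mer categories used in this analysis follow directly from the definitions above and can be sketched as below. The order in which the categories are tested and the function names are choices made for this illustration.</p>

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def classify_kmer(kmer, min_count=7):
    """Assign an 8-mer to poly(A), poly(C), palindrome or other.

    poly(A) and poly(C) require at least min_count As or Cs, respectively;
    a palindrome is identical to its own reverse complement.
    """
    if kmer.count("A") >= min_count:
        return "poly(A)"
    if kmer.count("C") >= min_count:
        return "poly(C)"
    if kmer == reverse_complement(kmer):
        return "palindrome"
    return "other"
```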
<p>Several reasons can explain the uneven abundance and magnification of k-mers. First, it can arise from technological artifacts. PCR biases have been observed and studied (
<xref rid="gku117-B24" ref-type="bibr">24</xref>
), and sequence bias is known to exist in high-throughput sequencing technologies, including those used in the Jolma
<italic>et al.</italic>
study (Illumina Genome Analyzer IIX and Hiseq2000) (
<xref rid="gku117-B25" ref-type="bibr">25</xref>
). We observed that nucleotide frequencies in the data are far from uniform, which can be explained by biased oligo generation (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
). Note that both the oligo generation and sequencing processes are strand-specific, so the frequencies of A and T (and of G and C) need not be equal. The systematic overrepresentation of specific k-mers has been observed both
<italic>in vivo</italic>
[in ChIP-seq data (
<xref rid="gku117-B26" ref-type="bibr">26</xref>
)] and
<italic>in vitro</italic>
[in PBMs (
<xref rid="gku117-B27" ref-type="bibr">27</xref>
), where it was termed ‘sticky k-mers’]. According to Jiang
<italic>et al.</italic>
, in PBM the sticky k-mers are all A-rich except CCCCGCCC, in partial agreement with our observations on HT-SELEX data. An alternative explanation, suggested by a recent theoretical study, is that TFs bind non-specifically to homogeneous sequences (
<xref rid="gku117-B28" ref-type="bibr">28</xref>
). The underrepresentation of palindromes may be due to the formation of secondary structures that hinder PCR of such sequences (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
).</p>
</sec>
<sec>
<title>False oligos are common in HT-SELEX</title>
<p>Because whole reads (oligos) are sequenced and selected by the HT-SELEX technology, we also conducted an analysis of the abundance and magnification properties of oligos. For each TF, we identified the most frequent oligos in the last cycles. For the 100 most frequent oligos, we defined as false oligos those that do not contain any of the seeds reported in (
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) allowing one mismatch. We also measured the oligo enrichment ratio, defined as the oligo’s frequency in the last cycle divided by its frequency in the previous cycle.</p>
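<p>The false-oligo definition and the enrichment ratio can be sketched as follows. This is a minimal sketch under stated assumptions: seeds are matched on the forward strand only, one mismatch means one substitution within an ungapped window, and an oligo absent from the previous cycle is given an infinite ratio. The function names and the dictionary-based inputs are hypothetical.</p>

```python
def contains_seed(oligo, seed, max_mismatches=1):
    """True if some window of the oligo matches the seed with few mismatches."""
    k = len(seed)
    for i in range(len(oligo) - k + 1):
        mismatches = sum(1 for a, b in zip(oligo[i:i + k], seed) if a != b)
        if max_mismatches >= mismatches:
            return True
    return False


def false_oligo_fraction(last_cycle_counts, seeds, top=100):
    """Fraction of the most frequent oligos that contain none of the seeds.

    last_cycle_counts maps oligo sequence -> read count in the last cycle.
    """
    ranked = sorted(last_cycle_counts, key=last_cycle_counts.get, reverse=True)[:top]
    if not ranked:
        return 0.0
    false_oligos = [o for o in ranked if not any(contains_seed(o, s) for s in seeds)]
    return len(false_oligos) / len(ranked)


def enrichment_ratio(oligo, last_freqs, prev_freqs):
    """Frequency in the last cycle divided by frequency in the previous cycle."""
    prev = prev_freqs.get(oligo, 0.0)
    if prev == 0:
        # Not observed in the previous cycle (possibly due to limited sampling).
        return float("inf")
    return last_freqs[oligo] / prev
```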
<p>The false oligos were on average 25% of the 100 most frequent oligos in the last cycle. In 113 experiments (of 547), at least 50 of the 100 most frequent oligos in the last cycle were false. We observed two characteristics common to them. First, they tended to have a more skewed nucleotide distribution than true oligos, with a high frequency of one nucleotide (C in 75% of the cases). In all, 35% of the false oligos had one nucleotide composing at least 50% of the sequence, compared with 14% in the true oligos. Second, they tended to be extremely magnified, rising from a low count (or zero) in one cycle to a high count in the next. For example, 41% of the false oligos were not observed in the one-before-last cycle, compared with 19% of the true oligos (note that an oligo present in a particular cycle may not have been observed because of limited sampling).
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
C shows an example from the Atf7 HT-SELEX experiment. Complete results are available in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Table S8</ext-link>
. Of the previous studies, we observed similar biases in (
<xref rid="gku117-B6" ref-type="bibr">6</xref>
) and (
<xref rid="gku117-B8" ref-type="bibr">8</xref>
), but not in (
<xref rid="gku117-B7" ref-type="bibr">7</xref>
) (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Information</ext-link>
).</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>Protein–DNA binding has been a focus of gene regulation studies for years. In the past, binding sites were defined based on a few examples and thus had low resolution and limited accuracy. With technological developments, the ability to measure and predict binding sites has improved. A large leap came in the form of PBMs, which measure
<italic>in vitro</italic>
the binding intensity of a specific TF to thousands of probes, designed to cover all 10-mers (
<xref rid="gku117-B4" ref-type="bibr">4</xref>
). Binding models derived from these data performed well on other PBM data but less so on
<italic>in vivo</italic>
data (
<xref rid="gku117-B10" ref-type="bibr">10</xref>
). One possible explanation was that they reflect PBM artifacts together with the specific binding. How well PBM models represent
<italic>in vivo</italic>
TF–DNA binding remained an open question.</p>
<p>The emergence of new high-throughput
<italic>in vitro</italic>
technologies allowed us to deepen our understanding of this question. The HT-SELEX technology measures TF–DNA binding using high-throughput sequencing (
<xref rid="gku117-B6" ref-type="bibr">6–8</xref>
). Recently, Jolma
<italic>et al.</italic>
(
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) reported HT-SELEX experiments covering hundreds of TFs, many of which had also been tested on PBM. This gave the first opportunity to compare, on a large scale, models derived from two independent high-throughput
<italic>in vitro</italic>
technologies. Through this comparison, we could identify some of the advantages and disadvantages of each technology and determine how relevant
<italic>in vitro</italic>
models are to
<italic>in vivo</italic>
binding. A small-scale comparison by Jolma
<italic>et al.</italic>
(
<xref rid="gku117-B12" ref-type="bibr">12</xref>
) covering 14 models reported a few differences.</p>
<p>Our comparison shows that for most TFs the PBM and HT-SELEX technologies produce PWM models that are in good agreement. On average over 246 PBM experiments, the AUC when using the HT-SELEX-derived model for predicting PBM probe binding was 0.875. Moreover, in a model-independent comparison, the average correlation between HT-SELEX 8-mer counts in cycle 3 and PBM average binding intensities over the set of top 100 PBM-ranked 8-mers was 0.74. We observed that PBM-based 8-mer ranking is more accurate and robust than HT-SELEX-based ranking, and that ranking 8-mers by their occurrence frequency in the Jolma
<italic>et al.</italic>
HT-SELEX data is better than ranking by between-cycle ratio score. We speculate that this is due to the relatively low read coverage in these experiments [compared with SELEX-seq data, where ratio-based scores were used (
<xref rid="gku117-B7" ref-type="bibr">7</xref>
)]. Although each HT-SELEX experiment reported hundreds of thousands of oligos, the SELEX-seq experiments had millions. We conclude that high coverage is necessary to derive accurate ratio scores. For some families of TFs, the two technologies give discordant results, perhaps because of differences in DNA structure [e.g. the HMG proteins, for which structure plays a larger role in binding (
<xref rid="gku117-B21" ref-type="bibr">21</xref>
)]. In comparison with
<italic>in vivo</italic>
data from ChIP-seq experiments, HT-SELEX models had better binding prediction, partly because of the ability to model the side positions more accurately. However, the set of TFs for which HT-SELEX, PBM and ChIP-seq data were available was rather modest, and larger tests are needed.</p>
<p>In analyzing the similarity between the top 8-mers determined by PBM and by HT-SELEX in each cycle, we observed the previously reported phenomenon of overspecification. Although 8-mer frequencies in the initial HT-SELEX cycles are too non-specific and similar to the initial pool (i.e. closer to random), the last cycles can, in some cases, be too specific. There is a trade-off between better coverage of top k-mers in later cycles, which can improve the binding model accuracy, and overrepresentation of a few top k-mers, which can make the model too narrow, disregarding weaker binding motifs. This was noted in (
<xref rid="gku117-B6" ref-type="bibr">6</xref>
) and in previous studies using the SELEX technology (
<xref rid="gku117-B29" ref-type="bibr">29</xref>
).</p>
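<p>One simple way to follow the similarity between PBM and HT-SELEX top k-mers across cycles is sketched below. It is an illustration under stated assumptions, not the procedure used in this study: the function name, the input layout (a PBM-ranked list and a mapping from cycle number to k-mer counts) and the choice of plain top-n overlap as the similarity measure are all hypothetical.</p>
<preformat>
```python
def top_n_overlap(pbm_ranked, selex_counts_by_cycle, n=100):
    """For each cycle, return the fraction of the top-n PBM k-mers that are
    also among the top-n HT-SELEX k-mers ranked by count in that cycle."""
    pbm_top = set(pbm_ranked[:n])
    overlap = {}
    for cycle, counts in selex_counts_by_cycle.items():
        # Rank this cycle's k-mers by count, breaking ties alphabetically.
        ranked = sorted(counts, key=lambda kmer: (-counts[kmer], kmer))
        shared = pbm_top.intersection(ranked[:n])
        overlap[cycle] = len(shared) / len(pbm_top)
    return overlap
```
</preformat>
<p>Under this measure, an overlap that rises over the early cycles and then drops in the last ones would be one possible sign of overspecification.</p>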
<p>In the course of our analysis, we observed and characterized several strong biases in many HT-SELEX experiments. First, we found a systematic bias toward certain types of k-mers [similar but not identical to the ‘sticky k-mers’ reported for PBM data (
<xref rid="gku117-B27" ref-type="bibr">27</xref>
)]. For many TFs, C-rich 8-mers are among the most frequent in the last cycle (
<xref ref-type="fig" rid="gku117-F4">Figure 4</xref>
). For example, in 7% of the experiments the most frequent 8-mer in the last cycle contained at least 7 Cs. These phenomena can be explained by PCR and sequencing biases (
<xref rid="gku117-B25" ref-type="bibr">25</xref>
) or perhaps by non-specific TF binding (
<xref rid="gku117-B28" ref-type="bibr">28</xref>
). Moreover, when measuring oligo (whole read) frequencies, we found that in some experiments the oligos with the highest frequency and those whose frequencies increased fastest between cycles did not contain the binding site; we call them ‘false oligos’. We observed these phenomena in the previous studies (
<xref rid="gku117-B6" ref-type="bibr">6</xref>
) and (
<xref rid="gku117-B8" ref-type="bibr">8</xref>
), but not in (
<xref rid="gku117-B7" ref-type="bibr">7</xref>
). Slattery
<italic>et al.</italic>
were the only ones to isolate bound oligos through a mobility shift assay, which suggests that this phase removes false oligos and thus improves the quality of the data.</p>
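<p>The two checks described in this paragraph can be sketched as follows. This is a minimal illustration, not the analysis code of this study: the function names, the forward-strand-only k-mer counting and the use of a user-supplied list of known binding-site k-mers are assumptions. The first part tests whether the most frequent 8-mer of a cycle contains at least 7 Cs; the second lists the most frequent whole reads that contain no known site on either strand, i.e. candidate ‘false oligos’.</p>
<preformat>
```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def most_frequent_kmer(reads, k=8):
    """Return the most frequent k-mer on the forward strand (ties alphabetical)."""
    counts = Counter(
        read[start:start + k]
        for read in reads
        for start in range(len(read) - k + 1)
    )
    return min(counts, key=lambda kmer: (-counts[kmer], kmer))


def is_c_rich(kmer, min_c=7):
    """True if the k-mer contains at least min_c cytosines."""
    return kmer.count("C") >= min_c


def candidate_false_oligos(reads, site_kmers, top_n=10):
    """Among the top_n most frequent whole reads, return those that contain
    none of the given binding-site k-mers on either strand."""
    sites = set(site_kmers).union(reverse_complement(s) for s in site_kmers)
    flagged = []
    for oligo, _count in Counter(reads).most_common(top_n):
        if not any(site in oligo for site in sites):
            flagged.append(oligo)
    return flagged
```
</preformat>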
<p>Our analysis suggests that each of the HT-SELEX and PBM technologies has its advantages. PBM data are more accurate and robust in 8-mer ranking; HT-SELEX seems to be superior in
<italic>in vivo</italic>
binding prediction and allows better learning of longer motifs. We recommend using higher read coverage in HT-SELEX experiments, as was done in (
<xref rid="gku117-B7" ref-type="bibr">7</xref>
), to produce more sensitive models. We note that our comparisons and conclusions are limited to the specific technological implementations of HT-SELEX and PBM tested, for which a large-scale overlap exists. Unfortunately, we could not compare SELEX-seq and context-genomic PBMs because too few data sets were available.</p>
<p>Our study aimed to provide a deeper and broader analysis of the properties of HT-SELEX experiments and to put them in the context of other high-throughput technologies for evaluating TF–DNA binding
<italic>in vivo</italic>
and
<italic>in vitro</italic>
. In the future, we plan to extend this work in several directions. First, we intend to use the new insights to design better motif finding algorithms based on HT-SELEX data. Second, we can learn a binding model based on the biomechanical mechanism of TF–DNA binding using regression methods that use k-mer counts [as in (
<xref rid="gku117-B8" ref-type="bibr">8</xref>
)]. Third, we plan to learn more complex binding models. More specifically, we plan to incorporate in the models 2-mer features as well as DNA shape features, as was done recently using custom PBM (
<xref rid="gku117-B30" ref-type="bibr">30</xref>
), and demonstrated using existing motif databases (
<xref rid="gku117-B31" ref-type="bibr">31</xref>
). The rich and broadly available HT-SELEX data provide a great opportunity to improve our understanding of TF–DNA binding.</p>
</sec>
<sec>
<title>SUPPLEMENTARY DATA</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gku117/-/DC1">Supplementary Data</ext-link>
are available at NAR Online.</p>
</sec>
<sec>
<title>FUNDING</title>
<p>
<funding-source>Israel Science Foundation (ISF)</funding-source>
[
<award-id>802/08, 317/13</award-id>
];
<funding-source>Edmond J. Safra Center for Bioinformatics at Tel Aviv University, the Dan David Foundation, and the Israeli Center for Research Excellence (I-CORE)</funding-source>
,
<funding-source>Gene Regulation in Complex Human Disease, center 41/11</funding-source>
(to Y.O.). Funding for Open Access charge: ISF grant [317/13] and I-CORE.</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_42_8_e63__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gku117_nar-03321-met-k-2013-File006.docx"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_gku117_nar-03321-met-k-2013-File007.xls"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_gku117_nar-03321-met-k-2013-File008.xls"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>The authors thank Jussi Taipale for fruitful discussions and for providing them with the HT-SELEX oligo batch numbers and Kobi Perl for his comments on the manuscript.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="gku117-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aparicio</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Geisberg</surname>
<given-names>JV</given-names>
</name>
<name>
<surname>Struhl</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences
<italic>in vivo</italic>
</article-title>
<source>Curr. Protoc. Cell Biol.</source>
<year>2004</year>
<comment>
<bold>Chapter 17</bold>
, Unit 17.7</comment>
</element-citation>
</ref>
<ref id="gku117-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Genome-wide mapping of
<italic>in vivo</italic>
protein-DNA interactions</article-title>
<source>Science</source>
<year>2007</year>
<volume>316</volume>
<fpage>1497</fpage>
<lpage>1502</lpage>
<pub-id pub-id-type="pmid">17540862</pub-id>
</element-citation>
</ref>
<ref id="gku117-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rhee</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Pugh</surname>
<given-names>BF</given-names>
</name>
</person-group>
<article-title>ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy</article-title>
<source>Curr. Protoc. Mol. Biol.</source>
<year>2012</year>
<comment>
<bold>Chapter 21</bold>
, Unit 21.24</comment>
</element-citation>
</ref>
<ref id="gku117-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Qureshi</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>He</surname>
<given-names>FS</given-names>
</name>
<name>
<surname>Estep</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities</article-title>
<source>Nat. Biotechnol.</source>
<year>2006</year>
<volume>24</volume>
<fpage>1429</fpage>
<lpage>1435</lpage>
<pub-id pub-id-type="pmid">16998473</pub-id>
</element-citation>
</ref>
<ref id="gku117-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fordyce</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Gerber</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>DeRisi</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Quake</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis</article-title>
<source>Nat. Biotechnol.</source>
<year>2010</year>
<volume>28</volume>
<fpage>970</fpage>
<lpage>975</lpage>
<pub-id pub-id-type="pmid">20802496</pub-id>
</element-citation>
</ref>
<ref id="gku117-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jolma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kivioja</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Toivonen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Enge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Taipale</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vaquerizas</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sillanpaa</surname>
<given-names>MJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities</article-title>
<source>Genome Res.</source>
<year>2010</year>
<volume>20</volume>
<fpage>861</fpage>
<lpage>873</lpage>
<pub-id pub-id-type="pmid">20378718</pub-id>
</element-citation>
</ref>
<ref id="gku117-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Slattery</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Abe</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gomez-Alcala</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dror</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Honig</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Bussemaker</surname>
<given-names>HJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins</article-title>
<source>Cell</source>
<year>2011</year>
<volume>147</volume>
<fpage>1270</fpage>
<lpage>1282</lpage>
<pub-id pub-id-type="pmid">22153072</pub-id>
</element-citation>
</ref>
<ref id="gku117-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Granas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Inferring binding energies from selected binding sites</article-title>
<source>PLoS Comput. Biol.</source>
<year>2009</year>
<volume>5</volume>
<fpage>e1000590</fpage>
<pub-id pub-id-type="pmid">19997485</pub-id>
</element-citation>
</ref>
<ref id="gku117-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robasky</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D124</fpage>
<lpage>D128</lpage>
<pub-id pub-id-type="pmid">21037262</pub-id>
</element-citation>
</ref>
<ref id="gku117-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Orenstein</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Linhart</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shamir</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Assessment of algorithms for inferring positional weight matrix motifs of transcription factor binding sites using protein binding microarray data</article-title>
<source>PLoS One</source>
<year>2012</year>
<volume>7</volume>
<fpage>e46145</fpage>
<pub-id pub-id-type="pmid">23029415</pub-id>
</element-citation>
</ref>
<ref id="gku117-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weirauch</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Cote</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Norel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Annala</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Saez-Rodriguez</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cokelaer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Vedenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evaluation of methods for modeling transcription factor sequence specificity</article-title>
<source>Nat. Biotechnol.</source>
<year>2013</year>
<volume>31</volume>
<fpage>126</fpage>
<lpage>134</lpage>
<pub-id pub-id-type="pmid">23354101</pub-id>
</element-citation>
</ref>
<ref id="gku117-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jolma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Whitington</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Toivonen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nitta</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Rastas</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Morgunova</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Enge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Taipale</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>DNA-binding specificities of human transcription factors</article-title>
<source>Cell</source>
<year>2013</year>
<volume>152</volume>
<fpage>327</fpage>
<lpage>339</lpage>
<pub-id pub-id-type="pmid">23332764</pub-id>
</element-citation>
</ref>
<ref id="gku117-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Landt</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Marinov</surname>
<given-names>GK</given-names>
</name>
<name>
<surname>Kundaje</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kheradpour</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pauli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Bickel</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Cayting</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia</article-title>
<source>Genome Res.</source>
<year>2012</year>
<volume>22</volume>
<fpage>1813</fpage>
<lpage>1831</lpage>
<pub-id pub-id-type="pmid">22955991</pub-id>
</element-citation>
</ref>
<ref id="gku117-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>DNA binding sites: representation and discovery</article-title>
<source>Bioinformatics</source>
<year>2000</year>
<volume>16</volume>
<fpage>16</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">10812473</pub-id>
</element-citation>
</ref>
<ref id="gku117-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>Q</given-names>
</name>
</person-group>
<article-title>RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>i72</fpage>
<lpage>i79</lpage>
<pub-id pub-id-type="pmid">17646348</pub-id>
</element-citation>
</ref>
<ref id="gku117-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Jaeger</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Metzler</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Vedenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Diversity and complexity in DNA recognition by transcription factors</article-title>
<source>Science</source>
<year>2009</year>
<volume>324</volume>
<fpage>1720</fpage>
<lpage>1723</lpage>
<pub-id pub-id-type="pmid">19443739</pub-id>
</element-citation>
</ref>
<ref id="gku117-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Quantitative analysis demonstrates most transcription factors require only simple models of specificity</article-title>
<source>Nat. Biotechnol.</source>
<year>2011</year>
<volume>29</volume>
<fpage>480</fpage>
<lpage>483</lpage>
<pub-id pub-id-type="pmid">21654662</pub-id>
</element-citation>
</ref>
<ref id="gku117-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Orenstein</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Mick</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Shamir</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>RAP: accurate and fast motif finding based on protein-binding microarray data</article-title>
<source>J. Comput. Biol.</source>
<year>2013</year>
<volume>20</volume>
<fpage>375</fpage>
<lpage>382</lpage>
<pub-id pub-id-type="pmid">23464877</pub-id>
</element-citation>
</ref>
<ref id="gku117-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Pena-Castillo</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Alleyne</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Mnaimneh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Botvinnik</surname>
<given-names>OB</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>ET</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences</article-title>
<source>Cell</source>
<year>2008</year>
<volume>133</volume>
<fpage>1266</fpage>
<lpage>1276</lpage>
<pub-id pub-id-type="pmid">18585359</pub-id>
</element-citation>
</ref>
<ref id="gku117-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Kivioja</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Palin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Enge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bonke</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jolma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Varjosalo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>AR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide analysis of ETS-family DNA-binding
<italic>in vitro</italic>
and
<italic>in vivo</italic>
</article-title>
<source>EMBO J.</source>
<year>2010</year>
<volume>29</volume>
<fpage>2147</fpage>
<lpage>2160</lpage>
<pub-id pub-id-type="pmid">20517297</pub-id>
</element-citation>
</ref>
<ref id="gku117-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>X</given-names>
</name>
<name>
<surname>West</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Honig</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Mann</surname>
<given-names>RS</given-names>
</name>
</person-group>
<article-title>Origins of specificity in protein-DNA recognition</article-title>
<source>Annu. Rev. Biochem.</source>
<year>2010</year>
<volume>79</volume>
<fpage>233</fpage>
<lpage>269</lpage>
<pub-id pub-id-type="pmid">20334529</pub-id>
</element-citation>
</ref>
<ref id="gku117-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Machanick</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Inferring direct DNA binding from ChIP-seq</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>e128</fpage>
<pub-id pub-id-type="pmid">22610855</pub-id>
</element-citation>
</ref>
<ref id="gku117-B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>ChIP-seq: advantages and challenges of a maturing technology</article-title>
<source>Nat. Rev. Genet.</source>
<year>2009</year>
<volume>10</volume>
<fpage>669</fpage>
<lpage>680</lpage>
<pub-id pub-id-type="pmid">19736561</pub-id>
</element-citation>
</ref>
<ref id="gku117-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aird</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>WS</given-names>
</name>
<name>
<surname>Danielsson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Russ</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gnirke</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries</article-title>
<source>Genome Biol.</source>
<year>2011</year>
<volume>12</volume>
<fpage>R18</fpage>
<pub-id pub-id-type="pmid">21338519</pub-id>
</element-citation>
</ref>
<ref id="gku117-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hansen</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Dudoit</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Biases in Illumina transcriptome sequencing caused by random hexamer priming</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>e131</fpage>
<pub-id pub-id-type="pmid">20395217</pub-id>
</element-citation>
</ref>
<ref id="gku117-B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Genomic location analysis by ChIP-Seq</article-title>
<source>J. Cell Biochem.</source>
<year>2009</year>
<volume>107</volume>
<fpage>11</fpage>
<lpage>18</lpage>
<pub-id pub-id-type="pmid">19173299</pub-id>
</element-citation>
</ref>
<ref id="gku117-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Bayesian hierarchical model of protein-binding microarray k-mer data reduces noise and identifies transcription factor subclasses and preferred k-mers</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>1390</fpage>
<lpage>1398</lpage>
<pub-id pub-id-type="pmid">23559638</pub-id>
</element-citation>
</ref>
<ref id="gku117-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afek</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lukatsky</surname>
<given-names>DB</given-names>
</name>
</person-group>
<article-title>Genome-wide organization of eukaryotic preinitiation complex is influenced by nonconsensus protein-DNA binding</article-title>
<source>Biophys. J.</source>
<year>2013</year>
<volume>104</volume>
<fpage>1107</fpage>
<lpage>1115</lpage>
<pub-id pub-id-type="pmid">23473494</pub-id>
</element-citation>
</ref>
<ref id="gku117-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klug</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Famulok</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>All you wanted to know about SELEX</article-title>
<source>Mol. Biol. Rep.</source>
<year>1994</year>
<volume>20</volume>
<fpage>97</fpage>
<lpage>107</lpage>
<pub-id pub-id-type="pmid">7536299</pub-id>
</element-citation>
</ref>
<ref id="gku117-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gordan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Dror</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Horton</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape</article-title>
<source>Cell Rep.</source>
<year>2013</year>
<volume>3</volume>
<fpage>1093</fpage>
<lpage>1104</lpage>
<pub-id pub-id-type="pmid">23562153</pub-id>
</element-citation>
</ref>
<ref id="gku117-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dror</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Mathelier</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Gordan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>TFBSshape: a motif database for DNA shape features of transcription factor binding sites</article-title>
<source>Nucleic Acids Res.</source>
<year>2013</year>
<volume>42</volume>
<fpage>D148</fpage>
<lpage>D155</lpage>
<pub-id pub-id-type="pmid">24214955</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F54 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F54 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4005680
   |texte=   A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24500199" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021