Exploration server on open access in Belgium

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000405 (Pmc/Corpus); previous: 0004049; next: 0004060

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A flexible integrative approach based on random forest improves prediction of transcription factor binding sites</title>
<author>
<name sortKey="Hooghe, Bart" sort="Hooghe, Bart" uniqKey="Hooghe B" first="Bart" last="Hooghe">Bart Hooghe</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Broos, Stefan" sort="Broos, Stefan" uniqKey="Broos S" first="Stefan" last="Broos">Stefan Broos</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Roy, Frans" sort="Van Roy, Frans" uniqKey="Van Roy F" first="Frans" last="Van Roy">Frans Van Roy</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="De Bleser, Pieter" sort="De Bleser, Pieter" uniqKey="De Bleser P" first="Pieter" last="De Bleser">Pieter De Bleser</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22492513</idno>
<idno type="pmc">3413102</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413102</idno>
<idno type="RBID">PMC:3413102</idno>
<idno type="doi">10.1093/nar/gks283</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000405</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A flexible integrative approach based on random forest improves prediction of transcription factor binding sites</title>
<author>
<name sortKey="Hooghe, Bart" sort="Hooghe, Bart" uniqKey="Hooghe B" first="Bart" last="Hooghe">Bart Hooghe</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Broos, Stefan" sort="Broos, Stefan" uniqKey="Broos S" first="Stefan" last="Broos">Stefan Broos</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Roy, Frans" sort="Van Roy, Frans" uniqKey="Van Roy F" first="Frans" last="Van Roy">Frans Van Roy</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="De Bleser, Pieter" sort="De Bleser, Pieter" uniqKey="De Bleser P" first="Pieter" last="De Bleser">Pieter De Bleser</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gks283-AFF1">Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gks283-AFF1">Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Transcription factor binding sites (TFBSs) are DNA sequences of 6–15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Paillard, G" uniqKey="Paillard G">G Paillard</name>
</author>
<author>
<name sortKey="Lavery, R" uniqKey="Lavery R">R Lavery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaplan, T" uniqKey="Kaplan T">T Kaplan</name>
</author>
<author>
<name sortKey="Friedman, N" uniqKey="Friedman N">N Friedman</name>
</author>
<author>
<name sortKey="Margalit, H" uniqKey="Margalit H">H Margalit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thayer, Km" uniqKey="Thayer K">KM Thayer</name>
</author>
<author>
<name sortKey="Beveridge, Dl" uniqKey="Beveridge D">DL Beveridge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Calladine, Cr" uniqKey="Calladine C">CR Calladine</name>
</author>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shakked, Z" uniqKey="Shakked Z">Z Shakked</name>
</author>
<author>
<name sortKey="Rabinovich, D" uniqKey="Rabinovich D">D Rabinovich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="Jin, X" uniqKey="Jin X">X Jin</name>
</author>
<author>
<name sortKey="West, Sm" uniqKey="West S">SM West</name>
</author>
<author>
<name sortKey="Joshi, R" uniqKey="Joshi R">R Joshi</name>
</author>
<author>
<name sortKey="Honig, B" uniqKey="Honig B">B Honig</name>
</author>
<author>
<name sortKey="Mann, Rs" uniqKey="Mann R">RS Mann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angarica, Ve" uniqKey="Angarica V">VE Angarica</name>
</author>
<author>
<name sortKey="Perez, Ag" uniqKey="Perez A">AG Perez</name>
</author>
<author>
<name sortKey="Vasconcelos, At" uniqKey="Vasconcelos A">AT Vasconcelos</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
<author>
<name sortKey="Contreras Moreira, B" uniqKey="Contreras Moreira B">B Contreras-Moreira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Man, Tk" uniqKey="Man T">TK Man</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
<author>
<name sortKey="Johnson, Pl" uniqKey="Johnson P">PL Johnson</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, J" uniqKey="Liu J">J Liu</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, J" uniqKey="Liu J">J Liu</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benos, Pv" uniqKey="Benos P">PV Benos</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Flanagan, Ra" uniqKey="O Flanagan R">RA O'Flanagan</name>
</author>
<author>
<name sortKey="Paillard, G" uniqKey="Paillard G">G Paillard</name>
</author>
<author>
<name sortKey="Lavery, R" uniqKey="Lavery R">R Lavery</name>
</author>
<author>
<name sortKey="Sengupta, Am" uniqKey="Sengupta A">AM Sengupta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tomovic, A" uniqKey="Tomovic A">A Tomovic</name>
</author>
<author>
<name sortKey="Oakeley, Ej" uniqKey="Oakeley E">EJ Oakeley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, M" uniqKey="Hu M">M Hu</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Taylor, Jm" uniqKey="Taylor J">JM Taylor</name>
</author>
<author>
<name sortKey="Chinnaiyan, Am" uniqKey="Chinnaiyan A">AM Chinnaiyan</name>
</author>
<author>
<name sortKey="Qin, Zs" uniqKey="Qin Z">ZS Qin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gershenzon, Ni" uniqKey="Gershenzon N">NI Gershenzon</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
<author>
<name sortKey="Ioshikhes, Ip" uniqKey="Ioshikhes I">IP Ioshikhes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marinescu, Vd" uniqKey="Marinescu V">VD Marinescu</name>
</author>
<author>
<name sortKey="Kohane, Is" uniqKey="Kohane I">IS Kohane</name>
</author>
<author>
<name sortKey="Riva, A" uniqKey="Riva A">A Riva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Naughton, Bt" uniqKey="Naughton B">BT Naughton</name>
</author>
<author>
<name sortKey="Fratkin, E" uniqKey="Fratkin E">E Fratkin</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author>
<name sortKey="Brutlag, Dl" uniqKey="Brutlag D">DL Brutlag</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, E" uniqKey="Sharon E">E Sharon</name>
</author>
<author>
<name sortKey="Lubliner, S" uniqKey="Lubliner S">S Lubliner</name>
</author>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karas, H" uniqKey="Karas H">H Karas</name>
</author>
<author>
<name sortKey="Knuppel, R" uniqKey="Knuppel R">R Knuppel</name>
</author>
<author>
<name sortKey="Schulz, W" uniqKey="Schulz W">W Schulz</name>
</author>
<author>
<name sortKey="Sklenar, H" uniqKey="Sklenar H">H Sklenar</name>
</author>
<author>
<name sortKey="Wingender, E" uniqKey="Wingender E">E Wingender</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ponomarenko, Jv" uniqKey="Ponomarenko J">JV Ponomarenko</name>
</author>
<author>
<name sortKey="Ponomarenko, Mp" uniqKey="Ponomarenko M">MP Ponomarenko</name>
</author>
<author>
<name sortKey="Frolov, As" uniqKey="Frolov A">AS Frolov</name>
</author>
<author>
<name sortKey="Vorobyev, Dg" uniqKey="Vorobyev D">DG Vorobyev</name>
</author>
<author>
<name sortKey="Overton, Gc" uniqKey="Overton G">GC Overton</name>
</author>
<author>
<name sortKey="Kolchanov, Na" uniqKey="Kolchanov N">NA Kolchanov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, R" uniqKey="Liu R">R Liu</name>
</author>
<author>
<name sortKey="Blackwell, Tw" uniqKey="Blackwell T">TW Blackwell</name>
</author>
<author>
<name sortKey="States, Dj" uniqKey="States D">DJ States</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burden, He" uniqKey="Burden H">HE Burden</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pudimat, R" uniqKey="Pudimat R">R Pudimat</name>
</author>
<author>
<name sortKey="Schukat Talamazzini, Eg" uniqKey="Schukat Talamazzini E">EG Schukat-Talamazzini</name>
</author>
<author>
<name sortKey="Backofen, R" uniqKey="Backofen R">R Backofen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gunewardena, S" uniqKey="Gunewardena S">S Gunewardena</name>
</author>
<author>
<name sortKey="Jeavons, P" uniqKey="Jeavons P">P Jeavons</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bauer, Al" uniqKey="Bauer A">AL Bauer</name>
</author>
<author>
<name sortKey="Hlavacek, Ws" uniqKey="Hlavacek W">WS Hlavacek</name>
</author>
<author>
<name sortKey="Unkefer, Pj" uniqKey="Unkefer P">PJ Unkefer</name>
</author>
<author>
<name sortKey="Mu, F" uniqKey="Mu F">F Mu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meysman, P" uniqKey="Meysman P">P Meysman</name>
</author>
<author>
<name sortKey="Dang, Th" uniqKey="Dang T">TH Dang</name>
</author>
<author>
<name sortKey="Laukens, K" uniqKey="Laukens K">K Laukens</name>
</author>
<author>
<name sortKey="De Smet, R" uniqKey="De Smet R">R De Smet</name>
</author>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Marchal, K" uniqKey="Marchal K">K Marchal</name>
</author>
<author>
<name sortKey="Engelen, K" uniqKey="Engelen K">K Engelen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morozov, Av" uniqKey="Morozov A">AV Morozov</name>
</author>
<author>
<name sortKey="Siggia, Ed" uniqKey="Siggia E">ED Siggia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fulton, Dl" uniqKey="Fulton D">DL Fulton</name>
</author>
<author>
<name sortKey="Sundararajan, S" uniqKey="Sundararajan S">S Sundararajan</name>
</author>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
<author>
<name sortKey="Roach, Jc" uniqKey="Roach J">JC Roach</name>
</author>
<author>
<name sortKey="Sladek, R" uniqKey="Sladek R">R Sladek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cho, Bk" uniqKey="Cho B">BK Cho</name>
</author>
<author>
<name sortKey="Knight, Em" uniqKey="Knight E">EM Knight</name>
</author>
<author>
<name sortKey="Barrett, Cl" uniqKey="Barrett C">CL Barrett</name>
</author>
<author>
<name sortKey="Palsson, Bo" uniqKey="Palsson B">BO Palsson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Portales Casamar, E" uniqKey="Portales Casamar E">E Portales-Casamar</name>
</author>
<author>
<name sortKey="Kirov, S" uniqKey="Kirov S">S Kirov</name>
</author>
<author>
<name sortKey="Lim, J" uniqKey="Lim J">J Lim</name>
</author>
<author>
<name sortKey="Lithwick, S" uniqKey="Lithwick S">S Lithwick</name>
</author>
<author>
<name sortKey="Swanson, Mi" uniqKey="Swanson M">MI Swanson</name>
</author>
<author>
<name sortKey="Ticoll, A" uniqKey="Ticoll A">A Ticoll</name>
</author>
<author>
<name sortKey="Snoddy, J" uniqKey="Snoddy J">J Snoddy</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matys, V" uniqKey="Matys V">V Matys</name>
</author>
<author>
<name sortKey="Fricke, E" uniqKey="Fricke E">E Fricke</name>
</author>
<author>
<name sortKey="Geffers, R" uniqKey="Geffers R">R Geffers</name>
</author>
<author>
<name sortKey="Gossling, E" uniqKey="Gossling E">E Gossling</name>
</author>
<author>
<name sortKey="Haubrock, M" uniqKey="Haubrock M">M Haubrock</name>
</author>
<author>
<name sortKey="Hehl, R" uniqKey="Hehl R">R Hehl</name>
</author>
<author>
<name sortKey="Hornischer, K" uniqKey="Hornischer K">K Hornischer</name>
</author>
<author>
<name sortKey="Karas, D" uniqKey="Karas D">D Karas</name>
</author>
<author>
<name sortKey="Kel, Ae" uniqKey="Kel A">AE Kel</name>
</author>
<author>
<name sortKey="Kel Margoulis, Ov" uniqKey="Kel Margoulis O">OV Kel-Margoulis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gowrisankar, S" uniqKey="Gowrisankar S">S Gowrisankar</name>
</author>
<author>
<name sortKey="Jegga, Ag" uniqKey="Jegga A">AG Jegga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kel, Ae" uniqKey="Kel A">AE Kel</name>
</author>
<author>
<name sortKey="Gossling, E" uniqKey="Gossling E">E Gossling</name>
</author>
<author>
<name sortKey="Reuter, I" uniqKey="Reuter I">I Reuter</name>
</author>
<author>
<name sortKey="Cheremushkin, E" uniqKey="Cheremushkin E">E Cheremushkin</name>
</author>
<author>
<name sortKey="Kel Margoulis, Ov" uniqKey="Kel Margoulis O">OV Kel-Margoulis</name>
</author>
<author>
<name sortKey="Wingender, E" uniqKey="Wingender E">E Wingender</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
<author>
<name sortKey="Gorin, Aa" uniqKey="Gorin A">AA Gorin</name>
</author>
<author>
<name sortKey="Lu, Xj" uniqKey="Lu X">XJ Lu</name>
</author>
<author>
<name sortKey="Hock, Lm" uniqKey="Hock L">LM Hock</name>
</author>
<author>
<name sortKey="Zhurkin, Vb" uniqKey="Zhurkin V">VB Zhurkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Satchwell, Sc" uniqKey="Satchwell S">SC Satchwell</name>
</author>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
<author>
<name sortKey="Travers, Aa" uniqKey="Travers A">AA Travers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goodsell, Ds" uniqKey="Goodsell D">DS Goodsell</name>
</author>
<author>
<name sortKey="Dickerson, Re" uniqKey="Dickerson R">RE Dickerson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Xj" uniqKey="Lu X">XJ Lu</name>
</author>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fujii, S" uniqKey="Fujii S">S Fujii</name>
</author>
<author>
<name sortKey="Kono, H" uniqKey="Kono H">H Kono</name>
</author>
<author>
<name sortKey="Takenaka, S" uniqKey="Takenaka S">S Takenaka</name>
</author>
<author>
<name sortKey="Go, N" uniqKey="Go N">N Go</name>
</author>
<author>
<name sortKey="Sarai, A" uniqKey="Sarai A">A Sarai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lavery, R" uniqKey="Lavery R">R Lavery</name>
</author>
<author>
<name sortKey="Zakrzewska, K" uniqKey="Zakrzewska K">K Zakrzewska</name>
</author>
<author>
<name sortKey="Beveridge, D" uniqKey="Beveridge D">D Beveridge</name>
</author>
<author>
<name sortKey="Bishop, Tc" uniqKey="Bishop T">TC Bishop</name>
</author>
<author>
<name sortKey="Case, Da" uniqKey="Case D">DA Case</name>
</author>
<author>
<name sortKey="Cheatham, T" uniqKey="Cheatham T">T Cheatham</name>
</author>
<author>
<name sortKey="Dixit, S" uniqKey="Dixit S">S Dixit</name>
</author>
<author>
<name sortKey="Jayaram, B" uniqKey="Jayaram B">B Jayaram</name>
</author>
<author>
<name sortKey="Lankas, F" uniqKey="Lankas F">F Lankas</name>
</author>
<author>
<name sortKey="Laughton, C" uniqKey="Laughton C">C Laughton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gartenberg, Mr" uniqKey="Gartenberg M">MR Gartenberg</name>
</author>
<author>
<name sortKey="Crothers, Dm" uniqKey="Crothers D">DM Crothers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parvin, Jd" uniqKey="Parvin J">JD Parvin</name>
</author>
<author>
<name sortKey="Mccormick, Rj" uniqKey="Mccormick R">RJ McCormick</name>
</author>
<author>
<name sortKey="Sharp, Pa" uniqKey="Sharp P">PA Sharp</name>
</author>
<author>
<name sortKey="Fisher, De" uniqKey="Fisher D">DE Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dickerson, Re" uniqKey="Dickerson R">RE Dickerson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gorin, Aa" uniqKey="Gorin A">AA Gorin</name>
</author>
<author>
<name sortKey="Zhurkin, Vb" uniqKey="Zhurkin V">VB Zhurkin</name>
</author>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="West, Sm" uniqKey="West S">SM West</name>
</author>
<author>
<name sortKey="Sosinsky, A" uniqKey="Sosinsky A">A Sosinsky</name>
</author>
<author>
<name sortKey="Liu, P" uniqKey="Liu P">P Liu</name>
</author>
<author>
<name sortKey="Mann, Rs" uniqKey="Mann R">RS Mann</name>
</author>
<author>
<name sortKey="Honig, B" uniqKey="Honig B">B Honig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Svozil, D" uniqKey="Svozil D">D Svozil</name>
</author>
<author>
<name sortKey="Kalina, J" uniqKey="Kalina J">J Kalina</name>
</author>
<author>
<name sortKey="Omelka, M" uniqKey="Omelka M">M Omelka</name>
</author>
<author>
<name sortKey="Schneider, B" uniqKey="Schneider B">B Schneider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spolar, Rs" uniqKey="Spolar R">RS Spolar</name>
</author>
<author>
<name sortKey="Record, Mt" uniqKey="Record M">MT Record</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Xj" uniqKey="Lu X">XJ Lu</name>
</author>
<author>
<name sortKey="Shakked, Z" uniqKey="Shakked Z">Z Shakked</name>
</author>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breiman, L" uniqKey="Breiman L">L Breiman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lunetta, Kl" uniqKey="Lunetta K">KL Lunetta</name>
</author>
<author>
<name sortKey="Hayward, Lb" uniqKey="Hayward L">LB Hayward</name>
</author>
<author>
<name sortKey="Segal, J" uniqKey="Segal J">J Segal</name>
</author>
<author>
<name sortKey="Van Eerdewegh, P" uniqKey="Van Eerdewegh P">P Van Eerdewegh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cordell, Hj" uniqKey="Cordell H">HJ Cordell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruiz, R" uniqKey="Ruiz R">R Ruiz</name>
</author>
<author>
<name sortKey="Jos, Rc" uniqKey="Jos R">RC Jos</name>
</author>
<author>
<name sortKey="Aguilar Ruiz, Js" uniqKey="Aguilar Ruiz J">JS Aguilar-Ruiz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hall, M" uniqKey="Hall M">M Hall</name>
</author>
<author>
<name sortKey="Frank, E" uniqKey="Frank E">E Frank</name>
</author>
<author>
<name sortKey="Holmes, G" uniqKey="Holmes G">G Holmes</name>
</author>
<author>
<name sortKey="Pfahringer, B" uniqKey="Pfahringer B">B Pfahringer</name>
</author>
<author>
<name sortKey="Reutemann, P" uniqKey="Reutemann P">P Reutemann</name>
</author>
<author>
<name sortKey="Witten, Ih" uniqKey="Witten I">IH Witten</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medina Rivera, A" uniqKey="Medina Rivera A">A Medina-Rivera</name>
</author>
<author>
<name sortKey="Abreu Goodger, C" uniqKey="Abreu Goodger C">C Abreu-Goodger</name>
</author>
<author>
<name sortKey="Thomas Chollier, M" uniqKey="Thomas Chollier M">M Thomas-Chollier</name>
</author>
<author>
<name sortKey="Salgado, H" uniqKey="Salgado H">H Salgado</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J van Helden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, J" uniqKey="Ernst J">J Ernst</name>
</author>
<author>
<name sortKey="Plasterer, Hl" uniqKey="Plasterer H">HL Plasterer</name>
</author>
<author>
<name sortKey="Simon, I" uniqKey="Simon I">I Simon</name>
</author>
<author>
<name sortKey="Bar Joseph, Z" uniqKey="Bar Joseph Z">Z Bar-Joseph</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narang, V" uniqKey="Narang V">V Narang</name>
</author>
<author>
<name sortKey="Mittal, A" uniqKey="Mittal A">A Mittal</name>
</author>
<author>
<name sortKey="Sung, Wk" uniqKey="Sung W">WK Sung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramsey, Sa" uniqKey="Ramsey S">SA Ramsey</name>
</author>
<author>
<name sortKey="Knijnenburg, Ta" uniqKey="Knijnenburg T">TA Knijnenburg</name>
</author>
<author>
<name sortKey="Kennedy, Ka" uniqKey="Kennedy K">KA Kennedy</name>
</author>
<author>
<name sortKey="Zak, De" uniqKey="Zak D">DE Zak</name>
</author>
<author>
<name sortKey="Gilchrist, M" uniqKey="Gilchrist M">M Gilchrist</name>
</author>
<author>
<name sortKey="Gold, Es" uniqKey="Gold E">ES Gold</name>
</author>
<author>
<name sortKey="Johnson, Cd" uniqKey="Johnson C">CD Johnson</name>
</author>
<author>
<name sortKey="Lampano, Ae" uniqKey="Lampano A">AE Lampano</name>
</author>
<author>
<name sortKey="Litvak, V" uniqKey="Litvak V">V Litvak</name>
</author>
<author>
<name sortKey="Navarro, G" uniqKey="Navarro G">G Navarro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gama Castro, S" uniqKey="Gama Castro S">S Gama-Castro</name>
</author>
<author>
<name sortKey="Jimenez Jacinto, V" uniqKey="Jimenez Jacinto V">V Jimenez-Jacinto</name>
</author>
<author>
<name sortKey="Peralta Gil, M" uniqKey="Peralta Gil M">M Peralta-Gil</name>
</author>
<author>
<name sortKey="Santos Zavaleta, A" uniqKey="Santos Zavaleta A">A Santos-Zavaleta</name>
</author>
<author>
<name sortKey="Penaloza Spinola, Mi" uniqKey="Penaloza Spinola M">MI Penaloza-Spinola</name>
</author>
<author>
<name sortKey="Contreras Moreira, B" uniqKey="Contreras Moreira B">B Contreras-Moreira</name>
</author>
<author>
<name sortKey="Segura Salazar, J" uniqKey="Segura Salazar J">J Segura-Salazar</name>
</author>
<author>
<name sortKey="Muniz Rascado, L" uniqKey="Muniz Rascado L">L Muniz-Rascado</name>
</author>
<author>
<name sortKey="Martinez Flores, I" uniqKey="Martinez Flores I">I Martinez-Flores</name>
</author>
<author>
<name sortKey="Salgado, H" uniqKey="Salgado H">H Salgado</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mokry, M" uniqKey="Mokry M">M Mokry</name>
</author>
<author>
<name sortKey="Hatzis, P" uniqKey="Hatzis P">P Hatzis</name>
</author>
<author>
<name sortKey="De Bruijn, E" uniqKey="De Bruijn E">E de Bruijn</name>
</author>
<author>
<name sortKey="Koster, J" uniqKey="Koster J">J Koster</name>
</author>
<author>
<name sortKey="Versteeg, R" uniqKey="Versteeg R">R Versteeg</name>
</author>
<author>
<name sortKey="Schuijers, J" uniqKey="Schuijers J">J Schuijers</name>
</author>
<author>
<name sortKey="Van De Wetering, M" uniqKey="Van De Wetering M">M van de Wetering</name>
</author>
<author>
<name sortKey="Guryev, V" uniqKey="Guryev V">V Guryev</name>
</author>
<author>
<name sortKey="Clevers, H" uniqKey="Clevers H">H Clevers</name>
</author>
<author>
<name sortKey="Cuppen, E" uniqKey="Cuppen E">E Cuppen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wunderlich, Z" uniqKey="Wunderlich Z">Z Wunderlich</name>
</author>
<author>
<name sortKey="Mirny, La" uniqKey="Mirny L">LA Mirny</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hendrickson, W" uniqKey="Hendrickson W">W Hendrickson</name>
</author>
<author>
<name sortKey="Schleif, R" uniqKey="Schleif R">R Schleif</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Y" uniqKey="Lu Y">Y Lu</name>
</author>
<author>
<name sortKey="Flaherty, C" uniqKey="Flaherty C">C Flaherty</name>
</author>
<author>
<name sortKey="Hendrickson, W" uniqKey="Hendrickson W">W Hendrickson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martinez Hackert, E" uniqKey="Martinez Hackert E">E Martinez-Hackert</name>
</author>
<author>
<name sortKey="Stock, Am" uniqKey="Stock A">AM Stock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Toro Roman, A" uniqKey="Toro Roman A">A Toro-Roman</name>
</author>
<author>
<name sortKey="Mack, Tr" uniqKey="Mack T">TR Mack</name>
</author>
<author>
<name sortKey="Stock, Am" uniqKey="Stock A">AM Stock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pan, Cq" uniqKey="Pan C">CQ Pan</name>
</author>
<author>
<name sortKey="Finkel, Se" uniqKey="Finkel S">SE Finkel</name>
</author>
<author>
<name sortKey="Cramton, Se" uniqKey="Cramton S">SE Cramton</name>
</author>
<author>
<name sortKey="Feng, Ja" uniqKey="Feng J">JA Feng</name>
</author>
<author>
<name sortKey="Sigman, Ds" uniqKey="Sigman D">DS Sigman</name>
</author>
<author>
<name sortKey="Johnson, Rc" uniqKey="Johnson R">RC Johnson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afflerbach, H" uniqKey="Afflerbach H">H Afflerbach</name>
</author>
<author>
<name sortKey="Schroder, O" uniqKey="Schroder O">O Schroder</name>
</author>
<author>
<name sortKey="Wagner, R" uniqKey="Wagner R">R Wagner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Travers, A" uniqKey="Travers A">A Travers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schneider, Td" uniqKey="Schneider T">TD Schneider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Ap" uniqKey="Zhang A">AP Zhang</name>
</author>
<author>
<name sortKey="Pigli, Yz" uniqKey="Pigli Y">YZ Pigli</name>
</author>
<author>
<name sortKey="Rice, Pa" uniqKey="Rice P">PA Rice</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lewis, Lk" uniqKey="Lewis L">LK Lewis</name>
</author>
<author>
<name sortKey="Harlow, Gr" uniqKey="Harlow G">GR Harlow</name>
</author>
<author>
<name sortKey="Gregg Jolly, La" uniqKey="Gregg Jolly L">LA Gregg-Jolly</name>
</author>
<author>
<name sortKey="Mount, Dw" uniqKey="Mount D">DW Mount</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kajimura, S" uniqKey="Kajimura S">S Kajimura</name>
</author>
<author>
<name sortKey="Aida, K" uniqKey="Aida K">K Aida</name>
</author>
<author>
<name sortKey="Duan, C" uniqKey="Duan C">C Duan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Michel, G" uniqKey="Michel G">G Michel</name>
</author>
<author>
<name sortKey="Minet, E" uniqKey="Minet E">E Minet</name>
</author>
<author>
<name sortKey="Ernest, I" uniqKey="Ernest I">I Ernest</name>
</author>
<author>
<name sortKey="Roland, I" uniqKey="Roland I">I Roland</name>
</author>
<author>
<name sortKey="Durant, F" uniqKey="Durant F">F Durant</name>
</author>
<author>
<name sortKey="Remacle, J" uniqKey="Remacle J">J Remacle</name>
</author>
<author>
<name sortKey="Michiels, C" uniqKey="Michiels C">C Michiels</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camenisch, G" uniqKey="Camenisch G">G Camenisch</name>
</author>
<author>
<name sortKey="Stroka, Dm" uniqKey="Stroka D">DM Stroka</name>
</author>
<author>
<name sortKey="Gassmann, M" uniqKey="Gassmann M">M Gassmann</name>
</author>
<author>
<name sortKey="Wenger, Rh" uniqKey="Wenger R">RH Wenger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, E" uniqKey="Kim E">E Kim</name>
</author>
<author>
<name sortKey="Albrechtsen, N" uniqKey="Albrechtsen N">N Albrechtsen</name>
</author>
<author>
<name sortKey="Deppert, W" uniqKey="Deppert W">W Deppert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, Yg" uniqKey="Shi Y">YG Shi</name>
</author>
<author>
<name sortKey="Berg, Jm" uniqKey="Berg J">JM Berg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marco, E" uniqKey="Marco E">E Marco</name>
</author>
<author>
<name sortKey="Garcia Nieto, R" uniqKey="Garcia Nieto R">R Garcia-Nieto</name>
</author>
<author>
<name sortKey="Gago, F" uniqKey="Gago F">F Gago</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, Wg" uniqKey="Zhu W">WG Zhu</name>
</author>
<author>
<name sortKey="Srinivasan, K" uniqKey="Srinivasan K">K Srinivasan</name>
</author>
<author>
<name sortKey="Dai, Zy" uniqKey="Dai Z">ZY Dai</name>
</author>
<author>
<name sortKey="Duan, Wr" uniqKey="Duan W">WR Duan</name>
</author>
<author>
<name sortKey="Druhan, Lj" uniqKey="Druhan L">LJ Druhan</name>
</author>
<author>
<name sortKey="Ding, Hm" uniqKey="Ding H">HM Ding</name>
</author>
<author>
<name sortKey="Yee, L" uniqKey="Yee L">L Yee</name>
</author>
<author>
<name sortKey="Villalona Calero, Ma" uniqKey="Villalona Calero M">MA Villalona-Calero</name>
</author>
<author>
<name sortKey="Plass, C" uniqKey="Plass C">C Plass</name>
</author>
<author>
<name sortKey="Otterson, Ga" uniqKey="Otterson G">GA Otterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, Xm" uniqKey="Chen X">XM Chen</name>
</author>
<author>
<name sortKey="Vinkemeier, U" uniqKey="Vinkemeier U">U Vinkemeier</name>
</author>
<author>
<name sortKey="Zhao, Yx" uniqKey="Zhao Y">YX Zhao</name>
</author>
<author>
<name sortKey="Jeruzalmi, D" uniqKey="Jeruzalmi D">D Jeruzalmi</name>
</author>
<author>
<name sortKey="Darnell, Je" uniqKey="Darnell J">JE Darnell</name>
</author>
<author>
<name sortKey="Kuriyan, J" uniqKey="Kuriyan J">J Kuriyan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ehret, Gb" uniqKey="Ehret G">GB Ehret</name>
</author>
<author>
<name sortKey="Reichenbach, P" uniqKey="Reichenbach P">P Reichenbach</name>
</author>
<author>
<name sortKey="Schindler, U" uniqKey="Schindler U">U Schindler</name>
</author>
<author>
<name sortKey="Horvath, Cm" uniqKey="Horvath C">CM Horvath</name>
</author>
<author>
<name sortKey="Fritz, S" uniqKey="Fritz S">S Fritz</name>
</author>
<author>
<name sortKey="Nabholz, M" uniqKey="Nabholz M">M Nabholz</name>
</author>
<author>
<name sortKey="Bucher, P" uniqKey="Bucher P">P Bucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Powell, Rm" uniqKey="Powell R">RM Powell</name>
</author>
<author>
<name sortKey="Parkhurst, Km" uniqKey="Parkhurst K">KM Parkhurst</name>
</author>
<author>
<name sortKey="Parkhurst, Lj" uniqKey="Parkhurst L">LJ Parkhurst</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Juo, Zs" uniqKey="Juo Z">ZS Juo</name>
</author>
<author>
<name sortKey="Chiu, Tk" uniqKey="Chiu T">TK Chiu</name>
</author>
<author>
<name sortKey="Leiberman, Pm" uniqKey="Leiberman P">PM Leiberman</name>
</author>
<author>
<name sortKey="Baikalov, I" uniqKey="Baikalov I">I Baikalov</name>
</author>
<author>
<name sortKey="Berk, Aj" uniqKey="Berk A">AJ Berk</name>
</author>
<author>
<name sortKey="Dickerson, Re" uniqKey="Dickerson R">RE Dickerson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davis, Na" uniqKey="Davis N">NA Davis</name>
</author>
<author>
<name sortKey="Majee, Ss" uniqKey="Majee S">SS Majee</name>
</author>
<author>
<name sortKey="Kahn, Jd" uniqKey="Kahn J">JD Kahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gardiner, Ej" uniqKey="Gardiner E">EJ Gardiner</name>
</author>
<author>
<name sortKey="Hunter, Ca" uniqKey="Hunter C">CA Hunter</name>
</author>
<author>
<name sortKey="Lu, Xj" uniqKey="Lu X">XJ Lu</name>
</author>
<author>
<name sortKey="Willett, P" uniqKey="Willett P">P Willett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parker, Sc" uniqKey="Parker S">SC Parker</name>
</author>
<author>
<name sortKey="Hansen, L" uniqKey="Hansen L">L Hansen</name>
</author>
<author>
<name sortKey="Abaan, Ho" uniqKey="Abaan H">HO Abaan</name>
</author>
<author>
<name sortKey="Tullius, Td" uniqKey="Tullius T">TD Tullius</name>
</author>
<author>
<name sortKey="Margulies, Eh" uniqKey="Margulies E">EH Margulies</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenbaum, Ja" uniqKey="Greenbaum J">JA Greenbaum</name>
</author>
<author>
<name sortKey="Pang, B" uniqKey="Pang B">B Pang</name>
</author>
<author>
<name sortKey="Tullius, Td" uniqKey="Tullius T">TD Tullius</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abeel, T" uniqKey="Abeel T">T Abeel</name>
</author>
<author>
<name sortKey="Saeys, Y" uniqKey="Saeys Y">Y Saeys</name>
</author>
<author>
<name sortKey="Bonnet, E" uniqKey="Bonnet E">E Bonnet</name>
</author>
<author>
<name sortKey="Rouze, P" uniqKey="Rouze P">P Rouze</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tullius, T" uniqKey="Tullius T">T Tullius</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohs, R" uniqKey="Rohs R">R Rohs</name>
</author>
<author>
<name sortKey="West, Sm" uniqKey="West S">SM West</name>
</author>
<author>
<name sortKey="Liu, P" uniqKey="Liu P">P Liu</name>
</author>
<author>
<name sortKey="Honig, B" uniqKey="Honig B">B Honig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Friedel, M" uniqKey="Friedel M">M Friedel</name>
</author>
<author>
<name sortKey="Nikolajewa, S" uniqKey="Nikolajewa S">S Nikolajewa</name>
</author>
<author>
<name sortKey="Suhnel, J" uniqKey="Suhnel J">J Suhnel</name>
</author>
<author>
<name sortKey="Wilhelm, T" uniqKey="Wilhelm T">T Wilhelm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Williams, N" uniqKey="Williams N">N Williams</name>
</author>
<author>
<name sortKey="Misleh, C" uniqKey="Misleh C">C Misleh</name>
</author>
<author>
<name sortKey="Li, Ww" uniqKey="Li W">WW Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Long, D" uniqKey="Long D">D Long</name>
</author>
<author>
<name sortKey="Lee, R" uniqKey="Lee R">R Lee</name>
</author>
<author>
<name sortKey="Williams, P" uniqKey="Williams P">P Williams</name>
</author>
<author>
<name sortKey="Chan, Cy" uniqKey="Chan C">CY Chan</name>
</author>
<author>
<name sortKey="Ambros, V" uniqKey="Ambros V">V Ambros</name>
</author>
<author>
<name sortKey="Ding, Y" uniqKey="Ding Y">Y Ding</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22492513</article-id>
<article-id pub-id-type="pmc">3413102</article-id>
<article-id pub-id-type="doi">10.1093/nar/gks283</article-id>
<article-id pub-id-type="publisher-id">gks283</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A flexible integrative approach based on random forest improves prediction of transcription factor binding sites</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hooghe</surname>
<given-names>Bart</given-names>
</name>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Broos</surname>
<given-names>Stefan</given-names>
</name>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="gks283-COR1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>van Roy</surname>
<given-names>Frans</given-names>
</name>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>De Bleser</surname>
<given-names>Pieter</given-names>
</name>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gks283-AFF1">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="gks283-COR1">*</xref>
</contrib>
</contrib-group>
<aff id="gks283-AFF1">
<sup>1</sup>
Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium and
<sup>2</sup>
Department for Molecular Biomedical Research, VIB, B-9052 Ghent, Belgium</aff>
<author-notes>
<corresp id="gks283-COR1">*To whom correspondence should be addressed. Tel: +32 9 331 36 93; Fax: +32 9 331 36 09; Email:
<email>Stefan.Broos@dmbr.vib-ugent.be</email>
</corresp>
<corresp>Correspondence may also be addressed to Pieter De Bleser. Tel: +32 9 331 36 93; Fax: +32 9 331 36 09; Email:
<email>Pieter.DeBleser@dmbr.vib-ugent.be</email>
</corresp>
<fn>
<p>The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</p>
</fn>
</author-notes>
<pmc-comment>For NAR both ppub and collection dates generated for PMC processing 1/27/05 beck</pmc-comment>
<pub-date pub-type="collection">
<month>8</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="ppub">
<month>8</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>5</day>
<month>4</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>5</day>
<month>4</month>
<year>2012</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>40</volume>
<issue>14</issue>
<fpage>e106</fpage>
<lpage>e106</lpage>
<history>
<date date-type="received">
<day>30</day>
<month>4</month>
<year>2010</year>
</date>
<date date-type="rev-recd">
<day>14</day>
<month>3</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>3</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">http://creativecommons.org/licenses/by-nc/3.0</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Transcription factor binding sites (TFBSs) are DNA sequences of 6–15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.</p>
</abstract>
<counts>
<page-count count="15"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>DNA-binding specificity of transcription factors (TFs) is traditionally viewed as consisting of a direct and an indirect readout component, and the proportion between them differs from one TF to another (
<xref ref-type="bibr" rid="gks283-B1">1</xref>
). The direct readout mechanism is well defined and involves recognition of specific DNA bases by amino acids. However, there is no deterministic recognition code for the interaction between DNA and protein sequences, essentially because of the influence of the three-dimensional (3D) structures of both macromolecules. The influence of the structure of the DNA-binding domain of the TF on the direct recognition code has been clearly shown for some TFs (
<xref ref-type="bibr" rid="gks283-B2">2</xref>
). If DNA-binding specificity were determined only by direct readout, then a probabilistic approach to TF–DNA recognition would suffice. The direct readout does not, however, fully explain the observed variety of sequence composition and binding affinity of binding sites for a specific TF (
<xref ref-type="bibr" rid="gks283-B3">3</xref>
). This is where the indirect readout mechanism comes in. Indirect readout is much less well defined but takes into consideration protein–DNA interactions that depend on base pairs that are not directly contacted by the protein. These protein–DNA interactions essentially reflect the influence of the structure and thermodynamic properties of the DNA before or upon binding by the TF. DNA is flexible and exhibits sequence-dependent deviations from the idealized B-DNA structure: the deviations arise from the stacking interactions of successive dinucleotides (
<xref ref-type="bibr" rid="gks283-B4">4</xref>
,
<xref ref-type="bibr" rid="gks283-B5">5</xref>
). These structural details have usually been neglected in the analysis of TF–DNA interactions: a probabilistic approach to direct readout is most commonly used as the sole component for prediction of transcription factor binding sites (TFBSs), with varying degrees of success. Rohs
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gks283-B6">6</xref>
) recently emphasized the importance of the 3D structures of both macromolecules. Direct readout and indirect readout were renamed as base readout and shape readout, respectively. Base readout was subdivided according to either the major or the minor groove of the DNA, whereas shape readout was subdivided into global and local shape recognition. It was argued that individual TFs combine multiple readout mechanisms to achieve DNA-binding specificity.</p>
<p>Methods for identifying TFBSs can be classified into two main groups on the basis of the type of data used to model the TF–DNA binding specificity. Sequence-based methods model the binding specificity from a collection of aligned sequences known to bind the TF
<italic>in vitro</italic>
or
<italic>in vivo</italic>
. Structure-based methods use information from available crystal structures of TF–DNA complexes [reviewed in Ref. (
<xref ref-type="bibr" rid="gks283-B7">7</xref>
)]. Most sequence-based methods treat DNA as a uniform static structure that is independent of the nucleotide sequence. For example, the widely used position weight matrix (PWM) method (
<xref ref-type="bibr" rid="gks283-B8">8</xref>
) takes into account only the nucleotide frequency at each position of the TFBS and assumes independence between those positions. The assumption that the nucleotides add to the binding affinity of TFs independently from each other is called the ‘additivity’ assumption. Based on theoretical concerns and a few experiments for some TFs (
<xref ref-type="bibr" rid="gks283-B9 gks283-B10 gks283-B11 gks283-B12">9–12</xref>
), the correctness of this assumption and the quality of the approximation it yields have been debated in recent years (
<xref ref-type="bibr" rid="gks283-B13 gks283-B14 gks283-B15">13–15</xref>
). Recently, thanks to larger amounts of experimental data, it was shown that for most TFs, dependencies exist between nucleotide positions in their binding sites (
<xref ref-type="bibr" rid="gks283-B16">16</xref>
). This could be expected because it has been suggested that nucleotide positional dependencies observed within TFBSs arise from the structure and biophysical interactions of unbound and TF-bound DNA (
<xref ref-type="bibr" rid="gks283-B15">15</xref>
). Nucleotide positional dependencies are symptoms of shape readout rather than base readout. Nowadays, many sequence-based methods try to model nucleotide dependencies between positions, and thus they implicitly recognize the structural aspects of TF–DNA binding. They yield accuracy improvement over the classic PWM method for most TFs [e.g. Refs (
<xref ref-type="bibr" rid="gks283-B17 gks283-B18 gks283-B19 gks283-B20">17–20</xref>
)]. A few publications present sequence-based methods that use sequence-dependent structural characteristics explicitly (
<xref ref-type="bibr" rid="gks283-B21 gks283-B22 gks283-B23 gks283-B24 gks283-B25 gks283-B26 gks283-B27 gks283-B28">21–28</xref>
). Some of these methods, e.g. (
<xref ref-type="bibr" rid="gks283-B25">25</xref>
,
<xref ref-type="bibr" rid="gks283-B28">28</xref>
), report higher accuracies than those obtained by methods that model only nucleotide dependencies. Structure-based methods, by definition, take into account at least some structural characteristics of TF–DNA binding. Some of these methods are valuable for comparative modeling and they seem promising for TFBS prediction as well [e.g. (
<xref ref-type="bibr" rid="gks283-B7">7</xref>
,
<xref ref-type="bibr" rid="gks283-B29">29</xref>
)]. However, none of the structure-based methods have offered substantial improvement on the PWM method yet.</p>
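<p>As an illustration of the additivity assumption described above, the following minimal Python sketch builds a log-odds PWM from a few aligned sites and scores a sequence as the sum of independent per-position weights. The toy site list, the uniform background, the pseudocount and the function names build_pwm and score are hypothetical choices made for illustration only; they are not the TRANSFAC matrices or the MATCH algorithm used in this study.</p>
<preformat><![CDATA[
```python
import math

# Hypothetical aligned binding sites (toy data, not taken from the article).
SITES = ["TATAAA", "TATAAG", "TATATA", "TTTAAA"]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(sites, background, pseudocount=1.0):
    """Return one dict of log-odds weights per alignment column."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        weights = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / total
            weights[base] = math.log2(freq / background[base])
        pwm.append(weights)
    return pwm


def score(pwm, sequence):
    """Additive score: positions contribute independently of each other."""
    return sum(weights[base] for weights, base in zip(pwm, sequence))


pwm = build_pwm(SITES, BACKGROUND)
consensus_score = score(pwm, "TATAAA")
unrelated_score = score(pwm, "GCGCGC")
```
]]></preformat>
<p>Because each position is scored separately, such a model cannot express that a nucleotide at one position changes the preference at another, which is the limitation that dependency-aware and structure-aware methods try to address.</p>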
<p>In this manuscript we present a sequence-based method that uses the random forest (RF) algorithm with features that cover either nucleotide positional dependencies or nucleotide sequence-dependent structural characteristics of the TFBS and its flanking sequences. We call the corresponding models the positional dependencies of nucleotides (NPD) model and the structural model. We also let our method combine both models and tried to integrate the PWM score in the combined model. The set of one-type models and combined models presented in this article should be seen as the products of our flexible integrative method, which can easily determine the most appropriate model to use. We measure the accuracy with which our models separate TFBSs from randomly selected genomic sequences, and we compare this measured value to the accuracy of the classic PWM method and the most recent alternative method, namely CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
).</p>
<p>Results are given for five eukaryotic TFs that bind differently to DNA: HIF1 (zipper-type group/Helix–Loop–Helix family), P53 (zinc-coordinating group/Loop–Sheet–Helix family), SP1 (zinc-coordinating group/BetaBetaAlpha-zinc finger family), STAT1 (Stat protein family) and TBP (Beta-sheet group/TATA box-binding family) (
<xref ref-type="bibr" rid="gks283-B30">30</xref>
). Our method was also used on seven prokaryotic data sets that were presented along with CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
) and a more recent Fis data set (
<xref ref-type="bibr" rid="gks283-B31">31</xref>
).</p>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<sec>
<title>Data</title>
<p>Positive sequences are those that are bound
<italic>in vivo</italic>
at least under some cellular conditions. They were extracted from various sources. Binding sites for HIF1, STAT1 and TBP were fetched from Pazar (
<xref ref-type="bibr" rid="gks283-B32">32</xref>
), for SP1 from TRANSFAC (licensed version 2008.4) (
<xref ref-type="bibr" rid="gks283-B33">33</xref>
), and for P53 from another paper (
<xref ref-type="bibr" rid="gks283-B34">34</xref>
). TBP binding sites were from human, mouse and rat. The binding sites for the other TFs were all human. When necessary, TFBSs were mapped back to genomic coordinates. PWMs available from TRANSFAC (licensed version 2008.4) (
<xref ref-type="bibr" rid="gks283-B33">33</xref>
) were used with the search algorithm MATCH (
<xref ref-type="bibr" rid="gks283-B35">35</xref>
) to align the fetched binding sites. These matrices were V$STAT1_01, V$SP1_Q2_01, V$TBP_01 and V$HIF1_Q3. The known TFBSs were positioned to the nearest TFBS predicted by the appropriate PWM using the TRANSFAC-given threshold values to minimize false negatives (minFN threshold values). These threshold values enable recognition of at least 90% of positive sequences, but come along with a high rate of false positives. We excluded the sequence if no predicted TFBS was found within 20 bp on either side of the position given by the database. The P53 binding sites from the paper were not re-aligned because they were already annotated in sufficient detail. We considered only P53 binding sites that were tagged as qualitative and gapless (
<xref ref-type="bibr" rid="gks283-B34">34</xref>
). In this way, our data sets of positives consisted of 55 binding sites for HIF1, 87 for P53, 243 for SP1, 209 for STAT1 and 88 for TBP. To assess the performance on prokaryotic data sets, we used binding sites for AraC (13 sites), ArcA (44 sites), Fis (135 sites), FlhDC (12 sites), IHF (70 sites), LexA (13 sites) and PurR (17 sites) from the CRoSSeD article (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
). As an additional control for the prokaryotic data, we also used the large, high-quality ChIP-chip data set for Fis published by Cho
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gks283-B31">31</xref>
).</p>
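<p>The re-alignment rule described above (keep the predicted site nearest to the annotated position, and discard the sequence when no prediction lies within 20 bp on either side) can be sketched as follows. This is a minimal illustration under stated assumptions: positions are plain integer coordinates, and the function name nearest_prediction and the example coordinates are hypothetical rather than part of the original pipeline.</p>
<preformat><![CDATA[
```python
def nearest_prediction(annotated_pos, predicted_positions, max_distance=20):
    """Return the predicted position closest to annotated_pos.

    Returns None when no prediction lies within max_distance bp on either
    side, in which case the sequence would be excluded.
    """
    candidates = [p for p in predicted_positions
                  if abs(p - annotated_pos) <= max_distance]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - annotated_pos))


# Hypothetical coordinates for illustration.
kept = nearest_prediction(100, [70, 95, 130])
dropped = nearest_prediction(100, [50, 150])
```
]]></preformat>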
<p>‘Negative’ or ‘background sequences’ are randomly selected from the human or
<italic>Escherichia coli</italic>
genome. We take 10 times as many negative sequences as positives. This provides enough negatives to ensure consistent results, but not so many that the RF algorithm suffers from an imbalanced training data set, which would shift the focus too strongly toward the classification accuracy of the majority class.</p>
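<p>The 10:1 sampling of background sequences can be sketched as below. This is only an illustration under stated assumptions: the genome is represented as a single in-memory string, windows are drawn uniformly with replacement, and overlap with known binding sites is not checked. The function name sample_negatives, the seed and the toy genome are hypothetical.</p>
<preformat><![CDATA[
```python
import random


def sample_negatives(genome, site_length, n_positives, ratio=10, seed=0):
    """Draw ratio * n_positives random windows of site_length from genome."""
    rng = random.Random(seed)
    last_start = len(genome) - site_length
    starts = [rng.randint(0, last_start) for _ in range(ratio * n_positives)]
    return [genome[start:start + site_length] for start in starts]


# Toy random "genome" standing in for the human or E. coli sequence.
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(1000))
negatives = sample_negatives(toy_genome, site_length=10, n_positives=5)
```
]]></preformat>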
</sec>
<sec>
<title>Structural characteristics</title>
<p>Structural characteristics used in this manuscript comprise characteristics calculated from scratch (see below for curvature and torsion calculations) and characteristics extracted from the literature. Most of these are correlated to some extent, but we let a feature selection procedure decide which characteristics and combinations thereof are most useful for identifying binding sites for each TF. Each DNA sequence-dependent structural characteristic is described by a list of all possible polynucleotides of a certain length, to which a numerical value describing the structural characteristic is assigned. For every characteristic, positions in a DNA sequence are scored by the value of the appropriate polynucleotide.</p>
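<p>Scoring the positions of a sequence by a polynucleotide lookup table can be sketched as follows. The table values here are invented for illustration and do not correspond to any characteristic used in the study; the function name structural_profile is likewise hypothetical. The sketch assumes that all keys of the table have the same length and that a sliding window of that length is read along the sequence.</p>
<preformat><![CDATA[
```python
# Hypothetical per-dinucleotide values (made up, for illustration only).
TOY_TABLE = {a + b: 0.0 for a in "ACGT" for b in "ACGT"}
TOY_TABLE.update({"AA": 1.5, "AT": -0.5, "TA": 2.0, "TT": 1.5})


def structural_profile(sequence, table):
    """Map each overlapping polynucleotide in sequence to its table value."""
    k = len(next(iter(table)))  # polynucleotide length used by this table
    return [table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


profile = structural_profile("TATAAA", TOY_TABLE)
```
]]></preformat>
<p>A sequence of length n yields n - k + 1 values for a table of k-mers, so each characteristic contributes one numeric feature vector per candidate site.</p>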
<p>The calculation of sequence-dependent structural values requires an assumption of a certain 3D structure of the DNA. As we did not want to assume one specific DNA structural model, we implemented three different models: a model derived from protein-bound DNA (
<xref ref-type="bibr" rid="gks283-B36">36</xref>
), one from unbound DNA (
<xref ref-type="bibr" rid="gks283-B23">23</xref>
) and another from nucleosome-bound DNA (
<xref ref-type="bibr" rid="gks283-B37">37</xref>
,
<xref ref-type="bibr" rid="gks283-B38">38</xref>
). Each of these DNA structural models consists of values for all base-pair step parameters (roll, twist, tilt, rise, shift and slide) for each dinucleotide or trinucleotide. This enabled us to convert DNA sequences into 3D coordinates by using the rebuilding part of 3DNA (
<xref ref-type="bibr" rid="gks283-B39">39</xref>
), a program for analysis, rebuilding and visualization of 3D nucleic acid structures. For each of the DNA structural models, we did this conversion on 10 000 randomly generated sequences of 100 bp. From the resulting 3D coordinates, we then calculated the values of our structural characteristics. Values calculated for a specific structural characteristic but with coordinates coming from different DNA structural models were eventually treated as values for different structural characteristics. Curvature and torsion of the helix’s axis were calculated from the coordinates of this axis only, each for the highest possible resolution. The formulas we used are as follows:
<list list-type="roman-lower">
<list-item>
<p>Curvature: If a, b and c are three consecutive points on the helix’s axis, then
<inline-formula>
<inline-graphic xlink:href="gks283i1.jpg"></inline-graphic>
</inline-formula>
is orthogonal to the plane A formed by a, b and c. The curvature in b of the line containing a, b and c is given by the following equation:
<disp-formula>
<graphic xlink:href="gks283um1"></graphic>
</disp-formula>
</p>
</list-item>
<list-item>
<p>Torsion (dihedral angle): If a, b, c, d are four consecutive points on the helix’s axis, then
<inline-formula>
<inline-graphic xlink:href="gks283i2.jpg"></inline-graphic>
</inline-formula>
is orthogonal to the plane A formed by a, b and c, and
<inline-formula>
<inline-graphic xlink:href="gks283i3.jpg"></inline-graphic>
</inline-formula>
is orthogonal to the plane B formed by b, c and d. Then the dihedral angle is given by the following equation:
<disp-formula>
<graphic xlink:href="gks283um2"></graphic>
</disp-formula>
</p>
</list-item>
</list>
</p>
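<p>The two formulas above are provided as images in the original article and are not reproduced here. As a hedged illustration only, the sketch below computes a discrete curvature from three consecutive axis points and a dihedral angle from four, using standard textbook definitions: the inverse radius of the circle through the three points, and the signed angle between the normals of planes A and B. These definitions are assumptions and may differ from the article's formulas in scaling or sign convention; all function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def curvature(a, b, c):
    """Inverse radius of the circle through a, b, c (0 if collinear).

    Assumes the three points are distinct.
    """
    normal = cross(sub(b, a), sub(c, a))  # orthogonal to the plane of a, b, c
    sides = norm(sub(b, a)) * norm(sub(c, b)) * norm(sub(a, c))
    return 2.0 * norm(normal) / sides


def torsion(a, b, c, d):
    """Signed dihedral angle in radians between planes (a,b,c) and (b,c,d)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of plane A and plane B
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))
```
]]></preformat>
<p>For example, three points on a circle of radius 2 give a curvature of 0.5, and four coplanar points give a torsion of 0 or pi depending on whether the outer points lie on the same or on opposite sides of the central bond.</p>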
<p>These calculations provide a value for every base position. However, each value is calculated from the coordinates of more than one base (see equations above), and these coordinates depend on the identity of neighboring bases. To obtain an accurate relation between sequence and calculated structural values, we took the shortest polynucleotide length for which the relative standard deviation of the corresponding mean structural value was below 1%. This polynucleotide length is 3, 4 or 5, depending on the characteristic and the DNA structural model. The calculated values of the sequence-dependent structural characteristics (curvature and torsion of the helix’s axis) are available from the authors upon request. Other structural characteristics used in this manuscript were extracted from the literature and comprise properties derived from either unbound or TF-bound DNA. They are all given as a value per dinucleotide, mostly with a considerable standard deviation. These standard deviations, and their near-absence when polynucleotides longer than two bases are considered, indicate that the structural characteristics of base-pair steps depend on the identity of neighboring nucleotides. Although we used longer polynucleotides, with almost no standard deviation on their mean values, for the structural characteristics we calculated ourselves, the calculation was still based on DNA structural models described by only dinucleotides or trinucleotides. The structure of a dinucleotide is known to be influenced by the identity of the neighboring nucleotides (
<xref ref-type="bibr" rid="gks283-B27">27</xref>
,
<xref ref-type="bibr" rid="gks283-B40">40</xref>
,
<xref ref-type="bibr" rid="gks283-B41">41</xref>
), and taking into account these next-nearest-neighbor effects might further improve the accuracy of the structural model. A description of the structural characteristics we used is given below.</p>
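<p>The choice of polynucleotide length by the 1% relative standard deviation criterion can be sketched as follows. This is a minimal illustration under stated assumptions: for each candidate length k, the input holds (polynucleotide, value) pairs observed in the rebuilt random sequences; the criterion is applied to the worst polynucleotide at that length; and mean values are non-zero. The function name, the input layout and the toy numbers are hypothetical.</p>
<preformat><![CDATA[
```python
import statistics
from collections import defaultdict


def shortest_consistent_length(samples_by_k, threshold=0.01):
    """Return the smallest k whose worst relative SD is below threshold.

    samples_by_k maps a polynucleotide length k to a list of
    (polynucleotide, structural_value) observations.
    """
    for k in sorted(samples_by_k):
        groups = defaultdict(list)
        for kmer, value in samples_by_k[k]:
            groups[kmer].append(value)
        worst = max(statistics.pstdev(values) / abs(statistics.fmean(values))
                    for values in groups.values())
        if worst < threshold:
            return k
    return None  # no candidate length satisfies the criterion


# Toy observations: trinucleotides are too variable, tetranucleotides are not.
toy_samples = {
    3: [("AAA", 1.0), ("AAA", 1.2)],
    4: [("AAAA", 1.0), ("AAAA", 1.001), ("AAAT", 1.2), ("AAAT", 1.2)],
}
chosen_k = shortest_consistent_length(toy_samples)
```
]]></preformat>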
<p>‘Curvature’ and ‘torsion’ describe the DNA backbone in its highest resolution and thus provide at least a measure of bending. The characteristic we implemented, ‘directed bending’, does the same (
<xref ref-type="bibr" rid="gks283-B42">42</xref>
). Directed bending means the extent to which a dinucleotide tends to bend towards either the major or the minor groove when it is bound by a TF, and it is used as a measure of deformability of DNA. Values are determined on sequences bound by the TF CAP at sites where sequence dependence of bending is maximal (
<xref ref-type="bibr" rid="gks283-B42">42</xref>
). Pre-bending of free DNA (
<xref ref-type="bibr" rid="gks283-B43">43</xref>
) and TF-induced bending (
<xref ref-type="bibr" rid="gks283-B44">44</xref>
) have been recognized for more than a decade as structural motifs common to many TF–DNA complexes. ‘Groove clash distance’ and ‘size’ are both components of the clash function that was constructed to give a quantitative interpretation of the observed sequence dependence of TF–DNA interactions on DNA twist (
<xref ref-type="bibr" rid="gks283-B45">45</xref>
). A steric clash between exocyclic groups results from out-of-plane base-pair distortions. Its size is defined as the sum of the radii for the exocyclic groups interacting in the grooves. Clash distance is the distance between the centers of the interacting groups when they are in an ‘idealized’ conformation. Different geometries of the major and minor groove are taken into account and result in separate values per groove type (
<xref ref-type="bibr" rid="gks283-B45">45</xref>
). Groove shape is an interesting characteristic to explore because it was recently acknowledged that most TFs recognize the minor groove width upon specific binding (
<xref ref-type="bibr" rid="gks283-B46">46</xref>
). The value of groove width for prediction of TFBSs was suggested by Liu
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gks283-B23">23</xref>
) in 2001. ‘Minor groove opening’ is a measure of the degree to which a base step is open in the minor groove, and hence it is related to the above-mentioned measure of groove clash size. The values are derived from high-resolution crystal structures of unbound DNA in BI conformation (
<xref ref-type="bibr" rid="gks283-B23">23</xref>
). ‘Conformational tendency’ is measured by the standardized Pearson residuals for the test of uniformity or homogeneity of the individual dinucleotide steps over different conformations, i.e. structural types of DNA (
<xref ref-type="bibr" rid="gks283-B47">47</xref>
). These values are derived from unbound DNA and represent the tendency of a dinucleotide to favor a specific DNA conformation. Uniformity of dinucleotides is tested between A-type, B-type and combined conformational families (A, B and A + B conformations) and within B-types of DNA (BI, BII, A/B, B/A, RESTB). RESTB is not assigned to any of the existing conformational families. We did not use the conformational tendencies of dinucleotides within A-forms of DNA because the dinucleotide AA/TT does not occur there (
<xref ref-type="bibr" rid="gks283-B47">47</xref>
). Almost one-third of dinucleotides from protein–DNA complexes adopt AI or AII conformations. This plasticity of DNA, which allows the conformation to change locally from the common B-form into an A-form, is one of the ways in which DNA achieves specificity in protein–DNA binding (
<xref ref-type="bibr" rid="gks283-B44">44</xref>
,
<xref ref-type="bibr" rid="gks283-B48">48</xref>
,
<xref ref-type="bibr" rid="gks283-B49">49</xref>
).</p>
</sec>
<sec>
<title>Random Forest algorithm</title>
<p>The RF algorithm (
<xref ref-type="bibr" rid="gks283-B50">50</xref>
) (
<ext-link ext-link-type="uri" xlink:href="http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm">http://www.stat.berkeley.edu/∼breiman/RandomForests/cc_home.htm</ext-link>
) is a tree-based machine-learning algorithm and is the engine of both our structural method and our NPD method. It is an ensemble classifier that consists of many individual decision trees (CARTs: classification and regression trees) and outputs the class that is predicted by the majority of those trees. Tree-based methods consist of non-parametric statistical approaches for regression and classification analyses. Classification trees are grown by recursively partitioning the observations into subgroups with a more homogeneous categorical response. At each node, the explanatory variable giving the most homogeneous subgroups is selected. For the CART tree learning algorithm, this selection is based on Gini impurity, which is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.</p>
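<p>The Gini impurity used for node splitting can be illustrated with a minimal Python sketch. It is not the FastRandomForest implementation; the function names and the toy labels (P for positive, N for negative sequences) are assumptions made for illustration only.</p>

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


# Toy node with three positive (P) and three negative (N) sequences.
parent = ["P", "P", "P", "N", "N", "N"]
# Hypothetical split on one feature: the left child is pure, the right is mixed.
left, right = ["P", "P"], ["P", "N", "N", "N"]

print(gini_impurity(parent))        # 0.5
print(split_impurity(left, right))  # 0.25
```

<p>At each node, the candidate split with the lowest weighted impurity of its children would be preferred; in the toy case the split halves the impurity of the parent node.</p>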
<p>Tree-based methods can be very effective for selecting, from large numbers of predictor variables, those that best explain the observations. They make no implicit assumptions about the form of the underlying relationships between the predictor variables and the response, and so they can detect non-linear associations. The RF methodology forms an ensemble of unpruned classification or regression trees (CARTs) by bootstrapping samples of the training data and using random feature selection in the tree induction process. It generally exhibits a substantial performance improvement over single-tree classifiers such as CART and C4.5. The biggest disadvantage of RF is that its embedded feature selection procedure cannot handle large numbers of irrelevant features. For this reason, we performed a comprehensive filter feature selection and wrapper-based feature selection before the final model was trained (see next section). We used FastRandomForest (
<ext-link ext-link-type="uri" xlink:href="http://fast-random-forest.googlecode.com/">http://fast-random-forest.googlecode.com/</ext-link>
), a parallelized implementation in Java. For further information, we refer to two publications that provide excellent explanations and examples on the use of RF for modeling dependencies among variables (
<xref ref-type="bibr" rid="gks283-B51">51</xref>
,
<xref ref-type="bibr" rid="gks283-B52">52</xref>
).</p>
</sec>
<sec>
<title>Building classification models</title>
<p>In the first stage of building a classification model, one model per characteristic is built. The structural method uses the above-mentioned structural characteristics, whereas the characteristics of the NPD method are represented by mononucleotides and dinucleotides. Hence, each sequence from the positive and negative set is converted to a series of structural vectors or is split up into mononucleotides or dinucleotides (
<xref ref-type="fig" rid="gks283-F1">Figure 1</xref>
A and B).
<fig id="gks283-F1" position="float">
<label>Figure 1.</label>
<caption>
<p>Overview of our approach: (
<bold>A</bold>
) The input from which models are built consists of the two classes of nucleotide sequences that the method should learn to separate. One class contains positive sequences (P, green) known to be bound
<italic>in vivo</italic>
; the other contains negative sequences (N, red) highly unlikely to be bound
<italic>in vivo</italic>
. (
<bold>B</bold>
) Each nucleotide sequence, from either class, is converted into multiple series of values; each series provides values for a specific DNA structural characteristic at all positions of the TFBS and its context (structural model), or simply consists of one base or two base parts of the sequence (NPD). (
<bold>C</bold>
) Basic selection of relevant features (i.e. positions) is made by statistical comparison of distributions of values for positive and negative sequences with mild thresholds. (
<bold>D</bold>
) Further selection is performed through wrapper-based feature selection, i.e.
cross-validation performance evaluation with the RF algorithm. Per characteristic, redundant features are removed by sequential backwards elimination (SBE). Several models with one characteristic might be merged through BIRS. The final NPD model and final structural model can be merged into one integrative model. (
<bold>E</bold>
) The resulting model can be used by RF to predict the likelihood that a nucleotide sequence is a TFBS, after converting the sequence into series of the features contained in the model.</p>
</caption>
<graphic xlink:href="gks283f1"></graphic>
</fig>
</p>
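<p>The conversion of a sequence into features can be sketched as follows. This is a minimal illustration, not the code used in this work: the function names are hypothetical, and the per-dinucleotide values in the table are placeholders rather than values taken from the literature.</p>

```python
# Hypothetical per-dinucleotide values for one structural characteristic.
# These numbers are placeholders for illustration only.
EXAMPLE_TABLE = {"AA": 0.1, "AC": 0.4, "CG": 0.9, "GT": 0.4, "TT": 0.1}


def mononucleotides(seq):
    """One categorical feature per position (NPD, mononucleotide level)."""
    return list(seq)


def dinucleotides(seq):
    """Overlapping dinucleotides, one per base-pair step."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def structural_profile(seq, table, default=0.0):
    """Per-step values of one characteristic plus their mean over all steps."""
    values = [table.get(step, default) for step in dinucleotides(seq)]
    mean = sum(values) / len(values) if values else default
    return values, mean


values, mean = structural_profile("ACGT", EXAMPLE_TABLE)
print(dinucleotides("ACGT"))  # ['AC', 'CG', 'GT']
print(values)                 # [0.4, 0.9, 0.4]
```

<p>Each position of the profile, and the mean over all positions, would then be a candidate feature for the selection steps.</p>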
<p>We perform a comprehensive feature selection in order to obtain the final model. A first round of feature selection is performed in a purely statistical way to make a basic selection of positions where a difference exists between the values for the characteristic of the positives and those of the negatives (so-called filter feature selection) (
<xref ref-type="fig" rid="gks283-F1">Figure 1</xref>
C). The statistical tests are applied with mild threshold values in order not to exclude too many features and to permit detection of their interactions by the RF algorithm later on. For the structural model, we consider values for all positions in the TFBS and for the 30 bases flanking it, as well as the mean value over all these positions, as features to be used in building the model. The Kolmogorov–Smirnov test at a false discovery rate threshold of 0.1 is used to determine the significance of differences between values at each position. The Wilcoxon rank test at a threshold of 0.05 is used to determine the significance of differences between values averaged over all 60 positions. For the NPD model, 30 mononucleotides flanking the TFBS start on both sides are considered. The basic selection of positions at which the mononucleotide distribution is different between positives and negatives is determined by the test for equality of proportions. More specifically, a position is selected when the sum of the logs of the
<italic>P</italic>
-values of proportion tests is significantly different from the background using a threshold of 0.1.</p>
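<p>The per-position filter step can be sketched in Python as below. The text does not name the procedure used to control the false discovery rate; the sketch assumes the Benjamini-Hochberg procedure, and the P-values are made up for illustration.</p>

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Indices of the positions kept at the given false discovery rate.

    Assumption: Benjamini-Hochberg step-up procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P-value does not exceed rank / m * fdr.
        if rank / m * fdr >= p_values[idx]:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P-values of a per-position test (e.g. Kolmogorov-Smirnov).
p_per_position = [0.001, 0.2, 0.03, 0.5, 0.04]
print(benjamini_hochberg(p_per_position, fdr=0.1))  # [0, 2, 4]
```

<p>A mild threshold such as 0.1 keeps borderline positions in the preliminary model, leaving the final decision to the wrapper-based selection.</p>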
<p>In the second round of feature selection, the preliminary model based on one characteristic is subjected to wrapper-based feature selection (
<xref ref-type="fig" rid="gks283-F1">Figure 1</xref>
D). We repeatedly evaluate the accuracy of the model by cross-validation with the RF algorithm and remove features of the basic selection when this does not cause a significant decrease in accuracy (measured as either
<italic>F</italic>
-measure or AUC). AUC (area under the curve) represents the area under the receiver operating characteristic (ROC) curve, whereas
<italic>F</italic>
-measure is the weighted harmonic mean of precision and recall. This procedure of removing insignificant features is also called sequential backwards elimination (SBE). It makes the model sparser, which permits better interpretation of the features it contains and which improves speed upon application.</p>
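<p>Sequential backwards elimination can be sketched as follows. The evaluation function stands in for cross-validated RF accuracy (F-measure or AUC); here it is a toy score, and the fixed tolerance replaces the significance criterion, which is an assumption of this sketch. All names are hypothetical.</p>

```python
def sequential_backward_elimination(features, evaluate, tolerance=0.0):
    """Drop features one at a time while the score does not decrease
    by more than `tolerance` relative to the best score seen so far."""
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(candidates, key=lambda c: c[0])
        if best_score - score > tolerance:
            break  # every possible removal costs too much accuracy
        selected.remove(drop)
        best_score = max(best_score, score)
    return selected


# Toy stand-in for cross-validated accuracy: only two features matter.
USEFUL = {"pos_-3", "pos_5"}


def toy_score(subset):
    return len(USEFUL.intersection(subset)) / len(USEFUL)


FEATURES = ["pos_-3", "pos_0", "pos_5", "mean"]
print(sequential_backward_elimination(FEATURES, toy_score))  # ['pos_-3', 'pos_5']
```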
<p>At this stage, we end up with one model per characteristic. We rank all models according to their classification accuracy as determined by cross-validation (measured as AUC). Starting with the best performing one-characteristic model, we cumulatively merge it with lower-ranked models according to the best incremental ranked subset (BIRS) scheme (
<xref ref-type="bibr" rid="gks283-B53">53</xref>
); this implies the use of wrapper-based feature selection.</p>
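<p>The incremental merging of ranked models can be sketched as below. This is a simplified reading of the scheme: a lower-ranked model is merged only if the wrapper score improves by more than a fixed gain, which here replaces a statistical test on cross-validation results. The model names, features and scoring function are hypothetical.</p>

```python
def incremental_ranked_merge(models, evaluate, min_gain=0.0):
    """models: list of (name, features, score) for one-characteristic models.

    Starts from the best-ranked model and adds each lower-ranked model
    only if the merged feature set scores better by more than `min_gain`.
    """
    ranked = sorted(models, key=lambda m: m[2], reverse=True)
    merged_names = [ranked[0][0]]
    merged_features = list(ranked[0][1])
    best = evaluate(merged_features)
    for name, features, _ in ranked[1:]:
        trial = merged_features + [f for f in features if f not in merged_features]
        score = evaluate(trial)
        if score - best > min_gain:
            merged_names.append(name)
            merged_features, best = trial, score
    return merged_names, merged_features, best


# Toy stand-in for cross-validated AUC: each feature adds a fixed gain.
GAIN = {"bend_3": 0.2, "bend_7": 0.1, "tor_2": 0.1}


def toy_auc(subset):
    return 0.5 + sum(GAIN.get(f, 0.0) for f in subset)


MODELS = [
    ("torsion", ["tor_2"], 0.70),
    ("bending", ["bend_3", "bend_7"], 0.80),
    ("curvature", ["cur_1"], 0.65),
]
names, feats, auc = incremental_ranked_merge(MODELS, toy_auc)
print(names)  # ['bending', 'torsion']
```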
<p>Combined models, i.e. models that contain characteristics from two or three different categories (NPD, structural or PWM score) are simply built by merging two or more models that are restricted to one category. The process of finding the combination that gives the best model can be easily automated by an extra round of wrapper-based feature selection.</p>
<p>When building PWMs for the eukaryotic sets, we automatically assigned their lengths by requiring that the start is on the assumed start position of the TFBSs and the end is characterized by three consecutive positions with an information content of at least 1.1. For the prokaryotic sets, it was necessary to use the entire sequence length for the PWM.</p>
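<p>The information content of a PWM position can be computed as in the following sketch. How the information content was calculated is not specified in the text; the sketch assumes a uniform background and no small-sample correction, and it does not implement the length-assignment rule itself.</p>

```python
import math


def information_content(column_counts, pseudocount=0.0):
    """Information content (bits) of one PWM column from A/C/G/T counts.

    Assumptions: uniform background (maximum 2 bits), no small-sample
    correction.
    """
    totals = [c + pseudocount for c in column_counts]
    n = sum(totals)
    entropy = -sum((t / n) * math.log2(t / n) for t in totals if t > 0)
    return 2.0 - entropy


print(information_content([10, 0, 0, 0]))  # 2.0 (fully conserved position)
print(information_content([5, 5, 0, 0]))   # 1.0 (two equally likely bases)
print(information_content([5, 5, 5, 5]))   # 0.0 (no preference)
```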
</sec>
<sec>
<title>Evaluation of classification models</title>
<p>The evaluation of classification models is based on their prediction scores and provides an estimation of the accuracy of their classification. The prediction score of both the structural method and the NPD method is the RF confidence score, which is assigned to each sequence and indicates the certainty with which this sequence is predicted to belong to either the positive or the negative class. In the case of PWMs, we used the matrix similarity score (
<xref ref-type="bibr" rid="gks283-B35">35</xref>
). The evaluation of performance is visualized by ROC curves and precision-recall curves. Each ROC and precision-recall curve shown is derived from a threshold-based average of 20 curves. Data for each of these 20 curves were obtained by training the model with a randomly taken subset of 80% of the data and testing that trained model on the remaining 20%. Principal component analysis was performed on the full models using the Weka 3 suite (
<xref ref-type="bibr" rid="gks283-B54">54</xref>
) and used to select a top five feature set for each TF (default parameters).</p>
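<p>The repeated 80/20 evaluation can be sketched as follows. The sketch only generates the index splits; whether the original splits were stratified by class is not stated, so plain random splits are assumed, and the function name and seed are illustrative.</p>

```python
import random


def repeated_holdout(n_items, repeats=20, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_items))
    for _ in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


splits = list(repeated_holdout(50))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 20 40 10
```

<p>One curve would be computed per split on the held-out 20%, and the 20 curves would then be averaged.</p>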
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<p>Based on the RF algorithm (
<xref ref-type="bibr" rid="gks283-B50">50</xref>
), we initially built two types of models. The so-called structural model uses one or more structural characteristics by employing their values at specific positions or their average value over all positions in the TFBS and its flanking sequences. The so-called NPD model accounts for positional dependencies at the nucleotide level, utilizing only nucleotide identities (mononucleotides and dinucleotides). The procedure of building and using these models is depicted in
<xref ref-type="fig" rid="gks283-F1">Figure 1</xref>
and explained in detail in the ‘Materials and Methods’ section. We start by discussing the classification accuracy of the classic PWM method, the structural method, the NPD method and combinations thereof, and compare our integrative method with a recent alternative method. This evaluation is performed on five high-quality eukaryotic data sets and eight prokaryotic data sets. Seven of these prokaryotic data sets are rather small and less well annotated. This led us to introduce a second, higher-quality Fis data set in order to assess the influence of data quality on the performance of the different methods. As an additional confirmation of the validity of the RF method, we evaluate the integrative TBP model on external data. Finally, we look at the selected features in each model and try to relate these features to what has been reported in the literature.</p>
<sec>
<title>Classification accuracy</title>
<p>The ROC curve is a standard representation of the trade-off between false positive rate (FPR) and sensitivity. We use details of ROC curves to visualize the classification accuracy of the models. Regular ROC curves and their corresponding measure AUC cover the full range of FPRs from 0 to 1 and are thus of little use for estimating the discriminatory power of a predictor of TFBSs (
<xref ref-type="bibr" rid="gks283-B55">55</xref>
). Genome-wide predictions performed with an FPR even as small as 0.01 are not really useful because they would return an overload of false positives, e.g. about 6 million for the human genome. Therefore, we focus on the part of the ROC curves that corresponds to the lower, more relevant range of FPR. We also take our most integrative model as a reference model, and for each model we list the FPR at which it reaches the true positive rate (TPR) that the reference model attains at an FPR of 0.01–0.1, which corresponds to the bending point of the curves. Statistics of pair-wise comparisons of these FPRs are provided as well. We compare our models with each other and also compare their accuracy with the accuracy of our in-house high-quality PWMs and with the most recently proposed alternative method, CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
). The latter comparison will be discussed extensively in the next section.</p>
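<p>The comparison of FPRs at a matched TPR can be sketched as follows. The scores are made up, the function names are hypothetical, and the sketch works on a single set of scores rather than on threshold-averaged curves.</p>

```python
def roc_points(pos_scores, neg_scores):
    """(FPR, TPR) pairs obtained by using every distinct score as threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= thr for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= thr for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def tpr_at_fpr(points, max_fpr):
    """Highest TPR reachable without exceeding the given FPR."""
    return max(tpr for fpr, tpr in points if max_fpr >= fpr)


def fpr_at_tpr(points, min_tpr):
    """Lowest FPR at which the model reaches at least the given TPR."""
    return min(fpr for fpr, tpr in points if tpr >= min_tpr)


# Hypothetical prediction scores of a reference model and a second model.
reference = roc_points([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
other = roc_points([0.9, 0.5, 0.4, 0.3], [0.8, 0.6, 0.2, 0.1])

target_tpr = tpr_at_fpr(reference, 0.01)
print(target_tpr)                     # 0.75
print(fpr_at_tpr(other, target_tpr))  # 0.5
```

<p>In this toy case the second model needs an FPR of 0.5 to recover the fraction of true sites that the reference model finds at an FPR of at most 0.01.</p>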
<p>For the eukaryotic transcription factors (
<xref ref-type="fig" rid="gks283-F2">Figure 2</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S1</ext-link>
), both structural and NPD models perform better than the PWM for four out of five TFs (HIF1, SP1, STAT1, TBP). Overall, the NPD model performs better than the structural model (four out of five cases). This is logical because the structural method almost exclusively captures the shape readout mechanisms of DNA-binding specificity. All base readout information gets lost upon conversion from a nucleotide sequence to vectors of structural characteristics. The NPD model, in contrast, is expected to capture base readout, as well as some portions of the shape readout that can be derived from nucleotide positional dependencies. Nevertheless, the structural models alone perform surprisingly well: they perform better than PWM in four out of five cases. For most eukaryotic TFs, merging the structural model with the NPD model leads to clear synergistic effects and achieves a classification accuracy that is superior to the accuracy of the separate models and PWM (‘NPD_struct’). For three out of five eukaryotic transcription factors, inclusion of the PWM score even led to an additional improvement (‘NPD_struct_PWM’). The RF strategy significantly improved upon the PWM method for all eukaryotic TFs (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S1</ext-link>
).
<fig id="gks283-F2" position="float">
<label>Figure 2.</label>
<caption>
<p>Accuracy of classification models in identifying TFBSs, as assessed for five eukaryotic TFs. Details of threshold-averaged ROC curves showing the trade-off between TPR (
<italic>Y</italic>
-axis) and FPR (
<italic>X</italic>
-axis); Classification models applied: PWM (black), NPD (green), struct (blue), NPD_struct (purple), NPD_struct_PWM (orange), CRoSSeD (brown). (
<bold>A–E</bold>
) ROC curves for various transcription factors: (A) HIF1; (B) P53; (C) SP1; (D) STAT1; (E) TBP.</p>
</caption>
<graphic xlink:href="gks283f2"></graphic>
</fig>
</p>
<p>For most prokaryotic models (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S2</ext-link>
), the NPD model and the structural model do not outperform the PWM. When considering the low-resolution prokaryotic data sets alone (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
A–G), the structural or NPD model, or combinations thereof, perform better than the PWM model for only three out of seven TFs (ArcA, FlhDC and IHF). Combining the NPD model and the structural model leads to an improvement in five out of seven cases when compared with the individual models. Adding the PWM score did not result in an additional improvement, except for AraC. Compared with the other prokaryotic models, the high-quality Fis model performs exceptionally well (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
H). This result clearly demonstrates the importance of using high-quality data when building classification models.
<fig id="gks283-F3" position="float">
<label>Figure 3.</label>
<caption>
<p>Accuracy of classification models in identifying TFBSs, as assessed for eight prokaryotic TFs. Threshold-averaged ROC curves showing the trade-off between TPR (
<italic>Y</italic>
-axis) and FPR (
<italic>X</italic>
-axis); Classification models applied: PWM (black), NPD (green), struct (blue), NPD_struct (purple), NPD_struct_PWM (orange), CRoSSeD (brown). (
<bold>A–H</bold>
) ROC curves for various transcription factors: (A) AraC; (B) ArcA; (C) Fis; (D) FlhDC; (E) IHF; (F) LexA; (G) PurR; (H) Fis [ChIP-chip set (
<xref ref-type="bibr" rid="gks283-B31">31</xref>
)].</p>
</caption>
<graphic xlink:href="gks283f3"></graphic>
</fig>
</p>
<p>As an additional test, we also looked into precision-recall curves of the classification models for a growing number of background sequences (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Data S1</ext-link>
). With this type of analysis, we tested the models for their ability to cope with a growing number of background sequences. For each TF we compared the combined RF model with the PWM for 10 different background sizes. We started with a 1:1 ratio and augmented the number of background sequences until we had a 1:10 ratio. Models that are less suited to cope with many background sequences show a sharper decline in the precision-recall curves when facing more negative sequences. The prokaryotic models gave mixed results. Again, the high-quality Fis model performs far better than the other prokaryotic models. The RF models of ArcA and IHF perform as well as the PWM, whereas the rest of the TFs did not benefit from the more complex RF model. In contrast to the prokaryotic models, the eukaryotic models gave consistent results. For all five eukaryotic TFs, the RF model turned out to be more robust against a growing number of background sequences than the simpler PWM model.</p>
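<p>The effect of a growing background on precision can be sketched with a short calculation. For a fixed threshold with a given TPR and FPR, the expected precision with r negatives per positive is TPR / (TPR + FPR × r). The TPR and FPR values below are hypothetical.</p>

```python
def precision_at_ratio(tpr, fpr, negatives_per_positive):
    """Expected precision at a fixed threshold for a given class ratio.

    Assumes tpr and fpr are not both zero.
    """
    true_pos = tpr                            # per positive sequence
    false_pos = fpr * negatives_per_positive  # per positive sequence
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: TPR 0.8 at FPR 0.05.
for ratio in range(1, 11):
    print(ratio, round(precision_at_ratio(0.8, 0.05, ratio), 3))
```

<p>Because precision drops with every added negative unless the FPR is very low, a model with a lower FPR at the same TPR shows a flatter decline as the background grows.</p>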
<p>The difference in classification performance between the two Fis sets is striking (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
H and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S2</ext-link>
). The results indicate that with the high-quality Fis set, the RF model can improve upon the PWM method. In this case, NPD_struct_PWM is the best model and it is significantly better than all other models. It is clear that the overall classification accuracy of all the methods we compared is much better with the more reliable Fis data set. We speculate that lack of improvement for the RF models in the majority of prokaryotic sets is due to their relatively small sizes and poor quality of annotation, as is illustrated with this example.</p>
</sec>
<sec>
<title>Comparison with alternative sequence-based methods</title>
<p>A comprehensive overview of alternative sequence-based methods is given in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Data S2</ext-link>
. Differences between our method and others include accounting for the context of the TFBS, the use of several structural characteristics instead of just one, the use of structural values for specific positions rather than just the average value along the TFBS, the use of both structural characteristics and nucleotide positional dependencies, and the use of the RF algorithm. RF does not require any assumptions about the form of underlying relationships between the predictor variables and the response. Hence, there is no need to assume independence or uniform contribution of multiple structural characteristics. Some other sequence-based methods use additional types of data to reduce the FPR of TFBS prediction, such as phylogenetic conservation (
<xref ref-type="bibr" rid="gks283-B56">56</xref>
), genome annotation [e.g. Refs (
<xref ref-type="bibr" rid="gks283-B57">57</xref>
,
<xref ref-type="bibr" rid="gks283-B58">58</xref>
)] or specific experimental results [e.g. Ref. (
<xref ref-type="bibr" rid="gks283-B59">59</xref>
)]. We only consider sequence-based methods not needing such additional information as methods comparable with ours. Some of these methods are SiteSleuth (
<xref ref-type="bibr" rid="gks283-B27">27</xref>
), promapper (
<xref ref-type="bibr" rid="gks283-B25">25</xref>
) and CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
). Each of them is based on a different classification algorithm, namely support vector machine, Bayesian network and conditional random field, respectively. Furthermore, base readout and shape readout are captured in slightly different ways (e.g. other structural characteristics) and do not get equal chances due to arbitrary decisions. We conclude that, with the exception of CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
), none of the previously presented methods has been clearly compared with methods modeling dependencies between nucleotide positions to show how accurately it identifies TFBSs, and that CRoSSeD is currently the best-performing alternative method. Here, we clearly show the value of each of the ‘pure approaches’ (PWM, nucleotide positional dependencies, structural), and we show that integration of different approaches is beneficial to classification accuracy. We performed a quantitative comparison with the most recent alternative method, namely CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
). We compared our method with CRoSSeD both on the prokaryotic data set from the CRoSSeD article and on our eukaryotic data sets. The results on the eukaryotic data sets are depicted in
<xref ref-type="fig" rid="gks283-F2">Figure 2</xref>
. For all eukaryotic TFs, CRoSSeD separates TFBSs from non-TFBSs less accurately than the PWM. Our integrative model (‘NPD_struct’ and ‘NPD_struct_PWM’) performs significantly better than CRoSSeD for all eukaryotic TFs.</p>
<p>The prokaryotic data sets that were used originally come from RegulonDB (
<xref ref-type="bibr" rid="gks283-B60">60</xref>
) and are remarkably different from the eukaryotic data sets we used. Most of the prokaryotic data sets show very little sequence conservation and only expose weak signals over a long distance [see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Data of Meysman
<italic>et al</italic>
</ext-link>
<italic>.</italic>
(
<xref ref-type="bibr" rid="gks283-B28">28</xref>
)]. The lack of strong nucleotide conservation in most prokaryotic data sets might have caused CRoSSeD to be developed with a different focus from our RF models. The different natures of the prokaryotic data sets are reflected by a much lower level of classification accuracy of the predictors and we were forced to list the FPR that corresponds to the TPR with an FPR of 0.05 or even 0.1 for the reference model ‘NPD_struct_PWM’, instead of the 0.01 used for the eukaryotic data sets (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S2</ext-link>
). Our ROC curves and some conclusions differ from those shown in the paper presenting CRoSSeD (
<xref ref-type="bibr" rid="gks283-B28">28</xref>
). The different results must have been caused by differences in the evaluation setup. Many papers, including Meysman
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gks283-B28">28</xref>
), measure accuracy by the area under the ROC curve (AUC), but differences of its value might be irrelevant or even misleading, depending on the shapes of the ROC curves. Both CRoSSeD and our integrative method are among the best models in three out of seven cases (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
A–G), but what is truly remarkable is that the PWM proves to be the best model in three out of seven cases when considering low FPRs only. We also compared our methods with the CRoSSeD method on the high-quality prokaryotic Fis set (
<xref ref-type="fig" rid="gks283-F3">Figure 3</xref>
H). With this data set, the performance of all methods improves drastically. The RF method performs best, while the CRoSSeD method lags behind. These results make clear that data quality is an important determinant of model performance.</p>
<p>From both comparisons with CRoSSeD, we conclude that our approach performs better overall. The small prokaryotic data sets did not fully meet the requirements of our qualitative approach to evaluation of models, and hence conclusions should be drawn with care.</p>
</sec>
<sec>
<title>Evaluation of a model on external data</title>
<p>The seemingly small improvements in accuracy presented here may nevertheless make a huge difference when identifying TFBSs on large DNA sequences and genome-wide. Furthermore, it is interesting to evaluate models on data that do not originate from the same data set with which the models were built. In order to evaluate our method on external data, we tested the TBP model on an independent ChIP-seq experiment for TBP (
<xref ref-type="bibr" rid="gks283-B61">61</xref>
). This is a very demanding test, since the models need to identify the TBP binding site in a wider peak region of the ChIP-seq experiment. The same is then repeated for a background with the same length distribution. In
<xref ref-type="table" rid="gks283-T1">Table 1</xref>
, we compare the PWM method, our integrated model (containing structural and NPD characteristics) and the CRoSSeD tool in terms of ROC AUC for classification of sequences containing
<italic>in vivo</italic>
TBP binding sites and background sequences. Results clearly show that the PWM (AUC 0.535) and CRoSSeD (AUC 0.574) can barely discriminate between the TBP peaks and the background model, whereas our integrated model fulfills this task much better (AUC 0.774).
<table-wrap id="gks283-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Performance of the TBP model on external ChIP-seq TBP data set (Mokry
<italic>et al</italic>
.), measured in ROC AUC</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1">PWM</th>
<th rowspan="1" colspan="1">RF model</th>
<th rowspan="1" colspan="1">CRoSSeD</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">ROC AUC</td>
<td rowspan="1" colspan="1">0.535</td>
<td rowspan="1" colspan="1">0.774</td>
<td rowspan="1" colspan="1">0.573</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec>
<title>Features contained in the models</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Table S3</ext-link>
shows the features of the RF models. These features can reveal aspects of the DNA–TF binding mechanism. Even though the prokaryotic models do not perform that well in terms of classification, the selected features can tell us something about the binding mode of these TFs. All TFs have different models with different characteristics, representing their DNA-binding specificities. The structural characteristics are correlated to some extent (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Figure S1</ext-link>
), but we let the feature selection procedures and the RF algorithm decide which features are most relevant for each TF. It should be noted that for each TF both the structural model and the NPD model include features at positions that precede the actual TFBS. Moreover, each model contains one or more mean values as features, which implies that the global structural
<italic>in vivo</italic>
context of the TFBS is an important feature next to more local shape readout mechanisms at or close to the binding site location. This global shape readout might reflect the general part of higher order protein–DNA interactions that determine binding specificity and functionality: the tendency of a nucleosome to bind the region in which the TFBS is embedded (
<xref ref-type="bibr" rid="gks283-B6">6</xref>
). It might thus be considered part of a so-called ‘general binding preference’ that was demonstrated to be important for improved prediction of TFBSs (
<xref ref-type="bibr" rid="gks283-B57">57</xref>
). A visualization of the SP1 model (
<xref ref-type="fig" rid="gks283-F4">Figure 4</xref>
) clearly shows how the background genomic sequence in which SP1 binding sites are embedded is very similar to the consensus sequence of such sites. A PWM would thus predict many TFBSs, whereas the NPD model and structural model can look beyond position-independent nucleotide frequencies, each in its own way. In the next section, we will describe the most important features of each model, together with their biological relevance.
<fig id="gks283-F4" position="float">
<label>Figure 4.</label>
<caption>
<p>Visualization of our integrative model for SP1. Top: mononucleotide frequencies with the positions of the NPD model shown as shaded boxes. Bottom: average value of one of the structural characteristics contained in the structural model, namely conformational tendency restB; positions of the structural model are indicated by dotted-line boxes (
<italic>X</italic>
-axes indicate position relative to the aligned start of the SP1 binding sites).</p>
</caption>
<graphic xlink:href="gks283f4"></graphic>
</fig>
</p>
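<p>The distinction between positional features and full-sequence mean features can be illustrated with a minimal sketch. It assumes that a structural characteristic is looked up per dinucleotide step, that a name ending in a position (for example p-2) refers to a step relative to the aligned start of the TFBS, and that a name ending in fullseqmean refers to the mean over the whole sequence window. The property table, the function names and the example window are hypothetical and serve only to show the mechanics; they are not the structural scales or the code used in this study.</p>
<preformat><![CDATA[
```python
# Toy dinucleotide property: the fraction of G/C in each dinucleotide step.
# This is a placeholder for illustration, not one of the paper's structural scales.
TOY_PROPERTY = {
    a + b: sum(nt in "GC" for nt in a + b) / 2
    for a in "ACGT" for b in "ACGT"
}

def structural_profile(seq, table):
    """Return one property value per dinucleotide step of seq."""
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]

def structural_features(seq, site_start, positions, table, name="toy"):
    """Positional values (relative to site_start) plus the full-sequence mean."""
    profile = structural_profile(seq, table)
    features = {}
    for rel in positions:
        idx = site_start + rel
        if not 0 <= idx < len(profile):
            raise ValueError(f"position {rel} falls outside the sequence window")
        features[f"{name}_p{rel}"] = profile[idx]
    features[f"{name}_fullseqmean"] = sum(profile) / len(profile)
    return features

# Hypothetical window; the binding site is assumed to start at index 4.
window = "ATATGGGCGGAT"
feats = structural_features(window, site_start=4, positions=[-2, 0, 1], table=TOY_PROPERTY)
```
]]></preformat>
<p>In this sketch, a negative relative position yields a feature located upstream of the site, whereas the mean summarizes the context in which the site is embedded.</p>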
</sec>
<sec>
<title>Biological relevance of the selected features</title>
<p>To assess the biological relevance of the selected features, we performed a principal component analysis (PCA) on each TF model. For each model, we selected the top five principal components (
<xref ref-type="table" rid="gks283-T2">Table 2</xref>
), meaning the five most relevant features according to the PCA. We relate each selected feature to what is known in the literature about structural protein–DNA complex formation. Unfortunately, we could not find explanations in the literature for torsion-related features, because these features are not discussed in most protein–DNA reports. The PWM score is an important feature in most models and is considered the primary feature for direct readout. Note that a strong deviation in the bending toward the major groove also implies a deviation in the bending toward the minor groove, which is why we discuss these features together as ‘bending toward the major/minor groove’. The same goes for the conformational tendency of the DNA. We were able to explain most of the top features of each model, but not the features selected for ‘FlhDC’.
<table-wrap id="gks283-T2" position="float">
<label>Table 2.</label>
<caption>
<p>Results of the PCA. For each TF model, we selected the five best features according to the Weka PCA</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">TF model</th>
<th rowspan="1" colspan="1">Feature</th>
<th rowspan="1" colspan="1">TF model</th>
<th rowspan="1" colspan="1">Feature</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<bold>AraC</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1">
<bold>HIF1</bold>
</td>
<td rowspan="1" colspan="1">uniformity_A_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">minor_groove_clash_size_fullseqmean</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p5=CG</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">minor_groove_clash_size_p18</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p19=G</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p6=GT</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p0=A</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p7=TG</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>ArcA</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1">
<bold>P53</bold>
</td>
<td rowspan="1" colspan="1">uniformity_A_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">groovewidth_unboundLiu_fullseqmean</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">homogeneity_BI_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">groovewidth_unboundLiu_p0</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">homogeneity_RESTB_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">groovewidth_unboundLiu_p1</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">groovewidth_unboundLiu_p-1</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">homogeneity_RESTB_p2</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>Fis</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1">
<bold>SP1</bold>
</td>
<td rowspan="1" colspan="1">homogeneity_RESTB_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMcorescore_general</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">uniformity_A_p-2</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">uniformity_AB_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">uniformity_A_fullseqmean</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p5=CC</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">uniformity_A_p-3</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p6=CC</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>IHF</bold>
</td>
<td rowspan="1" colspan="1">bend_toward_major_groove_fullseqmean</td>
<td rowspan="1" colspan="1">
<bold>STAT1</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_minor_groove_fullseqmean</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p13=AA</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p5=TT</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_major_groove_p-6</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p12=GA</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_major_groove_p-7</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p7=TC</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>FlhDC</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1">
<bold>TBP</bold>
</td>
<td rowspan="1" colspan="1">bend_toward_major_groove_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p-3=C</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_minor_groove_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p-20=G</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">homogeneity_BII_fullseqmean</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p-3=T</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_minor_groove_p8</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">tors_1_nucleosome_p-7</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_toward_major_groove_p8</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>LexA</bold>
</td>
<td rowspan="1" colspan="1">minor_groove_clash_distance_p-8</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p-8=GC</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">minor_groove_clash_distance_p-7</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">minor_groove_clash_distance_p-9</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>PurR</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
<td rowspan="1" colspan="1">
<bold>Fis ChIP-chip</bold>
</td>
<td rowspan="1" colspan="1">PWMmatrixscore_general</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">PWMcorescore_general</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p14=C</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p-5=A</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">bend_towards_minor_groove_p6</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p-4=A</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">dint_p9=TT</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p1=T</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">monont_p0=G</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
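<p>One way in which a PCA can yield a ranking of features is sketched below. The sketch assumes that features are ranked by the absolute value of their loading on the first principal component of the unstandardized covariance matrix, and it finds that component by power iteration. This is an assumption made for illustration: the ranking produced by the Weka PCA used for Table 2 may be computed differently, and the function names and toy data are hypothetical.</p>
<preformat><![CDATA[
```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration.

    The start vector is deliberately non-uniform; a start vector that is exactly
    orthogonal to the leading eigenvector would still defeat this simple scheme.
    """
    d = len(cov)
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def rank_features(rows, names):
    """Feature names sorted by decreasing absolute loading on the first component."""
    loadings = first_component(covariance_matrix(rows))
    order = sorted(range(len(names)), key=lambda j: abs(loadings[j]), reverse=True)
    return [names[j] for j in order]

# Toy data: f0 varies strongly, f1 varies slightly, f2 is constant.
toy_rows = [[1, 0.1, 5], [2, 0.0, 5], [3, 0.1, 5], [4, 0.0, 5]]
ranking = rank_features(toy_rows, ["f0", "f1", "f2"])
```
]]></preformat>
<p>Because the covariance matrix is not standardized here, features measured on larger scales dominate the first component; standardizing each feature beforehand would be a reasonable alternative design choice.</p>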
<p>Although many prokaryotic classification models, in contrast to the eukaryotic models, did not result in any significant improvements over the simpler methods, the selected features and models can provide valuable information about the binding mode of the protein. This information can be used to gain some insight even before any crystal structures are solved. In most prokaryotic models, direct readout, represented by the PWM score feature, plays a very important role; this feature is therefore not discussed separately for every TF.</p>
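<p>As a reminder of what a PWM score captures, the following minimal sketch builds a log-odds matrix from aligned sites and sums the per-position contributions independently. It assumes a uniform background, a pseudocount of 1 and a toy set of aligned sites; it is not necessarily the scoring scheme behind the PWM score features in Table 2, and all names are hypothetical.</p>
<preformat><![CDATA[
```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Column-wise log2-odds matrix from aligned, equal-length binding sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            nt: math.log2(((column.count(nt) + pseudocount) / total) / background)
            for nt in "ACGT"
        })
    return pwm

def pwm_score(pwm, seq):
    """Sum of per-position log-odds: every position contributes independently."""
    return sum(column[nt] for column, nt in zip(pwm, seq))

def best_window(pwm, seq):
    """Start index and score of the highest-scoring window in seq."""
    width = len(pwm)
    scored = [(pwm_score(pwm, seq[i:i + width]), i)
              for i in range(len(seq) - width + 1)]
    score, start = max(scored)
    return start, score

# Toy aligned sites (hypothetical).
toy_sites = ["CGTG", "CGTG", "CGTG", "CGTA"]
toy_pwm = build_pwm(toy_sites)
start, score = best_window(toy_pwm, "AAACGTGAA")
```
]]></preformat>
<p>Because the score is a sum over positions, it cannot represent dependencies between positions, which is the gap that the NPD and structural features are meant to fill.</p>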
<p>It is striking that for prokaryotic TFs the PWM score is the best feature in six out of eight models, whereas for eukaryotic TFs it is the best feature in only one out of five models. This can be explained by a recent systematic study on the differences between prokaryotic and eukaryotic TFBSs published by Wunderlich
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gks283-B62">62</xref>
), in which the authors calculated the average information content (IC) of both prokaryotic and eukaryotic TFBSs. They conclude that the average IC of a prokaryotic TFBS is 23 bits compared with 12.1 bits for eukaryotic TFBSs. This remarkable difference is mainly due to the shorter average length of the eukaryotic binding sites.</p>
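<p>The dependence of IC on site length can be made concrete with a minimal sketch. It assumes the common definition of per-position IC against a uniform background, namely 2 bits plus the sum of p log2 p over the four nucleotides, and sums this over all positions of a set of aligned sites. The exact procedure used in the cited study may differ (for example in background model or small-sample correction), and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math

def column_ic(column):
    """Information content (bits) of one alignment column, uniform background."""
    n = len(column)
    ic = 2.0
    for nt in "ACGT":
        count = column.count(nt)
        if count:
            p = count / n
            ic += p * math.log2(p)
    return ic

def site_ic(sites):
    """Total information content of aligned, equal-length binding sites."""
    return sum(column_ic([site[i] for site in sites])
               for i in range(len(sites[0])))

conserved = site_ic(["ACGT", "ACGT"])      # four fully conserved positions
two_way = site_ic(["A", "C"])              # one position split between two bases
uninformative = site_ic(["A", "C", "G", "T"])
```
]]></preformat>
<p>Under this definition a position contributes at most 2 bits, so an average IC of 23 bits requires at least 12 positions, which is consistent with longer sites carrying more information.</p>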
<p>‘AraC’, a regulator of the araBAD operon in
<italic>E. coli</italic>
, binds as a dimer to the DNA (
<xref ref-type="bibr" rid="gks283-B63">63</xref>
). AraC proteins make all sequence-specific contacts in the major groove. Structural reports indicate that the two monomers of the AraC dimer are separated by an AT-rich linker, resulting in an overall bend and a smaller overall minor groove clash size (
<xref ref-type="bibr" rid="gks283-B64">64</xref>
). This last feature is clearly reflected in the top five feature list of the AraC model.</p>
<p>In the ‘ArcA’ model, the groove width is a very important feature both as a positional feature and as a global mean feature. This is in agreement with the data on the OmpR/PhoB family of TFs, of which ArcA is a member (
<xref ref-type="bibr" rid="gks283-B65">65</xref>
,
<xref ref-type="bibr" rid="gks283-B66">66</xref>
). Just like clash size, width of both the major and the minor groove is an important feature in the winged helix–turn–helix (HTH) family of TFs. In this family of TFs, a helix is inserted in the major groove of the DNA, whereas the wings of the protein dimer are inserted in the minor groove (
<xref ref-type="bibr" rid="gks283-B65">65</xref>
).</p>
<p>‘Fis’ is known as one of the nucleoid-associated proteins (NAPs). Such proteins are responsible for the packing of the prokaryotic chromosome by bending and supercoiling of the DNA (
<xref ref-type="bibr" rid="gks283-B67">67</xref>
). For Fis, two models are available: one with a limited number of binding sites and one more trustworthy ChIP-chip model, which we used as a quality control case. The smaller of the two models contains, in addition to the direct readout features, many features concerning the A/B-DNA tendency, reflecting the reported deviations from standard B-DNA (
<xref ref-type="bibr" rid="gks283-B68">68</xref>
). The top features of the ChIP-chip model are a bit more diverse. Since Fis is one of the NAPs, the appearance of the bending property in the list of PCA top features should come as no surprise. Other important features are the G/C mononucleotides at positions 0 and +14. The presence of these features is notable because methylation of these positions on either strand is known to completely inhibit Fis binding (
<xref ref-type="bibr" rid="gks283-B67">67</xref>
). The location of these nucleotides is in agreement with the major groove contacts by Fis. The TT dinucleotide feature is also an important
<italic>in vivo</italic>
feature: it corresponds to the center of the AT-tract that is responsible for the bending properties of the DNA in the binding site (
<xref ref-type="bibr" rid="gks283-B31">31</xref>
).</p>
<p>The top five components in the ‘IHF’ model consist mainly of features concerning DNA bending towards the major/minor groove. Since IHF is one of the most extreme DNA benders known, also called ‘the master bender’ (
<xref ref-type="bibr" rid="gks283-B69">69</xref>
,
<xref ref-type="bibr" rid="gks283-B70">70</xref>
), the inclusion and importance of the selected features should not be a surprise. This is also reflected in the RF model. The most important feature for this protein is the overall mean of the bend towards the major/minor groove, making it one of the few prokaryotic models with a biophysical feature as the top feature, in agreement with IHF’s reputation as the master bender.</p>
<p>For ‘LexA’, the most noticeable features are the minor groove clash size features between −7 and −9 (the linker region between two LexA half sites). This is also reported in the literature, where an unusually narrow minor groove and important clash interactions are observed in the linker region between two LexA half sites in order to fit into the network of interactions between the two half sites (
<xref ref-type="bibr" rid="gks283-B71">71</xref>
). The selected GC dinucleotide feature is also of importance to the minor groove clash size: the occurrence of GC is disfavored because this dinucleotide has the largest minor groove clash size of all dinucleotides. This is in agreement with earlier reports, which state that LexA has a preference for A/T-rich spacer regions (
<xref ref-type="bibr" rid="gks283-B71">71</xref>
,
<xref ref-type="bibr" rid="gks283-B72">72</xref>
).</p>
<p>In the model of the purine repressor (PurR), the top five features consist only of mononucleotide sequence features and PWM scores. This suggests that this model focuses on the direct readout of PurR binding.</p>
<p>For the ‘HIF1’ TF, three out of five top features are dinucleotide features. Read consecutively, these overlapping dinucleotides form the pattern 5′-CGTG-3′, known as the hypoxia-response element (HRE). This pattern is the most important determinant of HIF1 binding and is fully conserved in every HIF1 binding site. These HREs are
<italic>cis</italic>
-regulatory DNA sequences that are specifically bound by HIF1 and are necessary for transcription under hypoxic conditions (
<xref ref-type="bibr" rid="gks283-B73 gks283-B74 gks283-B75">73–75</xref>
). The model was able to capture this sequence element very well.</p>
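<p>How overlapping dinucleotide features combine into a longer element can be checked with a short sketch. It assumes that a feature name of the form dint_pN=XY denotes dinucleotide XY starting at position N relative to the aligned site start; this reading of the feature names in Table 2 is our assumption, and the helper functions are hypothetical.</p>
<preformat><![CDATA[
```python
def assemble_dinucleotide_features(features):
    """Merge features such as 'dint_p5=CG' into a {position: nucleotide} map."""
    consensus = {}
    for feature in features:
        name, dinucleotide = feature.split("=")
        start = int(name.split("_p")[1])
        for offset, nt in enumerate(dinucleotide):
            position = start + offset
            # Overlapping features must agree on the shared position.
            if consensus.setdefault(position, nt) != nt:
                raise ValueError(f"conflicting nucleotides at position {position}")
    return consensus

def render(consensus):
    """Write the covered span as a string, using N for unconstrained positions."""
    first, last = min(consensus), max(consensus)
    return "".join(consensus.get(p, "N") for p in range(first, last + 1))

# Dinucleotide features listed for HIF1 in Table 2.
hif1 = assemble_dinucleotide_features(["dint_p5=CG", "dint_p6=GT", "dint_p7=TG"])
hif1_pattern = render(hif1)  # "CGTG"
```
]]></preformat>
<p>Under the stated assumption, the three HIF1 features cover positions 5 to 8 without conflict and spell out CGTG.</p>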
<p>For ‘P53’, the majority of important features concern the DNA conformation and the tendency to the A/B-DNA conformation. The DNA conformation is shown to be a very important determinant in the sequence-specific binding by P53. Although P53 binding sites are very degenerate, P53 can bind strongly to a wide range of binding sites. It has been suggested that a shift to a non-standard B-DNA conformation can drastically alter the binding capacity of P53 and that this conformational shift is responsible for the specific binding to the wide variety of P53 motifs (
<xref ref-type="bibr" rid="gks283-B76">76</xref>
).</p>
<p>‘SP1’ is known to unwind the DNA from 10.5 to 11.2 residues per turn, thereby greatly distorting the standard B-structure of the DNA toward a more A-DNA oriented structure and other deviant structures (
<xref ref-type="bibr" rid="gks283-B77">77</xref>
,
<xref ref-type="bibr" rid="gks283-B78">78</xref>
). Two out of five top features of the SP1 model confirm the importance of DNA conformational features in aiding the binding specificity of SP1 to the DNA, both of which are global features. The other top features are more sequence oriented. The two CC-dinucleotide features in the model are an indication of the cytosine enrichment in the canonical SP1 recognition element (CCCGCC). Furthermore, the importance of CC dinucleotides has been discussed by Zhu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gks283-B79">79</xref>
) who found that methylation of the central CG dinucleotide did not impair SP1 binding, but methylation of the first CC dinucleotides significantly decreased SP1 binding specificity. This important feature of the specific binding of SP1 was correctly included as one of the top features in the RF model.</p>
<p>‘STAT1’, like all other STATs, shows a very strong preference for sequences containing two palindromic half-sites (TTC…GAA), leading to a dyad symmetry, to which the STAT1 dimer can bind (
<xref ref-type="bibr" rid="gks283-B80">80</xref>
). The model includes dinucleotide features for AA, TT, GA and TC, which together form TTTC…GAAA, the most specific variant of all STAT1 binding motifs according to an analysis by Ehret
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gks283-B81">81</xref>
).</p>
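<p>The dyad symmetry mentioned above can be verified directly: a site is symmetric in this sense when it equals its own reverse complement. The sketch below assumes a three-nucleotide spacer written as N, which is an illustrative choice for the elided part of TTC…GAA, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return "".join(COMPLEMENT[nt] for nt in reversed(seq))

def has_dyad_symmetry(site):
    """True when the site reads the same on both strands."""
    return site == reverse_complement(site)

# Spacer length of three is an assumption made for this illustration.
core_site = "TTCNNNGAA"
extended_site = "TTTCNNNGAAA"
symmetric = has_dyad_symmetry(core_site) and has_dyad_symmetry(extended_site)
```
]]></preformat>
<p>Both the core half-sites and the extended variant pass this check, which matches the idea that each subunit of a dimer can contact an equivalent half-site.</p>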
<p>‘TBP’ is one of the best-known DNA benders (
<xref ref-type="bibr" rid="gks283-B82">82</xref>
,
<xref ref-type="bibr" rid="gks283-B83">83</xref>
) and it was shown that the unbound TATA box is already pre-bent (
<xref ref-type="bibr" rid="gks283-B84">84</xref>
). The property of introducing a kink in the DNA is also well reflected in the model: four of the top five features describe DNA bending, confirming the tendency of TBP to bend the DNA.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>It has been known for a few decades that the structure of DNA varies in a sequence-dependent manner (
<xref ref-type="bibr" rid="gks283-B4">4</xref>
,
<xref ref-type="bibr" rid="gks283-B5">5</xref>
). Some recent papers stressed the importance of sequence-dependent structural properties of DNA by showing that they are much less diverse than the nucleotide sequences, but at the same time they contain more information (
<xref ref-type="bibr" rid="gks283-B85">85</xref>
,
<xref ref-type="bibr" rid="gks283-B86">86</xref>
). That makes the structure space better suited than the nucleotide sequence space for seeking patterns (
<xref ref-type="bibr" rid="gks283-B86 gks283-B87 gks283-B88">86–88</xref>
). Several papers pointed specifically to the role of DNA shape in protein–DNA recognition (
<xref ref-type="bibr" rid="gks283-B46">46</xref>
,
<xref ref-type="bibr" rid="gks283-B86">86</xref>
,
<xref ref-type="bibr" rid="gks283-B89">89</xref>
,
<xref ref-type="bibr" rid="gks283-B90">90</xref>
). Rohs
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gks283-B6">6</xref>
) published a comprehensive review on this topic. In the past decade, only a few of the proposed methods for TFBS identification explicitly took into account the nucleotide-sequence-dependent structural properties of DNA. However, many other methods implicitly capture some part of the shape readout mechanisms of DNA-binding specificity when they model positional dependencies of nucleotides, and they tend to predict TFBSs more accurately than the widely used PWM.</p>
<p>For prokaryotes, the apparent lack of improvement for the more complex RF models can have several causes. The size of these data sets is relatively small, whereas complex models like the structural or NPD model might require bigger and better annotated data sets. The additional tests on the higher-quality Fis control set seem to confirm this hypothesis. A simpler method, like a PWM-based strategy, was developed for use with small data sets and apparently performs quite well on most prokaryotic data sets. An alternative, more biological explanation for the poor performance of our models on prokaryotes lies in the differences between prokaryotic and eukaryotic TFs. A systematic analysis of the differences in binding strategy between prokaryotic and eukaryotic binding sites revealed that prokaryotic binding sites tend to be longer and that they have more information content (
<xref ref-type="bibr" rid="gks283-B62">62</xref>
). In eukaryotes, the presence of the binding site alone is not enough and binding is often aided by signals in the flanking regions. Prokaryotes have few spurious binding sites, making the presence of one binding site alone a distinctive feature. This, in combination with the smaller and lower-quality sets of binding sites, might lead to an overall decrease in performance of the more complex models and give the simpler PWM an advantage, as revealed by comparing the two Fis sets.</p>
<p>For eukaryotes, our results indicate that the inherent structural properties of DNA are involved in specific recognition by the TFs to an extent that depends on each TF, and that these properties can be used to refine predictions. Our results show that, most of the time, a purely structural model performs worse than a model capturing the positional dependencies of nucleotides. The latter type of model is represented in our comparison by our NPD model, which we believe models both base readout and a large portion of shape readout. The relative importance of the simpler NPD characteristic consequently cannot be ignored when analyzing TFBS binding patterns in the eukaryotic models. We demonstrate, however, that structural properties contain information beyond the nucleotide sequence, and that this information can be used to further improve classification accuracy. We also demonstrate that the PWM score, which merely represents base readout in its simplest form, is sometimes complementary to the model combining the structural model and NPD model. Most importantly, we present an integrative approach that can easily combine two or three different approaches to establish the best possible prediction of TFBSs.</p>
<p>Further improvements of our purely structural model might be achieved by using higher resolution descriptions of structural characteristics and incorporation of additional ones, such as those available in the database for dinucleotide properties (
<xref ref-type="bibr" rid="gks283-B91">91</xref>
). Additionally, input for sequence-based methods is currently gathered in a way that favors the performance of detection methods using nucleotide identities. Sequences containing TFBSs are aligned by methods focusing on nucleotide conservation only, such as existing PWMs or multiple EM (expectation maximization) for motif elicitation (MEME) (
<xref ref-type="bibr" rid="gks283-B92">92</xref>
). It could be worthwhile to improve the alignment correction so that it takes structural vectors into account. This might lead to a further improvement of the structural models.</p>
<p>Shape readout is thought to fine-tune binding affinity rather than determine the binding event (
<xref ref-type="bibr" rid="gks283-B6">6</xref>
). In this respect, the structural part of the combinatorial model might prove itself more important for discerning binding sites of TFs from the same TF family, as they have very similar or identical base readout mechanisms. Our method could also be useful for detecting binding sites of miRNAs because structure plays a dominant role in the RNA–RNA interaction (
<xref ref-type="bibr" rid="gks283-B93">93</xref>
).</p>
<p>Despite high-throughput experimental approaches to identification of TFBSs, improved
<italic>in silico</italic>
prediction of TFBSs is of great value. It allows more accurate identification of potential
<italic>in vivo</italic>
TFBSs on rapidly sequenced genomes and enhances our understanding of the TF binding processes. Our integrative method seems to be a good candidate for this purpose.</p>
</sec>
<sec>
<title>SUPPLEMENTARY DATA</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gks283/DC1">Supplementary Data</ext-link>
are available at NAR Online: Supplementary Tables 1–3, Supplementary Data sets 1, 2 and Supplementary Figure 1.</p>
</sec>
<sec>
<title>FUNDING</title>
<p>Funding for open access charge:
<funding-source>Flanders Institute for Biotechnology (VIB)</funding-source>
;
<funding-source>Research Foundation Flanders (FWO)</funding-source>
;
<funding-source>Agency for Innovation through Science and Technology in Flanders (IWT)</funding-source>
[
<award-id>SB-091213</award-id>
to S.B.].</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_40_14_e106__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gks283_nar-01633-met-n-2011-File008.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gks283_nar-01633-met-n-2011-File009.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gks283_nar-01633-met-n-2011-File010.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gks283_nar-01633-met-n-2011-File012.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gks283_nar-01633-met-n-2011-File013.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gks283_nar-01633-met-n-2011-File011.doc"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>We thank Dr Amin Bredan for careful linguistic editing and the four anonymous referees for their constructive comments, which greatly helped improve upon the original version of the manuscript. We also thank the ICT Department of Ghent University for partial support of this work.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="gks283-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paillard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lavery</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Analyzing protein-DNA recognition mechanisms</article-title>
<source>Structure</source>
<year>2004</year>
<volume>12</volume>
<fpage>113</fpage>
<lpage>122</lpage>
<pub-id pub-id-type="pmid">14725771</pub-id>
</element-citation>
</ref>
<ref id="gks283-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaplan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Margalit</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Ab initio prediction of transcription factor targets using structural knowledge</article-title>
<source>PLoS Comput. Biol.</source>
<year>2005</year>
<volume>1</volume>
<fpage>e1</fpage>
<pub-id pub-id-type="pmid">16103898</pub-id>
</element-citation>
</ref>
<ref id="gks283-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thayer</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Beveridge</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>Hidden Markov models from molecular dynamics simulations on DNA</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2002</year>
<volume>99</volume>
<fpage>8642</fpage>
<lpage>8647</lpage>
<pub-id pub-id-type="pmid">12072566</pub-id>
</element-citation>
</ref>
<ref id="gks283-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Calladine</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Drew</surname>
<given-names>HR</given-names>
</name>
</person-group>
<article-title>Principles of sequence-dependent flexure of DNA</article-title>
<source>J. Mol. Biol.</source>
<year>1986</year>
<volume>192</volume>
<fpage>907</fpage>
<lpage>918</lpage>
<pub-id pub-id-type="pmid">3586013</pub-id>
</element-citation>
</ref>
<ref id="gks283-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shakked</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Rabinovich</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The effect of the base sequence on the fine structure of the DNA double helix</article-title>
<source>Prog. Biophys. Mol. Biol.</source>
<year>1986</year>
<volume>47</volume>
<fpage>159</fpage>
<lpage>195</lpage>
<pub-id pub-id-type="pmid">3544051</pub-id>
</element-citation>
</ref>
<ref id="gks283-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>X</given-names>
</name>
<name>
<surname>West</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Honig</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Mann</surname>
<given-names>RS</given-names>
</name>
</person-group>
<article-title>Origins of specificity in protein-DNA recognition</article-title>
<source>Annu. Rev. Biochem.</source>
<year>2010</year>
<volume>79</volume>
<fpage>233</fpage>
<lpage>269</lpage>
<pub-id pub-id-type="pmid">20334529</pub-id>
</element-citation>
</ref>
<ref id="gks283-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Angarica</surname>
<given-names>VE</given-names>
</name>
<name>
<surname>Perez</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Vasconcelos</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Collado-Vides</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Contreras-Moreira</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Prediction of TF target sites based on atomistic models of protein-DNA complexes</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>436</fpage>
<pub-id pub-id-type="pmid">18922190</pub-id>
</element-citation>
</ref>
<ref id="gks283-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>DNA binding sites: representation and discovery</article-title>
<source>Bioinformatics</source>
<year>2000</year>
<volume>16</volume>
<fpage>16</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">10812473</pub-id>
</element-citation>
</ref>
<ref id="gks283-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Man</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay</article-title>
<source>Nucleic Acids Res.</source>
<year>2001</year>
<volume>29</volume>
<fpage>2471</fpage>
<lpage>2478</lpage>
<pub-id pub-id-type="pmid">11410653</pub-id>
</element-citation>
</ref>
<ref id="gks283-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors</article-title>
<source>Nucleic Acids Res.</source>
<year>2002</year>
<volume>30</volume>
<fpage>1255</fpage>
<lpage>1261</lpage>
<pub-id pub-id-type="pmid">11861919</pub-id>
</element-citation>
</ref>
<ref id="gks283-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Quantitative analysis of EGR proteins binding to DNA: assessing additivity in both the binding site and the protein</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>176</fpage>
<pub-id pub-id-type="pmid">16014175</pub-id>
</element-citation>
</ref>
<ref id="gks283-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>1850</fpage>
<lpage>1857</lpage>
<pub-id pub-id-type="pmid">18586699</pub-id>
</element-citation>
</ref>
<ref id="gks283-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benos</surname>
<given-names>PV</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Additivity in protein-DNA interactions: how good an approximation is it?</article-title>
<source>Nucleic Acids Res.</source>
<year>2002</year>
<volume>30</volume>
<fpage>4442</fpage>
<lpage>4451</lpage>
<pub-id pub-id-type="pmid">12384591</pub-id>
</element-citation>
</ref>
<ref id="gks283-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>O'Flanagan</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Paillard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lavery</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sengupta</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Non-additivity in protein-DNA binding</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>2254</fpage>
<lpage>2263</lpage>
<pub-id pub-id-type="pmid">15746285</pub-id>
</element-citation>
</ref>
<ref id="gks283-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tomovic</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Oakeley</surname>
<given-names>EJ</given-names>
</name>
</person-group>
<article-title>Position dependencies in transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>933</fpage>
<lpage>941</lpage>
<pub-id pub-id-type="pmid">17308339</pub-id>
</element-citation>
</ref>
<ref id="gks283-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Chinnaiyan</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>ZS</given-names>
</name>
</person-group>
<article-title>On the detection and refinement of transcription factor binding sites using ChIP-Seq data</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>2154</fpage>
<lpage>2167</lpage>
<pub-id pub-id-type="pmid">20056654</pub-id>
</element-citation>
</ref>
<ref id="gks283-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gershenzon</surname>
<given-names>NI</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Ioshikhes</surname>
<given-names>IP</given-names>
</name>
</person-group>
<article-title>Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites</article-title>
<source>Nucleic Acids Res.</source>
<year>2005</year>
<volume>33</volume>
<fpage>2290</fpage>
<lpage>2301</lpage>
<pub-id pub-id-type="pmid">15849315</pub-id>
</element-citation>
</ref>
<ref id="gks283-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marinescu</surname>
<given-names>VD</given-names>
</name>
<name>
<surname>Kohane</surname>
<given-names>IS</given-names>
</name>
<name>
<surname>Riva</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>79</fpage>
<pub-id pub-id-type="pmid">15799782</pub-id>
</element-citation>
</ref>
<ref id="gks283-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Naughton</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Fratkin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>5730</fpage>
<lpage>5739</lpage>
<pub-id pub-id-type="pmid">17041233</pub-id>
</element-citation>
</ref>
<ref id="gks283-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lubliner</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>A feature-based approach to modeling protein-DNA interactions</article-title>
<source>PLoS Comput. Biol.</source>
<year>2008</year>
<volume>4</volume>
<fpage>e1000154</fpage>
<pub-id pub-id-type="pmid">18725950</pub-id>
</element-citation>
</ref>
<ref id="gks283-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karas</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Knuppel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sklenar</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wingender</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements</article-title>
<source>Comput. Appl. Biosci.</source>
<year>1996</year>
<volume>12</volume>
<fpage>441</fpage>
<lpage>446</lpage>
<pub-id pub-id-type="pmid">8996793</pub-id>
</element-citation>
</ref>
<ref id="gks283-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ponomarenko</surname>
<given-names>JV</given-names>
</name>
<name>
<surname>Ponomarenko</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Frolov</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Vorobyev</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Overton</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Kolchanov</surname>
<given-names>NA</given-names>
</name>
</person-group>
<article-title>Conformational and physicochemical DNA features specific for transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<fpage>654</fpage>
<lpage>668</lpage>
<pub-id pub-id-type="pmid">10487873</pub-id>
</element-citation>
</ref>
<ref id="gks283-B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Blackwell</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>States</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Conformational model for binding site recognition by the
<italic>E.coli</italic>
MetJ transcription factor</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>622</fpage>
<lpage>633</lpage>
<pub-id pub-id-type="pmid">11448880</pub-id>
</element-citation>
</ref>
<ref id="gks283-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burden</surname>
<given-names>HE</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Identification of conserved structural features at sequentially degenerate locations in transcription factor binding sites</article-title>
<source>Genome Inform.</source>
<year>2005</year>
<volume>16</volume>
<fpage>49</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">16362906</pub-id>
</element-citation>
</ref>
<ref id="gks283-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pudimat</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schukat-Talamazzini</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Backofen</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>A multiple-feature framework for modelling and predicting transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3082</fpage>
<lpage>3088</lpage>
<pub-id pub-id-type="pmid">15905283</pub-id>
</element-citation>
</ref>
<ref id="gks283-B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gunewardena</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jeavons</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Enhancing the prediction of transcription factor binding sites by incorporating structural properties and nucleotide covariations</article-title>
<source>J. Comput. Biol.</source>
<year>2006</year>
<volume>13</volume>
<fpage>929</fpage>
<lpage>945</lpage>
<pub-id pub-id-type="pmid">16761919</pub-id>
</element-citation>
</ref>
<ref id="gks283-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bauer</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Hlavacek</surname>
<given-names>WS</given-names>
</name>
<name>
<surname>Unkefer</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Mu</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites</article-title>
<source>PLoS Comput. Biol.</source>
<year>2010</year>
<volume>6</volume>
<fpage>e1001007</fpage>
<pub-id pub-id-type="pmid">21124945</pub-id>
</element-citation>
</ref>
<ref id="gks283-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meysman</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dang</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Laukens</surname>
<given-names>K</given-names>
</name>
<name>
<surname>De Smet</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Marchal</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Engelen</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Use of structural DNA properties for the prediction of transcription-factor binding sites in
<italic>Escherichia coli</italic>
</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>e6</fpage>
<pub-id pub-id-type="pmid">21051340</pub-id>
</element-citation>
</ref>
<ref id="gks283-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morozov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Siggia</surname>
<given-names>ED</given-names>
</name>
</person-group>
<article-title>Connecting protein structure with predictions of regulatory sites</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2007</year>
<volume>104</volume>
<fpage>7068</fpage>
<lpage>7073</lpage>
<pub-id pub-id-type="pmid">17438293</pub-id>
</element-citation>
</ref>
<ref id="gks283-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fulton</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Sundararajan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Roach</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Sladek</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>TFCat: the curated catalog of mouse and human transcription factors</article-title>
<source>Genome Biol.</source>
<year>2009</year>
<volume>10</volume>
<fpage>R29</fpage>
<pub-id pub-id-type="pmid">19284633</pub-id>
</element-citation>
</ref>
<ref id="gks283-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Palsson</surname>
<given-names>BO</given-names>
</name>
</person-group>
<article-title>Genome-wide analysis of Fis binding in
<italic>Escherichia coli</italic>
indicates a causative role for A-/AT-tracts</article-title>
<source>Genome Res.</source>
<year>2008</year>
<volume>18</volume>
<fpage>900</fpage>
<lpage>910</lpage>
<pub-id pub-id-type="pmid">18340041</pub-id>
</element-citation>
</ref>
<ref id="gks283-B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Portales-Casamar</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kirov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lithwick</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Swanson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Ticoll</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Snoddy</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
</person-group>
<article-title>PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation</article-title>
<source>Genome Biol.</source>
<year>2007</year>
<volume>8</volume>
<fpage>R207</fpage>
<pub-id pub-id-type="pmid">17916232</pub-id>
</element-citation>
</ref>
<ref id="gks283-B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matys</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Fricke</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Geffers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gossling</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Haubrock</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hehl</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hornischer</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Karas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kel</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Kel-Margoulis</surname>
<given-names>OV</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TRANSFAC: transcriptional regulation, from patterns to profiles</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>374</fpage>
<lpage>378</lpage>
<pub-id pub-id-type="pmid">12520026</pub-id>
</element-citation>
</ref>
<ref id="gks283-B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gowrisankar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jegga</surname>
<given-names>AG</given-names>
</name>
</person-group>
<article-title>Regression based predictor for p53 transactivation</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>215</fpage>
<pub-id pub-id-type="pmid">19602281</pub-id>
</element-citation>
</ref>
<ref id="gks283-B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kel</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Gossling</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Reuter</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Cheremushkin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kel-Margoulis</surname>
<given-names>OV</given-names>
</name>
<name>
<surname>Wingender</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>MATCH: A tool for searching transcription factor binding sites in DNA sequences</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>3576</fpage>
<lpage>3579</lpage>
<pub-id pub-id-type="pmid">12824369</pub-id>
</element-citation>
</ref>
<ref id="gks283-B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
<name>
<surname>Gorin</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>XJ</given-names>
</name>
<name>
<surname>Hock</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Zhurkin</surname>
<given-names>VB</given-names>
</name>
</person-group>
<article-title>DNA sequence-dependent deformability deduced from protein-DNA crystal complexes</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1998</year>
<volume>95</volume>
<fpage>11163</fpage>
<lpage>11168</lpage>
<pub-id pub-id-type="pmid">9736707</pub-id>
</element-citation>
</ref>
<ref id="gks283-B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Satchwell</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Drew</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Travers</surname>
<given-names>AA</given-names>
</name>
</person-group>
<article-title>Sequence periodicities in chicken nucleosome core DNA</article-title>
<source>J. Mol. Biol.</source>
<year>1986</year>
<volume>191</volume>
<fpage>659</fpage>
<lpage>675</lpage>
<pub-id pub-id-type="pmid">3806678</pub-id>
</element-citation>
</ref>
<ref id="gks283-B38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goodsell</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Dickerson</surname>
<given-names>RE</given-names>
</name>
</person-group>
<article-title>Bending and curvature calculations in B-DNA</article-title>
<source>Nucleic Acids Res.</source>
<year>1994</year>
<volume>22</volume>
<fpage>5497</fpage>
<lpage>5503</lpage>
<pub-id pub-id-type="pmid">7816643</pub-id>
</element-citation>
</ref>
<ref id="gks283-B39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>XJ</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
</person-group>
<article-title>3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures</article-title>
<source>Nat. Protoc.</source>
<year>2008</year>
<volume>3</volume>
<fpage>1213</fpage>
<lpage>1227</lpage>
<pub-id pub-id-type="pmid">18600227</pub-id>
</element-citation>
</ref>
<ref id="gks283-B40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fujii</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kono</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Takenaka</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Go</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sarai</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Sequence-dependent DNA deformability studied using molecular dynamics simulations</article-title>
<source>Nucleic Acids Res.</source>
<year>2007</year>
<volume>35</volume>
<fpage>6063</fpage>
<lpage>6074</lpage>
<pub-id pub-id-type="pmid">17766249</pub-id>
</element-citation>
</ref>
<ref id="gks283-B41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lavery</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zakrzewska</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Beveridge</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bishop</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Case</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Cheatham</surname>
<given-names>T</given-names>
<suffix>3rd</suffix>
</name>
<name>
<surname>Dixit</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jayaram</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lankas</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Laughton</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA</article-title>
<source>Nucleic Acids Res.</source>
<year>2009</year>
<volume>38</volume>
<fpage>299</fpage>
<lpage>313</lpage>
<pub-id pub-id-type="pmid">19850719</pub-id>
</element-citation>
</ref>
<ref id="gks283-B42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gartenberg</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Crothers</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>DNA sequence determinants of CAP-induced bending and protein binding affinity</article-title>
<source>Nature</source>
<year>1988</year>
<volume>333</volume>
<fpage>824</fpage>
<lpage>829</lpage>
<pub-id pub-id-type="pmid">2838756</pub-id>
</element-citation>
</ref>
<ref id="gks283-B43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parvin</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>McCormick</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Sharp</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Fisher</surname>
<given-names>DE</given-names>
</name>
</person-group>
<article-title>Pre-bending of a promoter sequence enhances affinity for the TATA-binding factor</article-title>
<source>Nature</source>
<year>1995</year>
<volume>373</volume>
<fpage>724</fpage>
<lpage>727</lpage>
<pub-id pub-id-type="pmid">7854460</pub-id>
</element-citation>
</ref>
<ref id="gks283-B44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dickerson</surname>
<given-names>RE</given-names>
</name>
</person-group>
<article-title>DNA bending: the prevalence of kinkiness and the virtues of normality</article-title>
<source>Nucleic Acids Res.</source>
<year>1998</year>
<volume>26</volume>
<fpage>1906</fpage>
<lpage>1926</lpage>
<pub-id pub-id-type="pmid">9518483</pub-id>
</element-citation>
</ref>
<ref id="gks283-B45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gorin</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Zhurkin</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
</person-group>
<article-title>B-DNA twisting correlates with base-pair morphology</article-title>
<source>J. Mol. Biol.</source>
<year>1995</year>
<volume>247</volume>
<fpage>34</fpage>
<lpage>48</lpage>
<pub-id pub-id-type="pmid">7897660</pub-id>
</element-citation>
</ref>
<ref id="gks283-B46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>West</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Sosinsky</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mann</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Honig</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>The role of DNA shape in protein-DNA recognition</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>1248</fpage>
<lpage>1253</lpage>
<pub-id pub-id-type="pmid">19865164</pub-id>
</element-citation>
</ref>
<ref id="gks283-B47">
<label>47</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Svozil</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kalina</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Omelka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>DNA conformations and their sequence preferences</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>3690</fpage>
<lpage>3706</lpage>
<pub-id pub-id-type="pmid">18477633</pub-id>
</element-citation>
</ref>
<ref id="gks283-B48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spolar</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Record</surname>
<given-names>MT</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
<article-title>Coupling of local folding to site-specific binding of proteins to DNA</article-title>
<source>Science</source>
<year>1994</year>
<volume>263</volume>
<fpage>777</fpage>
<lpage>784</lpage>
<pub-id pub-id-type="pmid">8303294</pub-id>
</element-citation>
</ref>
<ref id="gks283-B49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>XJ</given-names>
</name>
<name>
<surname>Shakked</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
</person-group>
<article-title>A-form conformational motifs in ligand-bound DNA structures</article-title>
<source>J. Mol. Biol.</source>
<year>2000</year>
<volume>300</volume>
<fpage>819</fpage>
<lpage>840</lpage>
<pub-id pub-id-type="pmid">10891271</pub-id>
</element-citation>
</ref>
<ref id="gks283-B50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Random forests</article-title>
<source>Machine Learning</source>
<year>2001</year>
<volume>45</volume>
<fpage>5</fpage>
<lpage>32</lpage>
</element-citation>
</ref>
<ref id="gks283-B51">
<label>51</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lunetta</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Hayward</surname>
<given-names>LB</given-names>
</name>
<name>
<surname>Segal</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Van Eerdewegh</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Screening large-scale association study data: exploiting interactions using random forests</article-title>
<source>BMC Genet.</source>
<year>2004</year>
<volume>5</volume>
<fpage>32</fpage>
<pub-id pub-id-type="pmid">15588316</pub-id>
</element-citation>
</ref>
<ref id="gks283-B52">
<label>52</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cordell</surname>
<given-names>HJ</given-names>
</name>
</person-group>
<article-title>Detecting gene-gene interactions that underlie human diseases</article-title>
<source>Nat. Rev. Genet.</source>
<year>2009</year>
<volume>10</volume>
<fpage>392</fpage>
<lpage>404</lpage>
<pub-id pub-id-type="pmid">19434077</pub-id>
</element-citation>
</ref>
<ref id="gks283-B53">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruiz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Riquelme</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Aguilar-Ruiz</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>Incremental wrapper-based gene selection from microarray data for cancer classification</article-title>
<source>Pattern Recogn.</source>
<year>2006</year>
<volume>39</volume>
<fpage>2383</fpage>
<lpage>2392</lpage>
</element-citation>
</ref>
<ref id="gks283-B54">
<label>54</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hall</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Frank</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pfahringer</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Reutemann</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Witten</surname>
<given-names>IH</given-names>
</name>
</person-group>
<article-title>The WEKA data mining software</article-title>
<source>ACM SIGKDD Explorations Newsletter</source>
<year>2009</year>
<volume>11</volume>
<fpage>10</fpage>
</element-citation>
</ref>
<ref id="gks283-B55">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Medina-Rivera</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Abreu-Goodger</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Thomas-Chollier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Salgado</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Collado-Vides</surname>
<given-names>J</given-names>
</name>
<name>
<surname>van Helden</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Theoretical and empirical quality assessment of transcription factor-binding motifs</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>808</fpage>
<lpage>824</lpage>
<pub-id pub-id-type="pmid">20923783</pub-id>
</element-citation>
</ref>
<ref id="gks283-B56">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements</article-title>
<source>J. Biol.</source>
<year>2003</year>
<volume>2</volume>
<fpage>11</fpage>
<pub-id pub-id-type="pmid">12814519</pub-id>
</element-citation>
</ref>
<ref id="gks283-B57">
<label>57</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ernst</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Plasterer</surname>
<given-names>HL</given-names>
</name>
<name>
<surname>Simon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Bar-Joseph</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Integrating multiple evidence sources to predict transcription factor binding in the human genome</article-title>
<source>Genome Res.</source>
<year>2010</year>
<volume>20</volume>
<fpage>526</fpage>
<lpage>536</lpage>
<pub-id pub-id-type="pmid">20219943</pub-id>
</element-citation>
</ref>
<ref id="gks283-B58">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Narang</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sung</surname>
<given-names>WK</given-names>
</name>
</person-group>
<article-title>Localized motif discovery in gene regulatory sequences</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>1152</fpage>
<lpage>1159</lpage>
<pub-id pub-id-type="pmid">20223835</pub-id>
</element-citation>
</ref>
<ref id="gks283-B59">
<label>59</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramsey</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Knijnenburg</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Kennedy</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Zak</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Gilchrist</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gold</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Lampano</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Litvak</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Navarro</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>2071</fpage>
<lpage>2075</lpage>
<pub-id pub-id-type="pmid">20663846</pub-id>
</element-citation>
</ref>
<ref id="gks283-B60">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gama-Castro</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jimenez-Jacinto</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Peralta-Gil</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Santos-Zavaleta</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Penaloza-Spinola</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Contreras-Moreira</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Segura-Salazar</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Muniz-Rascado</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Martinez-Flores</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Salgado</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<article-title>RegulonDB (version 6.0): gene regulation model of
<italic>Escherichia coli</italic>
K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>D120</fpage>
<lpage>D124</lpage>
<pub-id pub-id-type="pmid">18158297</pub-id>
</element-citation>
</ref>
<ref id="gks283-B61">
<label>61</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mokry</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hatzis</surname>
<given-names>P</given-names>
</name>
<name>
<surname>de Bruijn</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Koster</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Versteeg</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schuijers</surname>
<given-names>J</given-names>
</name>
<name>
<surname>van de Wetering</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Guryev</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Clevers</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cuppen</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Efficient double fragmentation ChIP-seq provides nucleotide resolution protein-DNA binding profiles</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e15092</fpage>
<pub-id pub-id-type="pmid">21152096</pub-id>
</element-citation>
</ref>
<ref id="gks283-B62">
<label>62</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wunderlich</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Mirny</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>Different gene regulation strategies revealed by analysis of binding motifs</article-title>
<source>Trends Genet.</source>
<year>2009</year>
<volume>25</volume>
<fpage>434</fpage>
<lpage>440</lpage>
<pub-id pub-id-type="pmid">19815308</pub-id>
</element-citation>
</ref>
<ref id="gks283-B63">
<label>63</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hendrickson</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Schleif</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>A dimer of AraC protein contacts three adjacent major groove regions of the araI DNA site</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1985</year>
<volume>82</volume>
<fpage>3129</fpage>
<lpage>3133</lpage>
<pub-id pub-id-type="pmid">3858809</pub-id>
</element-citation>
</ref>
<ref id="gks283-B64">
<label>64</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Flaherty</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hendrickson</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>AraC protein contacts asymmetric sites in the
<italic>Escherichia coli</italic>
araFGH promoter</article-title>
<source>J. Biol. Chem.</source>
<year>1992</year>
<volume>267</volume>
<fpage>24848</fpage>
<lpage>24857</lpage>
<pub-id pub-id-type="pmid">1447222</pub-id>
</element-citation>
</ref>
<ref id="gks283-B65">
<label>65</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martinez-Hackert</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stock</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Structural relationships in the OmpR family of winged-helix transcription factors</article-title>
<source>J. Mol. Biol.</source>
<year>1997</year>
<volume>269</volume>
<fpage>301</fpage>
<lpage>312</lpage>
<pub-id pub-id-type="pmid">9199401</pub-id>
</element-citation>
</ref>
<ref id="gks283-B66">
<label>66</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Toro-Roman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mack</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Stock</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Structural analysis and solution studies of the activated regulatory domain of the response regulator ArcA: a symmetric dimer mediated by the alpha4-beta5-alpha5 face</article-title>
<source>J. Mol. Biol.</source>
<year>2005</year>
<volume>349</volume>
<fpage>11</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="pmid">15876365</pub-id>
</element-citation>
</ref>
<ref id="gks283-B67">
<label>67</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>CQ</given-names>
</name>
<name>
<surname>Finkel</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Cramton</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Sigman</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>Variable structures of Fis-DNA complexes determined by flanking DNA-protein contacts</article-title>
<source>J. Mol. Biol.</source>
<year>1996</year>
<volume>264</volume>
<fpage>675</fpage>
<lpage>695</lpage>
<pub-id pub-id-type="pmid">8980678</pub-id>
</element-citation>
</ref>
<ref id="gks283-B68">
<label>68</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afflerbach</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Schroder</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Conformational changes of the upstream DNA mediated by H-NS and FIS regulate
<italic>E. coli</italic>
rrnB P1 promoter activity</article-title>
<source>J. Mol. Biol.</source>
<year>1999</year>
<volume>286</volume>
<fpage>339</fpage>
<lpage>353</lpage>
<pub-id pub-id-type="pmid">9973555</pub-id>
</element-citation>
</ref>
<ref id="gks283-B69">
<label>69</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Travers</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>DNA-protein interactions: IHF–the master bender</article-title>
<source>Curr. Biol.</source>
<year>1997</year>
<volume>7</volume>
<fpage>R252</fpage>
<lpage>R254</lpage>
<pub-id pub-id-type="pmid">9162504</pub-id>
</element-citation>
</ref>
<ref id="gks283-B70">
<label>70</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneider</surname>
<given-names>TD</given-names>
</name>
</person-group>
<article-title>Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation</article-title>
<source>Nucleic Acids Res.</source>
<year>2001</year>
<volume>29</volume>
<fpage>4881</fpage>
<lpage>4891</lpage>
<pub-id pub-id-type="pmid">11726698</pub-id>
</element-citation>
</ref>
<ref id="gks283-B71">
<label>71</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Pigli</surname>
<given-names>YZ</given-names>
</name>
<name>
<surname>Rice</surname>
<given-names>PA</given-names>
</name>
</person-group>
<article-title>Structure of the LexA-DNA complex and implications for SOS box measurement</article-title>
<source>Nature</source>
<year>2010</year>
<volume>466</volume>
<fpage>883</fpage>
<lpage>886</lpage>
<pub-id pub-id-type="pmid">20703307</pub-id>
</element-citation>
</ref>
<ref id="gks283-B72">
<label>72</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>LK</given-names>
</name>
<name>
<surname>Harlow</surname>
<given-names>GR</given-names>
</name>
<name>
<surname>Gregg-Jolly</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Mount</surname>
<given-names>DW</given-names>
</name>
</person-group>
<article-title>Identification of high affinity binding sites for LexA which define new DNA damage-inducible genes in
<italic>Escherichia coli</italic>
</article-title>
<source>J. Mol. Biol.</source>
<year>1994</year>
<volume>241</volume>
<fpage>507</fpage>
<lpage>523</lpage>
<pub-id pub-id-type="pmid">8057377</pub-id>
</element-citation>
</ref>
<ref id="gks283-B73">
<label>73</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kajimura</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Aida</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Duan</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Understanding hypoxia-induced gene expression in early development: in vitro and in vivo analysis of hypoxia-inducible factor 1-regulated zebra fish insulin-like growth factor binding protein 1 gene expression</article-title>
<source>Mol. Cell Biol.</source>
<year>2006</year>
<volume>26</volume>
<fpage>1142</fpage>
<lpage>1155</lpage>
<pub-id pub-id-type="pmid">16428465</pub-id>
</element-citation>
</ref>
<ref id="gks283-B74">
<label>74</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Michel</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Minet</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ernest</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Roland</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Durant</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Remacle</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Michiels</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>A model for the complex between the hypoxia-inducible factor-1 (HIF-1) and its consensus DNA sequence</article-title>
<source>J. Biomol. Struct. Dyn.</source>
<year>2000</year>
<volume>18</volume>
<fpage>169</fpage>
<lpage>179</lpage>
<pub-id pub-id-type="pmid">11089639</pub-id>
</element-citation>
</ref>
<ref id="gks283-B75">
<label>75</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camenisch</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Stroka</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Gassmann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wenger</surname>
<given-names>RH</given-names>
</name>
</person-group>
<article-title>Attenuation of HIF-1 DNA-binding activity limits hypoxia-inducible endothelin-1 expression</article-title>
<source>Pflugers Arch.</source>
<year>2001</year>
<volume>443</volume>
<fpage>240</fpage>
<lpage>249</lpage>
<pub-id pub-id-type="pmid">11713650</pub-id>
</element-citation>
</ref>
<ref id="gks283-B76">
<label>76</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Albrechtsen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Deppert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>DNA-conformation is an important determinant of sequence-specific DNA binding by tumor suppressor p53</article-title>
<source>Oncogene</source>
<year>1997</year>
<volume>15</volume>
<fpage>857</fpage>
<lpage>869</lpage>
<pub-id pub-id-type="pmid">9266973</pub-id>
</element-citation>
</ref>
<ref id="gks283-B77">
<label>77</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>YG</given-names>
</name>
<name>
<surname>Berg</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>DNA unwinding induced by zinc finger protein binding</article-title>
<source>Biochemistry</source>
<year>1996</year>
<volume>35</volume>
<fpage>3845</fpage>
<lpage>3848</lpage>
<pub-id pub-id-type="pmid">8620008</pub-id>
</element-citation>
</ref>
<ref id="gks283-B78">
<label>78</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marco</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Garcia-Nieto</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gago</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Assessment by molecular dynamics simulations of the structural determinants of DNA-binding specificity for transcription factor Sp1</article-title>
<source>J. Mol. Biol.</source>
<year>2003</year>
<volume>328</volume>
<fpage>9</fpage>
<lpage>32</lpage>
<pub-id pub-id-type="pmid">12683994</pub-id>
</element-citation>
</ref>
<ref id="gks283-B79">
<label>79</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>WG</given-names>
</name>
<name>
<surname>Srinivasan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>ZY</given-names>
</name>
<name>
<surname>Duan</surname>
<given-names>WR</given-names>
</name>
<name>
<surname>Druhan</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Yee</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Villalona-Calero</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Plass</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Otterson</surname>
<given-names>GA</given-names>
</name>
</person-group>
<article-title>Methylation of adjacent CpG sites affects Sp1/Sp3 binding and activity in the p21(Cip1) promoter</article-title>
<source>Mol. Cell. Biol.</source>
<year>2003</year>
<volume>23</volume>
<fpage>4056</fpage>
<lpage>4065</lpage>
<pub-id pub-id-type="pmid">12773551</pub-id>
</element-citation>
</ref>
<ref id="gks283-B80">
<label>80</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>XM</given-names>
</name>
<name>
<surname>Vinkemeier</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>YX</given-names>
</name>
<name>
<surname>Jeruzalmi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Darnell</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Kuriyan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Crystal structure of a tyrosine phosphorylated STAT-1 dimer bound to DNA</article-title>
<source>Cell</source>
<year>1998</year>
<volume>93</volume>
<fpage>827</fpage>
<lpage>839</lpage>
<pub-id pub-id-type="pmid">9630226</pub-id>
</element-citation>
</ref>
<ref id="gks283-B81">
<label>81</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ehret</surname>
<given-names>GB</given-names>
</name>
<name>
<surname>Reichenbach</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schindler</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Horvath</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Fritz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nabholz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>DNA binding specificity of different STAT proteins - Comparison of in vitro specificity with natural target sites</article-title>
<source>J. Biol. Chem.</source>
<year>2001</year>
<volume>276</volume>
<fpage>6675</fpage>
<lpage>6688</lpage>
<pub-id pub-id-type="pmid">11053426</pub-id>
</element-citation>
</ref>
<ref id="gks283-B82">
<label>82</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Powell</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Parkhurst</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Parkhurst</surname>
<given-names>LJ</given-names>
</name>
</person-group>
<article-title>Comparison of TATA-binding protein recognition of a variant and consensus DNA promoters</article-title>
<source>J. Biol. Chem.</source>
<year>2002</year>
<volume>277</volume>
<fpage>7776</fpage>
<lpage>7784</lpage>
<pub-id pub-id-type="pmid">11726667</pub-id>
</element-citation>
</ref>
<ref id="gks283-B83">
<label>83</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Juo</surname>
<given-names>ZS</given-names>
</name>
<name>
<surname>Chiu</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Leiberman</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Baikalov</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Berk</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Dickerson</surname>
<given-names>RE</given-names>
</name>
</person-group>
<article-title>How proteins recognize the TATA box</article-title>
<source>J. Mol. Biol.</source>
<year>1996</year>
<volume>261</volume>
<fpage>239</fpage>
<lpage>254</lpage>
<pub-id pub-id-type="pmid">8757291</pub-id>
</element-citation>
</ref>
<ref id="gks283-B84">
<label>84</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davis</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Majee</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Kahn</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>TATA box DNA deformation with and without the TATA box-binding protein</article-title>
<source>J. Mol. Biol.</source>
<year>1999</year>
<volume>291</volume>
<fpage>249</fpage>
<lpage>265</lpage>
<pub-id pub-id-type="pmid">10438619</pub-id>
</element-citation>
</ref>
<ref id="gks283-B85">
<label>85</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gardiner</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Hunter</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>XJ</given-names>
</name>
<name>
<surname>Willett</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>A structural similarity analysis of double-helical DNA</article-title>
<source>J. Mol. Biol.</source>
<year>2004</year>
<volume>343</volume>
<fpage>879</fpage>
<lpage>889</lpage>
<pub-id pub-id-type="pmid">15476807</pub-id>
</element-citation>
</ref>
<ref id="gks283-B86">
<label>86</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parker</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Hansen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Abaan</surname>
<given-names>HO</given-names>
</name>
<name>
<surname>Tullius</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Margulies</surname>
<given-names>EH</given-names>
</name>
</person-group>
<article-title>Local DNA topography correlates with functional noncoding regions of the human genome</article-title>
<source>Science</source>
<year>2009</year>
<volume>324</volume>
<fpage>389</fpage>
<lpage>392</lpage>
<pub-id pub-id-type="pmid">19286520</pub-id>
</element-citation>
</ref>
<ref id="gks283-B87">
<label>87</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Greenbaum</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Pang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Tullius</surname>
<given-names>TD</given-names>
</name>
</person-group>
<article-title>Construction of a genome-scale structural map at single-nucleotide resolution</article-title>
<source>Genome Res.</source>
<year>2007</year>
<volume>17</volume>
<fpage>947</fpage>
<lpage>953</lpage>
<pub-id pub-id-type="pmid">17568010</pub-id>
</element-citation>
</ref>
<ref id="gks283-B88">
<label>88</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abeel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Saeys</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bonnet</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Rouze</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Generic eukaryotic core promoter prediction using structural features of DNA</article-title>
<source>Genome Res.</source>
<year>2008</year>
<volume>18</volume>
<fpage>310</fpage>
<lpage>323</lpage>
<pub-id pub-id-type="pmid">18096745</pub-id>
</element-citation>
</ref>
<ref id="gks283-B89">
<label>89</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tullius</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Structural biology: DNA binding shapes up</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>1225</fpage>
<lpage>1226</lpage>
<pub-id pub-id-type="pmid">19865161</pub-id>
</element-citation>
</ref>
<ref id="gks283-B90">
<label>90</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohs</surname>
<given-names>R</given-names>
</name>
<name>
<surname>West</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Honig</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Nuance in the double-helix and its role in protein-DNA recognition</article-title>
<source>Curr. Opin. Struct. Biol.</source>
<year>2009</year>
<volume>19</volume>
<fpage>171</fpage>
<lpage>177</lpage>
<pub-id pub-id-type="pmid">19362815</pub-id>
</element-citation>
</ref>
<ref id="gks283-B91">
<label>91</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nikolajewa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Suhnel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wilhelm</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>DiProDB: a database for dinucleotide properties</article-title>
<source>Nucleic Acids Res.</source>
<year>2009</year>
<volume>37</volume>
<fpage>D37</fpage>
<lpage>D40</lpage>
<pub-id pub-id-type="pmid">18805906</pub-id>
</element-citation>
</ref>
<ref id="gks283-B92">
<label>92</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Misleh</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>WW</given-names>
</name>
</person-group>
<article-title>MEME: discovering and analyzing DNA and protein sequence motifs</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>W369</fpage>
<lpage>W373</lpage>
<pub-id pub-id-type="pmid">16845028</pub-id>
</element-citation>
</ref>
<ref id="gks283-B93">
<label>93</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Long</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>CY</given-names>
</name>
<name>
<surname>Ambros</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Potent effect of target structure on microRNA function</article-title>
<source>Nat. Struct. Mol. Biol.</source>
<year>2007</year>
<volume>14</volume>
<fpage>287</fpage>
<lpage>294</lpage>
<pub-id pub-id-type="pmid">17401373</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Belgique/explor/OpenAccessBelV2/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000405  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000405  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Belgique
   |area=    OpenAccessBelV2
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Dec 1 00:43:49 2016. Site generation: Wed Mar 6 14:51:30 2024