MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

DNA motif elucidation using belief propagation

Internal identifier: 000F52 (Pmc/Corpus); previous: 000F51; next: 000F53

Authors: Ka-Chun Wong; Tak-Ming Chan; Chengbin Peng; Yue Li; Zhaolei Zhang

Source:

RBID : PMC:3763557

Abstract

Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm, kmerHMM, that uses Hidden Markov Models (HMMs) and belief propagation to derive precise and multimodal motifs. kmerHMM accepts PBM probe raw data and preprocesses them into median binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors’ websites, e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.
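The preprocessing step mentioned in the abstract (reducing probe-level intensities to a median binding intensity per k-mer, then ranking the k-mers) can be illustrated with a minimal Python sketch. This is not the kmerHMM code: the function names, the (sequence, intensity) input format and the toy data are assumptions made for illustration, and the sketch ignores reverse complements, normalization, the alignment step and the HMM itself.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes that contain it.

    `probes` is a list of (sequence, intensity) pairs; this input format is
    an assumption for illustration, not the format used by kmerHMM.
    """
    per_kmer = defaultdict(list)
    for sequence, intensity in probes:
        # Count each k-mer once per probe, even if it occurs several times.
        seen = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        for kmer in seen:
            per_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in per_kmer.items()}


def rank_kmers(medians):
    """Sort k-mers by decreasing median intensity, ties broken alphabetically."""
    return sorted(medians.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy probes; real PBM probes are much longer and k is 8 to 10.
probes = [("ACGTA", 10.0), ("CGTAC", 30.0), ("TTACG", 20.0)]
medians = kmer_median_intensities(probes, k=3)
ranked = rank_kmers(medians)
```

In this sketch the top-ranked k-mers would be the natural candidates to align and use as HMM training data, which is the stage the abstract describes next.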


Url:
DOI: 10.1093/nar/gkt574
PubMed: 23814189
PubMed Central: 3763557

Links to Exploration step

PMC:3763557

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">DNA motif elucidation using belief propagation</title>
<author>
<name sortKey="Wong, Ka Chun" sort="Wong, Ka Chun" uniqKey="Wong K" first="Ka-Chun" last="Wong">Ka-Chun Wong</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Tak Ming" sort="Chan, Tak Ming" uniqKey="Chan T" first="Tak-Ming" last="Chan">Tak-Ming Chan</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Peng, Chengbin" sort="Peng, Chengbin" uniqKey="Peng C" first="Chengbin" last="Peng">Chengbin Peng</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, KSA,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Yue" sort="Li, Yue" uniqKey="Li Y" first="Yue" last="Li">Yue Li</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Zhaolei" sort="Zhang, Zhaolei" uniqKey="Zhang Z" first="Zhaolei" last="Zhang">Zhaolei Zhang</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff wicri:cut=" and" id="gkt574-AFF1">Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23814189</idno>
<idno type="pmc">3763557</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3763557</idno>
<idno type="RBID">PMC:3763557</idno>
<idno type="doi">10.1093/nar/gkt574</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000F52</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F52</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">DNA motif elucidation using belief propagation</title>
<author>
<name sortKey="Wong, Ka Chun" sort="Wong, Ka Chun" uniqKey="Wong K" first="Ka-Chun" last="Wong">Ka-Chun Wong</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chan, Tak Ming" sort="Chan, Tak Ming" uniqKey="Chan T" first="Tak-Ming" last="Chan">Tak-Ming Chan</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Peng, Chengbin" sort="Peng, Chengbin" uniqKey="Peng C" first="Chengbin" last="Peng">Chengbin Peng</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, KSA,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Yue" sort="Li, Yue" uniqKey="Li Y" first="Yue" last="Li">Yue Li</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Zhaolei" sort="Zhang, Zhaolei" uniqKey="Zhang Z" first="Zhaolei" last="Zhang">Zhaolei Zhang</name>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff wicri:cut=" and" id="gkt574-AFF1">Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkt574-AFF1">Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm, kmerHMM, that uses Hidden Markov Models (HMMs) and belief propagation to derive precise and multimodal motifs. kmerHMM accepts PBM probe raw data and preprocesses them into median binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors’ websites, e.g.
<ext-link ext-link-type="uri" xlink:href="http://www.cs.toronto.edu/~wkc/kmerHMM">http://www.cs.toronto.edu/~wkc/kmerHMM</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
<author>
<name sortKey="Li, N" uniqKey="Li N">N Li</name>
</author>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
<author>
<name sortKey="Moor, Bd" uniqKey="Moor B">BD Moor</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Galas, Dj" uniqKey="Galas D">DJ Galas</name>
</author>
<author>
<name sortKey="Schmitz, A" uniqKey="Schmitz A">A Schmitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garner, Mm" uniqKey="Garner M">MM Garner</name>
</author>
<author>
<name sortKey="Revzin, A" uniqKey="Revzin A">A Revzin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Robert, F" uniqKey="Robert F">F Robert</name>
</author>
<author>
<name sortKey="Wyrick, Jj" uniqKey="Wyrick J">JJ Wyrick</name>
</author>
<author>
<name sortKey="Aparicio, O" uniqKey="Aparicio O">O Aparicio</name>
</author>
<author>
<name sortKey="Jennings, Eg" uniqKey="Jennings E">EG Jennings</name>
</author>
<author>
<name sortKey="Simon, I" uniqKey="Simon I">I Simon</name>
</author>
<author>
<name sortKey="Zeitlinger, J" uniqKey="Zeitlinger J">J Zeitlinger</name>
</author>
<author>
<name sortKey="Schreiber, J" uniqKey="Schreiber J">J Schreiber</name>
</author>
<author>
<name sortKey="Hannett, N" uniqKey="Hannett N">N Hannett</name>
</author>
<author>
<name sortKey="Kanin, E" uniqKey="Kanin E">E Kanin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Xs" uniqKey="Liu X">XS Liu</name>
</author>
<author>
<name sortKey="Brutlag, Dl" uniqKey="Brutlag D">DL Brutlag</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Qureshi, Am" uniqKey="Qureshi A">AM Qureshi</name>
</author>
<author>
<name sortKey="He, Fs" uniqKey="He F">FS He</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fordyce, Pm" uniqKey="Fordyce P">PM Fordyce</name>
</author>
<author>
<name sortKey="Gerber, D" uniqKey="Gerber D">D Gerber</name>
</author>
<author>
<name sortKey="Tran, D" uniqKey="Tran D">D Tran</name>
</author>
<author>
<name sortKey="Zheng, J" uniqKey="Zheng J">J Zheng</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Derisi, Jl" uniqKey="Derisi J">JL DeRisi</name>
</author>
<author>
<name sortKey="Quake, Sr" uniqKey="Quake S">SR Quake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, S" uniqKey="Hu S">S Hu</name>
</author>
<author>
<name sortKey="Xie, Z" uniqKey="Xie Z">Z Xie</name>
</author>
<author>
<name sortKey="Onishi, A" uniqKey="Onishi A">A Onishi</name>
</author>
<author>
<name sortKey="Yu, X" uniqKey="Yu X">X Yu</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Lin, J" uniqKey="Lin J">J Lin</name>
</author>
<author>
<name sortKey="Rho, Hs" uniqKey="Rho H">HS Rho</name>
</author>
<author>
<name sortKey="Woodard, C" uniqKey="Woodard C">C Woodard</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Jeong, Js" uniqKey="Jeong J">JS Jeong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ho, Sw" uniqKey="Ho S">SW Ho</name>
</author>
<author>
<name sortKey="Jona, G" uniqKey="Jona G">G Jona</name>
</author>
<author>
<name sortKey="Chen, Ct" uniqKey="Chen C">CT Chen</name>
</author>
<author>
<name sortKey="Johnston, M" uniqKey="Johnston M">M Johnston</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matys, V" uniqKey="Matys V">V Matys</name>
</author>
<author>
<name sortKey="Kel Margoulis, Ov" uniqKey="Kel Margoulis O">OV Kel-Margoulis</name>
</author>
<author>
<name sortKey="Fricke, E" uniqKey="Fricke E">E Fricke</name>
</author>
<author>
<name sortKey="Liebich, I" uniqKey="Liebich I">I Liebich</name>
</author>
<author>
<name sortKey="Land, S" uniqKey="Land S">S Land</name>
</author>
<author>
<name sortKey="Barre Dirrie, A" uniqKey="Barre Dirrie A">A Barre-Dirrie</name>
</author>
<author>
<name sortKey="Reuter, I" uniqKey="Reuter I">I Reuter</name>
</author>
<author>
<name sortKey="Chekmenev, D" uniqKey="Chekmenev D">D Chekmenev</name>
</author>
<author>
<name sortKey="Krull, M" uniqKey="Krull M">M Krull</name>
</author>
<author>
<name sortKey="Hornischer, K" uniqKey="Hornischer K">K Hornischer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Portales Casamar, E" uniqKey="Portales Casamar E">E Portales-Casamar</name>
</author>
<author>
<name sortKey="Thongjuea, S" uniqKey="Thongjuea S">S Thongjuea</name>
</author>
<author>
<name sortKey="Kwon, At" uniqKey="Kwon A">AT Kwon</name>
</author>
<author>
<name sortKey="Arenillas, D" uniqKey="Arenillas D">D Arenillas</name>
</author>
<author>
<name sortKey="Zhao, X" uniqKey="Zhao X">X Zhao</name>
</author>
<author>
<name sortKey="Valen, E" uniqKey="Valen E">E Valen</name>
</author>
<author>
<name sortKey="Yusuf, D" uniqKey="Yusuf D">D Yusuf</name>
</author>
<author>
<name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
<author>
<name sortKey="Sandelin, A" uniqKey="Sandelin A">A Sandelin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bateman, A" uniqKey="Bateman A">A Bateman</name>
</author>
<author>
<name sortKey="Coin, L" uniqKey="Coin L">L Coin</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
<author>
<name sortKey="Hollich, V" uniqKey="Hollich V">V Hollich</name>
</author>
<author>
<name sortKey="Grifrths Jones, S" uniqKey="Grifrths Jones S">S Griffiths-Jones</name>
</author>
<author>
<name sortKey="Khanna, A" uniqKey="Khanna A">A Khanna</name>
</author>
<author>
<name sortKey="Marshall, M" uniqKey="Marshall M">M Marshall</name>
</author>
<author>
<name sortKey="Moxon, S" uniqKey="Moxon S">S Moxon</name>
</author>
<author>
<name sortKey="Sonnhammer, Ell" uniqKey="Sonnhammer E">ELL Sonnhammer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robasky, K" uniqKey="Robasky K">K Robasky</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spivak, At" uniqKey="Spivak A">AT Spivak</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pfreundt, U" uniqKey="Pfreundt U">U Pfreundt</name>
</author>
<author>
<name sortKey="James, Dp" uniqKey="James D">DP James</name>
</author>
<author>
<name sortKey="Tweedie, S" uniqKey="Tweedie S">S Tweedie</name>
</author>
<author>
<name sortKey="Wilson, D" uniqKey="Wilson D">D Wilson</name>
</author>
<author>
<name sortKey="Teichmann, Sa" uniqKey="Teichmann S">SA Teichmann</name>
</author>
<author>
<name sortKey="Adryan, B" uniqKey="Adryan B">B Adryan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deboer, Cg" uniqKey="Deboer C">CG deBoer</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xie, Z" uniqKey="Xie Z">Z Xie</name>
</author>
<author>
<name sortKey="Hu, S" uniqKey="Hu S">S Hu</name>
</author>
<author>
<name sortKey="Blackshaw, S" uniqKey="Blackshaw S">S Blackshaw</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Qian, J" uniqKey="Qian J">J Qian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fulton, Dl" uniqKey="Fulton D">DL Fulton</name>
</author>
<author>
<name sortKey="Sundararajan, S" uniqKey="Sundararajan S">S Sundararajan</name>
</author>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
<author>
<name sortKey="Roach, Jc" uniqKey="Roach J">JC Roach</name>
</author>
<author>
<name sortKey="Sladek, R" uniqKey="Sladek R">R Sladek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
<author>
<name sortKey="Austin, Se" uniqKey="Austin S">SE Austin</name>
</author>
<author>
<name sortKey="Berman, Hm" uniqKey="Berman H">HM Berman</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
<author>
<name sortKey="Laskowski, Ra" uniqKey="Laskowski R">RA Laskowski</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krishna, Ss" uniqKey="Krishna S">SS Krishna</name>
</author>
<author>
<name sortKey="Majumdar, I" uniqKey="Majumdar I">I Majumdar</name>
</author>
<author>
<name sortKey="Grishin, Nv" uniqKey="Grishin N">NV Grishin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, S" uniqKey="Jones S">S Jones</name>
</author>
<author>
<name sortKey="Van Heyningen, P" uniqKey="Van Heyningen P">P van Heyningen</name>
</author>
<author>
<name sortKey="Berman, Hm" uniqKey="Berman H">HM Berman</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, S" uniqKey="Jones S">S Jones</name>
</author>
<author>
<name sortKey="Shanahan, Hp" uniqKey="Shanahan H">HP Shanahan</name>
</author>
<author>
<name sortKey="Berman, Hm" uniqKey="Berman H">HM Berman</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gunewardena, S" uniqKey="Gunewardena S">S Gunewardena</name>
</author>
<author>
<name sortKey="Jeavons, P" uniqKey="Jeavons P">P Jeavons</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sarai, A" uniqKey="Sarai A">A Sarai</name>
</author>
<author>
<name sortKey="Kono, H" uniqKey="Kono H">H Kono</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Q" uniqKey="Zhou Q">Q Zhou</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahmad, S" uniqKey="Ahmad S">S Ahmad</name>
</author>
<author>
<name sortKey="Gromiha, Mm" uniqKey="Gromiha M">MM Gromiha</name>
</author>
<author>
<name sortKey="Sarai, A" uniqKey="Sarai A">A Sarai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahmad, S" uniqKey="Ahmad S">S Ahmad</name>
</author>
<author>
<name sortKey="Keskin, O" uniqKey="Keskin O">O Keskin</name>
</author>
<author>
<name sortKey="Sarai, A" uniqKey="Sarai A">A Sarai</name>
</author>
<author>
<name sortKey="Nussinov, R" uniqKey="Nussinov R">R Nussinov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pham, Th" uniqKey="Pham T">TH Pham</name>
</author>
<author>
<name sortKey="Clemente, Jc" uniqKey="Clemente J">JC Clemente</name>
</author>
<author>
<name sortKey="Satou, K" uniqKey="Satou K">K Satou</name>
</author>
<author>
<name sortKey="Ho, Tb" uniqKey="Ho T">TB Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ofran, Y" uniqKey="Ofran Y">Y Ofran</name>
</author>
<author>
<name sortKey="Mysore, V" uniqKey="Mysore V">V Mysore</name>
</author>
<author>
<name sortKey="Rost, B" uniqKey="Rost B">B Rost</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wong, Kc" uniqKey="Wong K">KC Wong</name>
</author>
<author>
<name sortKey="Peng, C" uniqKey="Peng C">C Peng</name>
</author>
<author>
<name sortKey="Wong, Mh" uniqKey="Wong M">MH Wong</name>
</author>
<author>
<name sortKey="Leung, Ks" uniqKey="Leung K">KS Leung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leung, Ks" uniqKey="Leung K">KS Leung</name>
</author>
<author>
<name sortKey="Wong, Kc" uniqKey="Wong K">KC Wong</name>
</author>
<author>
<name sortKey="Chan, Tm" uniqKey="Chan T">TM Chan</name>
</author>
<author>
<name sortKey="Wong, Mh" uniqKey="Wong M">MH Wong</name>
</author>
<author>
<name sortKey="Lee, Kh" uniqKey="Lee K">KH Lee</name>
</author>
<author>
<name sortKey="Lau, Ck" uniqKey="Lau C">CK Lau</name>
</author>
<author>
<name sortKey="Tsui, Sk" uniqKey="Tsui S">SK Tsui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Tm" uniqKey="Chan T">TM Chan</name>
</author>
<author>
<name sortKey="Wong, Kc" uniqKey="Wong K">KC Wong</name>
</author>
<author>
<name sortKey="Lee, Kh" uniqKey="Lee K">KH Lee</name>
</author>
<author>
<name sortKey="Wong, Mh" uniqKey="Wong M">MH Wong</name>
</author>
<author>
<name sortKey="Lau, Ck" uniqKey="Lau C">CK Lau</name>
</author>
<author>
<name sortKey="Tsui, Sk" uniqKey="Tsui S">SK Tsui</name>
</author>
<author>
<name sortKey="Leung, Ks" uniqKey="Leung K">KS Leung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macisaac, Kd" uniqKey="Macisaac K">KD MacIsaac</name>
</author>
<author>
<name sortKey="Fraenkel, E" uniqKey="Fraenkel E">E Fraenkel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kel, Ae" uniqKey="Kel A">AE Kel</name>
</author>
<author>
<name sortKey="Goessling, E" uniqKey="Goessling E">E Goessling</name>
</author>
<author>
<name sortKey="Reuter, I" uniqKey="Reuter I">I Reuter</name>
</author>
<author>
<name sortKey="Cheremushkin, E" uniqKey="Cheremushkin E">E Cheremushkin</name>
</author>
<author>
<name sortKey="Kel Margoulis, Ov" uniqKey="Kel Margoulis O">OV Kel-Margoulis</name>
</author>
<author>
<name sortKey="Wingender, E" uniqKey="Wingender E">E Wingender</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jensen, St" uniqKey="Jensen S">ST Jensen</name>
</author>
<author>
<name sortKey="Liu, Xs" uniqKey="Liu X">XS Liu</name>
</author>
<author>
<name sortKey="Zhou, Q" uniqKey="Zhou Q">Q Zhou</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sandve, Gk" uniqKey="Sandve G">GK Sandve</name>
</author>
<author>
<name sortKey="Abul, O" uniqKey="Abul O">O Abul</name>
</author>
<author>
<name sortKey="Walseng, V" uniqKey="Walseng V">V Walseng</name>
</author>
<author>
<name sortKey="Drablos, F" uniqKey="Drablos F">F Drablos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughes, Jd" uniqKey="Hughes J">JD Hughes</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Tavazoie, S" uniqKey="Tavazoie S">S Tavazoie</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thijs, G" uniqKey="Thijs G">G Thijs</name>
</author>
<author>
<name sortKey="Lescot, M" uniqKey="Lescot M">M Lescot</name>
</author>
<author>
<name sortKey="Marchal, K" uniqKey="Marchal K">K Marchal</name>
</author>
<author>
<name sortKey="Rombauts, S" uniqKey="Rombauts S">S Rombauts</name>
</author>
<author>
<name sortKey="Demoor, B" uniqKey="Demoor B">B DeMoor</name>
</author>
<author>
<name sortKey="Rouze, P" uniqKey="Rouze P">P Rouze</name>
</author>
<author>
<name sortKey="Moreau, Y" uniqKey="Moreau Y">Y Moreau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ao, W" uniqKey="Ao W">W Ao</name>
</author>
<author>
<name sortKey="Gaudet, J" uniqKey="Gaudet J">J Gaudet</name>
</author>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
<author>
<name sortKey="Muttumu, S" uniqKey="Muttumu S">S Muttumu</name>
</author>
<author>
<name sortKey="Mango, Se" uniqKey="Mango S">SE Mango</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Workman, Ct" uniqKey="Workman C">CT Workman</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Gelfand, Ms" uniqKey="Gelfand M">MS Gelfand</name>
</author>
<author>
<name sortKey="Gerasimova, Av" uniqKey="Gerasimova A">AV Gerasimova</name>
</author>
<author>
<name sortKey="Ravcheev, Da" uniqKey="Ravcheev D">DA Ravcheev</name>
</author>
<author>
<name sortKey="Mironov, Aa" uniqKey="Mironov A">AA Mironov</name>
</author>
<author>
<name sortKey="Makeev, Vj" uniqKey="Makeev V">VJ Makeev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Tm" uniqKey="Chan T">TM Chan</name>
</author>
<author>
<name sortKey="Leung, Ks" uniqKey="Leung K">KS Leung</name>
</author>
<author>
<name sortKey="Lee, Kh" uniqKey="Lee K">KH Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hertz, Gz" uniqKey="Hertz G">GZ Hertz</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Hansen, U" uniqKey="Hansen U">U Hansen</name>
</author>
<author>
<name sortKey="Spouge, Jl" uniqKey="Spouge J">JL Spouge</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J van Helden</name>
</author>
<author>
<name sortKey="Andre, B" uniqKey="Andre B">B Andre</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gunewardena, S" uniqKey="Gunewardena S">S Gunewardena</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Regnier, M" uniqKey="Regnier M">M Régnier</name>
</author>
<author>
<name sortKey="Denise, A" uniqKey="Denise A">A Denise</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavesi, G" uniqKey="Pavesi G">G Pavesi</name>
</author>
<author>
<name sortKey="Mereghetti, P" uniqKey="Mereghetti P">P Mereghetti</name>
</author>
<author>
<name sortKey="Mauri, G" uniqKey="Mauri G">G Mauri</name>
</author>
<author>
<name sortKey="Pesole, G" uniqKey="Pesole G">G Pesole</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sinha, S" uniqKey="Sinha S">S Sinha</name>
</author>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Talukder, S" uniqKey="Talukder S">S Talukder</name>
</author>
<author>
<name sortKey="Gehrke, Ar" uniqKey="Gehrke A">AR Gehrke</name>
</author>
<author>
<name sortKey="Jaeger, Sa" uniqKey="Jaeger S">SA Jaeger</name>
</author>
<author>
<name sortKey="Chan, T" uniqKey="Chan T">T Chan</name>
</author>
<author>
<name sortKey="Metzler, G" uniqKey="Metzler G">G Metzler</name>
</author>
<author>
<name sortKey="Vedenko, A" uniqKey="Vedenko A">A Vedenko</name>
</author>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Morris, Q" uniqKey="Morris Q">Q Morris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Foat, Bc" uniqKey="Foat B">BC Foat</name>
</author>
<author>
<name sortKey="Houshmandi, Ss" uniqKey="Houshmandi S">SS Houshmandi</name>
</author>
<author>
<name sortKey="Olivas, Wm" uniqKey="Olivas W">WM Olivas</name>
</author>
<author>
<name sortKey="Bussemaker, Hj" uniqKey="Bussemaker H">HJ Bussemaker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tanay, A" uniqKey="Tanay A">A Tanay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weirauch, Mt" uniqKey="Weirauch M">MT Weirauch</name>
</author>
<author>
<name sortKey="Cote, A" uniqKey="Cote A">A Cote</name>
</author>
<author>
<name sortKey="Norel, R" uniqKey="Norel R">R Norel</name>
</author>
<author>
<name sortKey="Annala, M" uniqKey="Annala M">M Annala</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Riley, Tr" uniqKey="Riley T">TR Riley</name>
</author>
<author>
<name sortKey="Saez Rodriguez, J" uniqKey="Saez Rodriguez J">J Saez-Rodriguez</name>
</author>
<author>
<name sortKey="Cokelaer, T" uniqKey="Cokelaer T">T Cokelaer</name>
</author>
<author>
<name sortKey="Vedenko, A" uniqKey="Vedenko A">A Vedenko</name>
</author>
<author>
<name sortKey="Talukder, S" uniqKey="Talukder S">S Talukder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berg, Og" uniqKey="Berg O">OG Berg</name>
</author>
<author>
<name sortKey="Von Hippel, Ph" uniqKey="Von Hippel P">PH von Hippel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author>
<name sortKey="Krogh, A" uniqKey="Krogh A">A Krogh</name>
</author>
<author>
<name sortKey="Mitchison, G" uniqKey="Mitchison G">G Mitchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rabiner, Lr" uniqKey="Rabiner L">LR Rabiner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frey, Bj" uniqKey="Frey B">BJ Frey</name>
</author>
<author>
<name sortKey="Mohammad, N" uniqKey="Mohammad N">N Mohammad</name>
</author>
<author>
<name sortKey="Morris, Qd" uniqKey="Morris Q">QD Morris</name>
</author>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author>
<name sortKey="Robinson, Md" uniqKey="Robinson M">MD Robinson</name>
</author>
<author>
<name sortKey="Mnaimneh, S" uniqKey="Mnaimneh S">S Mnaimneh</name>
</author>
<author>
<name sortKey="Chang, R" uniqKey="Chang R">R Chang</name>
</author>
<author>
<name sortKey="Pan, Q" uniqKey="Pan Q">Q Pan</name>
</author>
<author>
<name sortKey="Sat, E" uniqKey="Sat E">E Sat</name>
</author>
<author>
<name sortKey="Rossant, J" uniqKey="Rossant J">J Rossant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frey, Bj" uniqKey="Frey B">BJ Frey</name>
</author>
<author>
<name sortKey="Dueck, D" uniqKey="Dueck D">D Dueck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barash, Y" uniqKey="Barash Y">Y Barash</name>
</author>
<author>
<name sortKey="Calarco, Ja" uniqKey="Calarco J">JA Calarco</name>
</author>
<author>
<name sortKey="Gao, W" uniqKey="Gao W">W Gao</name>
</author>
<author>
<name sortKey="Pan, Q" uniqKey="Pan Q">Q Pan</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Shai, O" uniqKey="Shai O">O Shai</name>
</author>
<author>
<name sortKey="Blencowe, Bj" uniqKey="Blencowe B">BJ Blencowe</name>
</author>
<author>
<name sortKey="Frey, Bj" uniqKey="Frey B">BJ Frey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weiss, Y" uniqKey="Weiss Y">Y Weiss</name>
</author>
<author>
<name sortKey="Freeman, Wt" uniqKey="Freeman W">WT Freeman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barber, D" uniqKey="Barber D">D Barber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahony, S" uniqKey="Mahony S">S Mahony</name>
</author>
<author>
<name sortKey="Benos, Pv" uniqKey="Benos P">PV Benos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Verrijzer, Cp" uniqKey="Verrijzer C">CP Verrijzer</name>
</author>
<author>
<name sortKey="Alkema, Mj" uniqKey="Alkema M">MJ Alkema</name>
</author>
<author>
<name sortKey="Van Weperen, Ww" uniqKey="Van Weperen W">WW van Weperen</name>
</author>
<author>
<name sortKey="Vanleeuwen, Hc" uniqKey="Vanleeuwen H">HC VanLeeuwen</name>
</author>
<author>
<name sortKey="Strating, Mj" uniqKey="Strating M">MJ Strating</name>
</author>
<author>
<name sortKey="Vander Vliet, Pc" uniqKey="Vander Vliet P">PC vander Vliet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gordan, R" uniqKey="Gordan R">R Gordan</name>
</author>
<author>
<name sortKey="Murphy, Kf" uniqKey="Murphy K">KF Murphy</name>
</author>
<author>
<name sortKey="Mccord, Rp" uniqKey="Mccord R">RP McCord</name>
</author>
<author>
<name sortKey="Zhu, C" uniqKey="Zhu C">C Zhu</name>
</author>
<author>
<name sortKey="Vedenko, A" uniqKey="Vedenko A">A Vedenko</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morris, Q" uniqKey="Morris Q">Q Morris</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23814189</article-id>
<article-id pub-id-type="pmc">3763557</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkt574</article-id>
<article-id pub-id-type="publisher-id">gkt574</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>DNA motif elucidation using belief propagation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wong</surname>
<given-names>Ka-Chun</given-names>
</name>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chan</surname>
<given-names>Tak-Ming</given-names>
</name>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Peng</surname>
<given-names>Chengbin</given-names>
</name>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Yue</given-names>
</name>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Zhaolei</given-names>
</name>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>5</sup>
</xref>
<xref ref-type="aff" rid="gkt574-AFF1">
<sup>6</sup>
</xref>
<xref ref-type="corresp" rid="gkt574-COR1">*</xref>
</contrib>
</contrib-group>
<aff id="gkt574-AFF1">
<sup>1</sup>
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada,
<sup>2</sup>
Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada,
<sup>3</sup>
Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA,
<sup>4</sup>
Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, KSA,
<sup>5</sup>
Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada and
<sup>6</sup>
Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada</aff>
<author-notes>
<corresp id="gkt574-COR1">*To whom correspondence should be addressed. Tel:
<phone>+1 416 946 0924</phone>
; Fax:
<fax>+1 416 946 0924</fax>
; Email:
<email>zhaolei.zhang@utoronto.ca</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>9</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>6</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>29</day>
<month>6</month>
<year>2013</year>
</pub-date>
<volume>41</volume>
<issue>16</issue>
<fpage>e153</fpage>
<lpage>e153</lpage>
<history>
<date date-type="received">
<day>27</day>
<month>3</month>
<year>2013</year>
</date>
<date date-type="rev-recd">
<day>16</day>
<month>5</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>7</day>
<month>6</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2013. Published by Oxford University Press.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8–10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm, kmerHMM, that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagation. kmerHMM accepts PBM probe raw data and preprocesses them into median binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors’ websites: e.g.
<ext-link ext-link-type="uri" xlink:href="http://www.cs.toronto.edu/~wkc/kmerHMM">http://www.cs.toronto.edu/~wkc/kmerHMM</ext-link>
.</p>
</abstract>
<counts>
<page-count count="12"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>In humans and other higher eukaryotes, gene expression is regulated by the binding of various modulatory transcription factors (TFs) onto cis-regulatory DNA elements near genes. Binding of different combinations of TFs may result in a gene being expressed in different tissues or at different developmental stages. To fully understand a gene’s function, it is essential to identify the TFs that regulate the gene and the corresponding TF-binding sites (TFBSs). Traditionally, these regulatory sites were determined by labor-intensive experiments such as DNase footprinting or gel-shift assays. Various computational approaches have been developed to predict TFBSs
<italic>in silico</italic>
, which is an active research area in bioinformatics (
<xref ref-type="bibr" rid="gkt574-B1">1</xref>
). TFBSs are relatively short (10–20 bp), highly degenerate sequence motifs, which makes their effective identification a computationally challenging task. A number of high-throughput experimental technologies have also been developed recently to determine protein–DNA-binding affinity.</p>
<p>It is expensive and laborious to experimentally identify TF-TFBS sequence pairs, for example, using DNA footprinting (
<xref ref-type="bibr" rid="gkt574-B2">2</xref>
) or gel electrophoresis (
<xref ref-type="bibr" rid="gkt574-B3">3</xref>
). Chromatin immunoprecipitation (ChIP) followed by microarray or sequencing (
<xref ref-type="bibr" rid="gkt574-B4">4</xref>
,
<xref ref-type="bibr" rid="gkt574-B5">5</xref>
) measures the binding occupancy of a particular TF to the nucleotide sequences of co-regulated genes on a genome-wide scale
<italic>in vivo</italic>
but at low resolution. Further processing is needed to extract precise TFBSs (
<xref ref-type="bibr" rid="gkt574-B6">6</xref>
). On the other hand,
<italic>in vitro</italic>
techniques such as protein-binding microarray (PBM) (
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
), microfluidic affinity analysis (
<xref ref-type="bibr" rid="gkt574-B8">8</xref>
) and protein microarray assays (
<xref ref-type="bibr" rid="gkt574-B9">9</xref>
,
<xref ref-type="bibr" rid="gkt574-B10">10</xref>
) enable us to measure the DNA sequence binding of TFs comprehensively. TRANSFAC is one of the largest databases for regulatory elements including TFs, TFBSs, weight matrices of the TFBSs and regulated genes (
<xref ref-type="bibr" rid="gkt574-B11">11</xref>
). JASPAR is a comprehensive collection of TF DNA-binding preferences (
<xref ref-type="bibr" rid="gkt574-B12">12</xref>
). Other annotation databases are also available [e.g. Pfam, UniProbe, ScerTF, FlyTF, YeTFaSCo, hPDI and TFcat (
<xref ref-type="bibr" rid="gkt574-B13 gkt574-B14 gkt574-B15 gkt574-B16 gkt574-B17 gkt574-B18 gkt574-B19">13–19</xref>
)].</p>
<sec>
<title>Background</title>
<p>Numerous studies have been carried out to analyze existing protein–DNA-binding 3D structures comprehensively (
<xref ref-type="bibr" rid="gkt574-B20">20</xref>
,
<xref ref-type="bibr" rid="gkt574-B21">21</xref>
) or with focus on specific families [e.g. zinc fingers (
<xref ref-type="bibr" rid="gkt574-B22">22</xref>
)]. Various properties have been discovered concerning, e.g. bonding and force types, TF conservation and mutation (
<xref ref-type="bibr" rid="gkt574-B23">23</xref>
) and bending of the DNA (
<xref ref-type="bibr" rid="gkt574-B24">24</xref>
). Some are already applied to predict binding amino acids on the TF side, e.g. (
<xref ref-type="bibr" rid="gkt574-B25">25</xref>
,
<xref ref-type="bibr" rid="gkt574-B26">26</xref>
). Alternatively, researchers have sought a general binding ‘code’ between proteins and DNA, in particular, a one-to-one mapping between amino acids from TFs and nucleotides from TFBSs. Despite many proposed one-to-one binding propensity mappings, the consensus is that there is no simple binding ‘code’ (
<xref ref-type="bibr" rid="gkt574-B27">27</xref>
).</p>
<p>To better understand protein–DNA-binding motifs, many data-mining approaches have been proposed and reviewed (
<xref ref-type="bibr" rid="gkt574-B28">28</xref>
). Researchers convert additional detailed information such as base compositions, structures, thermodynamic properties (
<xref ref-type="bibr" rid="gkt574-B29">29</xref>
,
<xref ref-type="bibr" rid="gkt574-B30">30</xref>
) as well as expressions (
<xref ref-type="bibr" rid="gkt574-B31">31</xref>
), into sophisticated features to fit into certain data mining techniques. These methods usually extract complicated features rather than working on interpretable data directly. Many data-mining techniques, such as neural networks, support vector machines (
<xref ref-type="bibr" rid="gkt574-B32">32</xref>
) and regressions (
<xref ref-type="bibr" rid="gkt574-B28">28</xref>
), may generate rules that are difficult to interpret. Furthermore, many data-mining approaches were based on specific protein families or particular data sets. On the other hand, DNA and protein sequences are often the only primary data available, and they carry important information about protein–DNA binding (
<xref ref-type="bibr" rid="gkt574-B27">27</xref>
,
<xref ref-type="bibr" rid="gkt574-B33">33</xref>
). Therefore, it is desirable to make use of the existing comprehensive sequence data to discover motif models (
<xref ref-type="bibr" rid="gkt574-B34">34</xref>
,
<xref ref-type="bibr" rid="gkt574-B35">35</xref>
).</p>
</sec>
<sec>
<title>Related works</title>
<p>Motif discovery (
<xref ref-type="bibr" rid="gkt574-B36">36</xref>
) can be categorized into two types: motif scanning and
<italic>de novo</italic>
motif discovery. (i) Motif scanning identifies putative TFBSs based on motif knowledge obtained from annotated data (
<xref ref-type="bibr" rid="gkt574-B37">37</xref>
). (ii)
<italic>de novo</italic>
motif discovery predicts conserved patterns without prior knowledge of their appearance, based on mathematical modeling and scoring functions (
<xref ref-type="bibr" rid="gkt574-B38">38</xref>
,
<xref ref-type="bibr" rid="gkt574-B39">39</xref>
) from a set of protein/DNA promoter sequences with similar regulatory functions. Although
<italic>de novo</italic>
motif discovery is successful for well-conserved amino acid domain motifs, the counterpart for DNA remains challenging with less-than-perfect performance on real benchmarks (
<xref ref-type="bibr" rid="gkt574-B1">1</xref>
,
<xref ref-type="bibr" rid="gkt574-B40">40</xref>
,
<xref ref-type="bibr" rid="gkt574-B36">36</xref>
).</p>
<p>To tackle this problem, researchers have used a number of methods to optimize statistical measures, such as Gibbs sampling, expectation maximization, artificial neural network, Markov Chain Monte Carlo, genetic algorithm, maximal information content greedy search approach, simulated annealing, tree data structure, k-mer frequency table, dinucleotide modeling and exhaustive searches (
<xref ref-type="bibr" rid="gkt574-B41 gkt574-B42 gkt574-B43 gkt574-B44 gkt574-B45 gkt574-B46 gkt574-B47 gkt574-B48 gkt574-B49 gkt574-B50 gkt574-B51 gkt574-B52 gkt574-B53 gkt574-B54 gkt574-B55">41–55</xref>
).</p>
<p>It has been pointed out that a fundamental bottleneck in TFBS identification is the lack of quantitative binding affinity data for a large proportion of the TFs. The advancement of new high-throughput technologies such as ChIP-chip, ChIP-seq, protein microarray assays and PBM has made it possible to determine the binding affinity of these TFs (
<xref ref-type="bibr" rid="gkt574-B9">9</xref>
,
<xref ref-type="bibr" rid="gkt574-B10">10</xref>
,
<xref ref-type="bibr" rid="gkt574-B56">56</xref>
). In light of this deluge of quantitative affinity data, traditional approaches that rely on thresholds are no longer adequate. Instead, more robust and probabilistic methods were developed to take these quantitative affinity data into account. Below, we briefly review some of these methods. Seed and Wobble has been proposed as a seed-based approach using rank statistics (
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
). RankMotif++ was proposed to maximize the log likelihood of a probabilistic model of binding preferences (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). MatrixREDUCE was proposed to perform forward variable selections to minimize the sum of squared deviations (
<xref ref-type="bibr" rid="gkt574-B58">58</xref>
). MDScan was proposed to combine two search strategies together, namely, word enumeration and position-specific weight matrix updating (
<xref ref-type="bibr" rid="gkt574-B6">6</xref>
). PREGO was proposed to maximize the Spearman rank correlation between the predicted binding intensities and the measured binding intensities (
<xref ref-type="bibr" rid="gkt574-B59">59</xref>
). BEEML-PBM was proposed as a regression method to learn an accurate energy model from noisy PBM data (
<xref ref-type="bibr" rid="gkt574-B60">60</xref>
).</p>
</sec>
<sec>
<title>Problem description</title>
<p>PBM was developed to measure the binding preference of a protein to a complete set of k-mers
<italic>in vitro</italic>
(
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
,
<xref ref-type="bibr" rid="gkt574-B61">61</xref>
). The PBM method has unprecedentedly high resolution and rapid throughput compared with traditional techniques. PBM data have also been shown to be largely consistent with those generated by
<italic>in vivo</italic>
genome-wide location analysis (ChIP-chip) (
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
,
<xref ref-type="bibr" rid="gkt574-B61">61</xref>
). As a result, researchers have applied this technique to many TFs, and a large amount of PBM data has been accumulated and deposited in the UniProbe database (
<xref ref-type="bibr" rid="gkt574-B14">14</xref>
).</p>
<p>Given a set of DNA sequences, PBM can be used to measure their binding signal intensities for a given DNA-binding protein. Specifically, each probe sequence is associated with a normalized signal intensity value. The higher the normalized signal intensity, the stronger is the binding preference of the DNA-binding protein to the corresponding probe sequence. The actual mathematical relationship between the real binding affinity and the normalized signal intensity is unknown, as it still depends on specific experimental settings (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). Given such data, our goal is to uncover a motif model, which can summarize and represent the DNA-binding preference of the DNA-binding protein. The most common motif model is the Position Weight Matrix (PWM), which assumes independence between adjacent motif positions, justified by the experimental and theoretical statistical mechanical study (
<xref ref-type="bibr" rid="gkt574-B62">62</xref>
). Although a recent attempt has been made to generalize PWM, the insertion and deletion operations between adjacent nucleotide positions are still challenging (
<xref ref-type="bibr" rid="gkt574-B63">63</xref>
). In this work, we describe our efforts in developing a hidden Markov model (HMM)-based approach to model the dependence between adjacent nucleotide positions rigorously; we also show that our method (kmerHMM) can deduce multiple binding modes for a given TF.</p>
</sec>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<p>
<xref ref-type="fig" rid="gkt574-F1">Figure 1</xref>
illustrates the computational framework that we developed for kmerHMM. For a DNA-binding protein, we are given a set of DNA sequences
<inline-formula>
<inline-graphic xlink:href="gkt574i1.jpg"></inline-graphic>
</inline-formula>
and the corresponding normalized signal intensity values
<inline-formula>
<inline-graphic xlink:href="gkt574i2.jpg"></inline-graphic>
</inline-formula>
(e.g. Array #1). Following the PBM data analysis convention, we refer to this type of input data set as an array in this manuscript. To extract informative motif data, a sliding window of length
<italic>k</italic>
is used to scan each DNA sequence (and its reverse complement) to count and record the normalized signal intensity values for each k-mer. Once all the DNA sequences are scanned, a list of normalized signal intensity values is obtained for each k-mer that is present in those DNA sequences. The median of the list is calculated as the median signal intensity
<italic>m
<sub>x</sub>
</italic>
for each k-mer
<italic>x</italic>
. Among those k-mers, some are motif instances (positive k-mers), whereas the others are just background k-mers. To distinguish them, the robust estimation procedure proposed in RankMotif++ (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
) is adopted in this work. In other words, we define the positive k-mers to be the k-mers
<italic>y</italic>
whose median signal intensity
<inline-formula>
<inline-graphic xlink:href="gkt574i3.jpg"></inline-graphic>
</inline-formula>
where
<italic>mi</italic>
and σ are the median and the median absolute deviation (MAD) of the normalized intensities
<inline-formula>
<inline-graphic xlink:href="gkt574i4.jpg"></inline-graphic>
</inline-formula>
divided by 0.6745 (the MAD of the unit normal distribution), respectively. All the previous numeric settings are set such that the computational condition is consistent with the previous study (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
).
<fig id="gkt574-F1" position="float">
<label>Figure 1.</label>
<caption>
<p>An HMM approach for multimodal motif discovery from PBM data. (1) Positive (bound) k-mers are selected from the training DNA probe sequences (e.g. Array #1). (2) The positive k-mers are aligned using a multiple sequence alignment method. (3) The aligned positive k-mers are input for training an HMM using Baum–Welch training in an unsupervised fashion. (4a) The trained HMM is tested on the testing DNA probe sequences (e.g. Array #2). (4b) The trained HMM can be analyzed and visualized using N-Max-Product algorithm.</p>
</caption>
<graphic xlink:href="gkt574f1p"></graphic>
</fig>
</p>
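<p>The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the kmerHMM implementation: the function names are hypothetical, every window occurrence on either strand records the intensity of its probe, and the positive k-mer cutoff is assumed to take the form median plus n_sigma times σ, where n_sigma is a hypothetical parameter standing in for the published threshold.</p>
<preformat>
```python
from collections import defaultdict
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_median_intensities(probes, intensities, k=8):
    """Map every k-mer seen on either strand to its median probe intensity."""
    recorded = defaultdict(list)
    for seq, value in zip(probes, intensities):
        for strand in (seq, reverse_complement(seq)):
            for start in range(len(strand) - k + 1):
                # Each window occurrence records the intensity of its probe.
                recorded[strand[start:start + k]].append(value)
    return {kmer: median(values) for kmer, values in recorded.items()}


def robust_center_and_scale(values):
    """Return the median and the MAD divided by 0.6745 for a list of values."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, mad / 0.6745


def positive_kmers(kmer_medians, intensities, n_sigma):
    """Select k-mers whose median intensity exceeds center + n_sigma * sigma.

    The form of the cutoff and the n_sigma parameter are assumptions of
    this sketch, not values taken from the article.
    """
    center, sigma = robust_center_and_scale(intensities)
    cutoff = center + n_sigma * sigma
    selected = [kmer for kmer, m in kmer_medians.items() if m > cutoff]
    # Rank the selected k-mers from strongest to weakest binding.
    return sorted(selected, key=lambda kmer: kmer_medians[kmer], reverse=True)
```
</preformat>
<p>Using the median rather than the mean keeps a single saturated or failed probe from dominating the estimate for a k-mer, which is the same motivation as using the MAD for the scale.</p>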
<p>After a set of positive k-mers is selected, the k-mers are aligned using a multiple sequence alignment method. The aligned k-mers are then used as input for training an HMM that represents the binding preferences of the DNA-binding protein of interest, using the Baum–Welch training algorithm (
<xref ref-type="bibr" rid="gkt574-B64">64</xref>
). Mathematically, the Baum–Welch training algorithm can be described as follows:
<statement>
<title>Input:</title>
<p>A set of aligned k-mers
<inline-formula>
<inline-graphic xlink:href="gkt574i5.jpg"></inline-graphic>
</inline-formula>
of length
<italic>L</italic>
. Each k-mer
<italic>s
<sub>m</sub>
</italic>
can be represented as
<inline-formula>
<inline-graphic xlink:href="gkt574i6.jpg"></inline-graphic>
</inline-formula>
where
<italic>s
<sub>mp</sub>
</italic>
is the
<italic>p</italic>
-th nucleotide of the aligned k-mer
<italic>s
<sub>m</sub>
</italic>
:
<disp-formula>
<graphic xlink:href="gkt574um1"></graphic>
</disp-formula>
</p>
</statement>
<statement>
<title>Output:</title>
<p>an HMM model θ trained to represent the input aligned k-mers:
<disp-formula>
<graphic xlink:href="gkt574um2"></graphic>
</disp-formula>
where
<italic>a
<sub>ij</sub>
</italic>
is the transition probability from state
<italic>i</italic>
to state
<italic>j</italic>
;
<inline-formula>
<inline-graphic xlink:href="gkt574i7.jpg"></inline-graphic>
</inline-formula>
is the emission probability to emit
<italic>x</italic>
at state
<italic>i</italic>
;
<inline-formula>
<inline-graphic xlink:href="gkt574i8.jpg"></inline-graphic>
</inline-formula>
is the initial state probability for state
<italic>i</italic>
. The mathematical details are available in the
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Data</ext-link>
.</p>
</statement>
</p>
<p>After the HMM is trained, each of its hidden states represents a possible nucleotide position, and the occurrence probabilities of different bases (including gaps) at that position are given by the state’s emission distribution. The transition probabilities of the HMM implicitly encode the indel (insertion/deletion) operations within the motif model. The advantage of using HMMs over other topologically restricted probabilistic graphical methods is that the graph topology is more flexible so that multimodal motif models can be captured. We subsequently tested the derived HMM on another set of DNA sequences, which were not used for training (e.g. Array #2). In particular, one may be interested in the ability of the trained HMM to rank the DNA sequences so as to predict which ones are more likely to be the positive probes, as well as in the correlation between the predicted ranks and measured ranks among the positive probes. On the other hand, the N-Max-Product algorithm can be implemented to extract the N most probable paths in its Markov chain, creating multiple motif models in PWM-like forms. The max-product algorithm is a generalization of the well-known Viterbi algorithm (
<xref ref-type="bibr" rid="gkt574-B65">65</xref>
). The major difference is that the Viterbi algorithm is given an input sequence and an HMM, whereas the max-product algorithm is given only an HMM.</p>
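<p>To make the contrast with Viterbi decoding concrete, the following Python sketch runs a max-product recursion on an HMM without any input sequence. It is an illustration under stated assumptions rather than the N-Max-Product routine used in kmerHMM: it returns only the single most probable path (N = 1), it fixes the path length in advance, and the function name and argument layout (lists pi, A and B for the initial, transition and emission probabilities) are hypothetical.</p>
<preformat>
```python
import math


def _log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def most_probable_motif(pi, A, B, alphabet, length):
    """Return (state path, emitted string, log probability) of the best path.

    No observation is given, so each state contributes its most probable
    emission, and the recursion maximizes over state paths only.
    """
    n = len(pi)
    symbols = range(len(alphabet))
    best_sym = [max(symbols, key=lambda x, i=i: B[i][x]) for i in range(n)]
    emit = [_log(B[i][best_sym[i]]) for i in range(n)]

    score = [_log(pi[i]) + emit[i] for i in range(n)]
    back = []
    for _ in range(length - 1):
        prev = score
        # For every target state j, remember the best predecessor state.
        pointers = [max(range(n), key=lambda i, j=j: prev[i] + _log(A[i][j]))
                    for j in range(n)]
        score = [prev[pointers[j]] + _log(A[pointers[j]][j]) + emit[j]
                 for j in range(n)]
        back.append(pointers)

    # Trace the best final state back to the first position.
    state = max(range(n), key=lambda j: score[j])
    best = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    motif = "".join(alphabet[best_sym[s]] for s in path)
    return path, motif, best
```
</preformat>
<p>Reading the emission rows of B along the returned state path gives one probability column per position, which is one way to obtain a PWM-like summary of that path.</p>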
<sec>
<title>Parameter settings</title>
<p>The proposed approach was implemented and tested on a previously published PBM data set (
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
). If fewer than 50 positive k-mers are found, the top 50 k-mers are used to mitigate sampling error. Progressive multiple alignment is adopted; each pairwise alignment is done with the NUC44 scoring matrix (
<xref ref-type="bibr" rid="gkt574-B64">64</xref>
). After that, pairwise distances between sequences are computed from the proportions of sites at which each pair of sequences is similar or different under NUC44 (ignoring gaps). Assuming equal variance and independence of evolutionary distance estimates, the guide tree is calculated by the neighbor-joining method. We used 50 hidden states for all trained HMMs to achieve rigorous pattern modeling. This number of hidden states was chosen based on empirical performance in a few preliminary runs. Laplace smoothing with
<inline-formula>
<inline-graphic xlink:href="gkt574i9.jpg"></inline-graphic>
</inline-formula>
is applied to the trained emission matrices. To be comparable with the previous results,
<italic>k</italic>
is set to 8, i.e. only 8-mers are considered (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). In particular, we need to control how many steps the Baum–Welch training algorithm executes. In this work, the algorithm terminates when all of the following three quantities become numerically negligible (i.e. below 0.1%): (i) the change in the log likelihood that the input sequence is generated by the currently estimated values of the transition and emission matrices; (ii) the change in the norm of the transition matrix, normalized by the size of the matrix; (iii) the change in the norm of the emission matrix, normalized by the size of the matrix. As each HMM is initialized randomly, the training is repeated 10 times to reduce the risk of suboptimal convergence. Among the 10 runs, the HMM with the highest Spearman correlation on the training data is selected as the output model.</p>
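<p>The three-part stopping rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the log-likelihood change is assumed to be measured relative to its previous magnitude, and the "norm normalized by the size of the matrix" is assumed to be the Frobenius norm of the difference divided by the number of entries.</p>
<preformat>
```python
import math

TOL = 1e-3  # 0.1%, the threshold quoted in the text


def scaled_norm_change(new, old):
    """Frobenius norm of (new - old), divided by the number of matrix entries."""
    total = sum((n - o) ** 2 for rn, ro in zip(new, old) for n, o in zip(rn, ro))
    size = sum(len(row) for row in new)
    return math.sqrt(total) / size


def has_converged(loglik_new, loglik_old, a_new, a_old, e_new, e_old, tol=TOL):
    """Return True only when all three changes are below tol.

    a_* are transition matrices and e_* are emission matrices (lists of rows).
    """
    # Assumption: relative change of the log likelihood between iterations.
    loglik_change = abs(loglik_new - loglik_old) / max(abs(loglik_old), 1e-300)
    return (
        tol > loglik_change
        and tol > scaled_norm_change(a_new, a_old)
        and tol > scaled_norm_change(e_new, e_old)
    )
```
</preformat>
<p>In a training loop, such a check would be evaluated after every Baum–Welch iteration, and the loop would stop at the first iteration for which it returns true.</p>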
<p>We followed the evaluation procedures described in a previous study (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). Specifically, for each DNA-binding protein of interest, we have two array sets of DNA probe sequences, i.e. array #1 and array #2. Each DNA probe sequence on the array is associated with a normalized signal intensity value. The higher the value, the higher the binding preference of the DNA-binding protein for that DNA sequence. For each DNA-binding protein, the two arrays (data replicates) alternate as training and testing data. In other words, array #1 is used for training and array #2 for testing in the first round, whereas array #2 is used for training and array #1 for testing in the second round.</p>
<p>We used two evaluation methods to compare the performance of our method with other previously published methods. The first one is to examine the ability of individual methods to recover and rank the binding preferences of the DNA sequences, whereas the second one is to examine their ability to predict positive DNA sequences among the whole set of testing DNA sequences.</p>
<p>For the first evaluation, Spearman rank correlation coefficients are adopted as the performance metric to compare the true ranking of the binding preferences with the ranking predicted by each computational method. To apply kmerHMM to predict sequence rank, we use a sliding window of length
<italic>L</italic>
(i.e. the alignment length in training) to scan each sequence and compute the probability of observing the subsequence within the sliding window using the forward algorithm (
<xref ref-type="bibr" rid="gkt574-B65">65</xref>
). The maximal probability within each sequence is taken as the quantitative measure for ranking. Mathematically, given a DNA sequence
<inline-formula>
<inline-graphic xlink:href="gkt574i10.jpg"></inline-graphic>
</inline-formula>
, we compute its predicted binding preference
<italic>B</italic>
(
<italic>D</italic>
) as:
<disp-formula>
<graphic xlink:href="gkt574um3"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="gkt574i11.jpg"></inline-graphic>
</inline-formula>
can be computed using the forward algorithm (
<xref ref-type="bibr" rid="gkt574-B65">65</xref>
), similar to the training procedure described in the previous section.</p>
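<p>The sliding-window scoring described above can be sketched as follows. This is a minimal, unscaled illustration under stated assumptions: the function names and the data layout (a list of initial probabilities, a transition matrix and one emission dictionary per state) are hypothetical, and the score is taken to be the maximum forward probability over all windows, as the text describes.</p>
<preformat>
```python
def forward_probability(seq, init, trans, emit):
    """P(seq | HMM) by the forward algorithm.

    init[i]: initial probability of state i; trans[i][j]: transition i to j;
    emit[i]: dict mapping a symbol to its emission probability in state i.
    """
    n = len(init)
    alpha = [init[i] * emit[i].get(seq[0], 0.0) for i in range(n)]
    for symbol in seq[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(symbol, 0.0)
            for j in range(n)
        ]
    return sum(alpha)


def binding_preference(probe, window, init, trans, emit):
    """Maximum forward probability over all windows of the given length."""
    if window > len(probe):
        raise ValueError("window is longer than the probe")
    return max(
        forward_probability(probe[s:s + window], init, trans, emit)
        for s in range(len(probe) - window + 1)
    )
```
</preformat>
<p>For long windows, a practical implementation would normally work with scaled or log-space forward variables to avoid numerical underflow; that refinement is omitted here for brevity.</p>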
<p>For the second evaluation, the positive (bound) DNA sequences in each testing data set are defined using the robust estimate in RankMotif++, which was specifically developed for analyzing raw PBM data (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). The remaining DNA sequences are defined as the negative ones, which account for 94.6–99.1% of the testing data set. Given such a two-class classification setting, sensitivities were computed at the 99% specificity level. For kmerHMM, the predicted binding preference
<inline-formula>
<inline-graphic xlink:href="gkt574i12.jpg"></inline-graphic>
</inline-formula>
is thresholded to estimate the sensitivities, whereas the other methods used the same settings as described in the previous study (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
).</p>
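<p>As an illustration of the second evaluation, the sensitivity at 99% specificity (i.e. the true-positive rate at a false-positive rate of at most 1%) can be estimated by lowering a score threshold until the allowed false-positive rate would be exceeded. The sketch below is an assumption about how such a computation could be done; the function name is hypothetical, and tied scores are assumed to cross the threshold together.</p>
<preformat>
```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Highest true-positive rate reachable without exceeding max_fpr.

    scores: predicted binding preferences; labels: True for positive probes.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative probe")
    # Group probes by score so that tied scores cross the threshold together.
    counts = {}
    for s, y in zip(scores, labels):
        tp_fp = counts.setdefault(s, [0, 0])
        tp_fp[0 if y else 1] += 1
    tp = fp = 0
    best = 0.0
    for s in sorted(counts, reverse=True):  # lower the threshold step by step
        tp += counts[s][0]
        fp += counts[s][1]
        if fp / n_neg > max_fpr:
            break
        best = tp / n_pos
    return best
```
</preformat>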
</sec>
<sec>
<title>Max-Product algorithm</title>
<p>In this study, the most probable state transition path
<inline-formula>
<inline-graphic xlink:href="gkt574i13.jpg"></inline-graphic>
</inline-formula>
is calculated for each trained HMM θ using the max-product algorithm. Mathematically, the max-product algorithm can be described as follows:
<statement>
<title>Input:</title>
<p>an HMM model θ trained to represent the input aligned k-mers:
<disp-formula>
<graphic xlink:href="gkt574um4"></graphic>
</disp-formula>
where
<italic>a
<sub>ij</sub>
</italic>
is the transition probability from state
<italic>i</italic>
to state
<italic>j</italic>
;
<inline-formula>
<inline-graphic xlink:href="gkt574i14.jpg"></inline-graphic>
</inline-formula>
is the probability of emitting
<italic>x</italic>
at state
<italic>i</italic>
;
<inline-formula>
<inline-graphic xlink:href="gkt574i15.jpg"></inline-graphic>
</inline-formula>
is the initial state probability for state
<italic>i</italic>
.</p>
</statement>
<statement>
<title>Output:</title>
<p>Most probable state transition path
<inline-formula>
<inline-graphic xlink:href="gkt574i16.jpg"></inline-graphic>
</inline-formula>
of the input HMM model θ:
<disp-formula>
<graphic xlink:href="gkt574um5"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="gkt574i17.jpg"></inline-graphic>
</inline-formula>
is the probability of a state transition path
<italic>Y</italic>
in the input HMM model θ. It can be calculated using a dynamic programming approach. The mathematical details are available in the
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Data</ext-link>
.</p>
</statement>
</p>
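<p>The dynamic programming step can be sketched as follows. This is a minimal illustration, not the formulation given in the Supplementary Data: the function name is hypothetical, and the probability of a path is assumed to be the initial probability of its first state multiplied by the transition probabilities along the path, with the path length fixed in advance.</p>
<preformat>
```python
def most_probable_path(init, trans, length):
    """Return (probability, path) for the most probable state path.

    Assumes P(path) = init[y1] * product of trans[y_t][y_(t+1)].
    """
    n = len(init)
    best = list(init)  # best[j]: max probability of a partial path ending in j
    back = []          # back[t][j]: predecessor of j on that best partial path
    for _ in range(length - 1):
        step, ptr = [], []
        for j in range(n):
            i = max(range(n), key=lambda s: best[s] * trans[s][j])
            step.append(best[i] * trans[i][j])
            ptr.append(i)
        best = step
        back.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path
```
</preformat>
<p>Because only the best partial path per state is kept at each step, the cost grows linearly with the path length and quadratically with the number of states, instead of exponentially with the path length.</p>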
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<sec>
<title>Comparisons</title>
<p>
<xref ref-type="table" rid="gkt574-T1">Tables 1</xref>
and
<xref ref-type="table" rid="gkt574-T2">2</xref>
list the results from our method (kmerHMM). The ROC curves are plotted in
<xref ref-type="fig" rid="gkt574-F2">Figure 2</xref>
. From those results, we can observe that kmerHMM performs better than the other methods on three data sets (Cbf1, Oct-1 and Zif268). On the two other data sets (Ceh-22 and Rap1), kmerHMM is not the top performer but is close. In the case of Rap1, kmerHMM performed slightly worse than other methods. The consensus binding motif for Rap1 is 13 nt long, which is longer than those of most common TFs. kmerHMM considers only motifs that are 8 nt long; therefore, it is at a disadvantage in such cases. Nonetheless, we believe that this limitation will be alleviated when the PBM technology is improved (i.e. when a higher value of k can be used) in the future.
<fig id="gkt574-F2" position="float">
<label>Figure 2.</label>
<caption>
<p>Receiver Operating Characteristic (ROC) curves on array #1. The positive (bound) DNA probe sequences in each data set are defined using the robust estimate in RankMotif++ (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). In other words, we define the positive probes to be the probes
<italic>seq
<sub>i</sub>
</italic>
, whose normalized signal intensity
<inline-formula>
<inline-graphic xlink:href="gkt574i18.jpg"></inline-graphic>
</inline-formula>
where
<italic>mi</italic>
and σ are the median and the MAD of all the probe normalized intensities
<inline-formula>
<inline-graphic xlink:href="gkt574i19.jpg"></inline-graphic>
</inline-formula>
divided by 0.6745 (the MAD of the unit normal distribution), respectively. The remaining probes are defined as the negative ones, which account for 94.6–99.1% of the data set. Given such a two-class classification setting, the predicted binding preference
<inline-formula>
<inline-graphic xlink:href="gkt574i20.jpg"></inline-graphic>
</inline-formula>
of each probe sequence
<italic>seq
<sub>j</sub>
</italic>
is thresholded to estimate the true-positive rates at different false-positive rates for kmerHMM. The performance values of the other methods are adopted from the RankMotif++ manuscript (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). AUC stands for the area under the curve.</p>
</caption>
<graphic xlink:href="gkt574f2p"></graphic>
</fig>
<table-wrap id="gkt574-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Spearman rank correlation coefficients</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">Array</th>
<th rowspan="1" colspan="1">8-mer</th>
<th rowspan="1" colspan="1">MatrixREDUCE</th>
<th rowspan="1" colspan="1">MDScan</th>
<th rowspan="1" colspan="1">PREGO</th>
<th rowspan="1" colspan="1">RankMotif++</th>
<th rowspan="1" colspan="1">Seed and Wobble</th>
<th rowspan="1" colspan="1">kmerHMM</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="2" colspan="1">Cbf1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.647</td>
<td rowspan="1" colspan="1">0.634</td>
<td rowspan="1" colspan="1">0.512</td>
<td rowspan="1" colspan="1">0.61</td>
<td rowspan="1" colspan="1">0.636</td>
<td rowspan="1" colspan="1">0.527</td>
<td rowspan="1" colspan="1">
<bold>0.660</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.657</td>
<td rowspan="1" colspan="1">0.604</td>
<td rowspan="1" colspan="1">0.496</td>
<td rowspan="1" colspan="1">0.58</td>
<td rowspan="1" colspan="1">0.64</td>
<td rowspan="1" colspan="1">0.49</td>
<td rowspan="1" colspan="1">
<bold>0.647</bold>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Ceh-22</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.487</td>
<td rowspan="1" colspan="1">0.373</td>
<td rowspan="1" colspan="1">0.36</td>
<td rowspan="1" colspan="1">0.366</td>
<td rowspan="1" colspan="1">
<bold>0.485</bold>
</td>
<td rowspan="1" colspan="1">0.304</td>
<td rowspan="1" colspan="1">0.447</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.408</td>
<td rowspan="1" colspan="1">0.3</td>
<td rowspan="1" colspan="1">0.324</td>
<td rowspan="1" colspan="1">0.278</td>
<td rowspan="1" colspan="1">
<bold>0.427</bold>
</td>
<td rowspan="1" colspan="1">0.275</td>
<td rowspan="1" colspan="1">0.313</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Oct-1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.327</td>
<td rowspan="1" colspan="1">0.263</td>
<td rowspan="1" colspan="1">0.286</td>
<td rowspan="1" colspan="1">0.281</td>
<td rowspan="1" colspan="1">0.244</td>
<td rowspan="1" colspan="1">
<bold>0.315</bold>
</td>
<td rowspan="1" colspan="1">0.302</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.446</td>
<td rowspan="1" colspan="1">0.308</td>
<td rowspan="1" colspan="1">0.264</td>
<td rowspan="1" colspan="1">0.272</td>
<td rowspan="1" colspan="1">0.291</td>
<td rowspan="1" colspan="1">0.213</td>
<td rowspan="1" colspan="1">
<bold>0.359</bold>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Rap1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.238</td>
<td rowspan="1" colspan="1">0.273</td>
<td rowspan="1" colspan="1">0.338</td>
<td rowspan="1" colspan="1">0.261</td>
<td rowspan="1" colspan="1">
<bold>0.382</bold>
</td>
<td rowspan="1" colspan="1">0.372</td>
<td rowspan="1" colspan="1">0.334</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.275</td>
<td rowspan="1" colspan="1">0.239</td>
<td rowspan="1" colspan="1">0.254</td>
<td rowspan="1" colspan="1">0.205</td>
<td rowspan="1" colspan="1">
<bold>0.359</bold>
</td>
<td rowspan="1" colspan="1">0.357</td>
<td rowspan="1" colspan="1">0.270</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Zif268</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.421</td>
<td rowspan="1" colspan="1">0.293</td>
<td rowspan="1" colspan="1">0.265</td>
<td rowspan="1" colspan="1">0.292</td>
<td rowspan="1" colspan="1">0.336</td>
<td rowspan="1" colspan="1">0.276</td>
<td rowspan="1" colspan="1">
<bold>0.338</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.346</td>
<td rowspan="1" colspan="1">0.279</td>
<td rowspan="1" colspan="1">0.246</td>
<td rowspan="1" colspan="1">0.196</td>
<td rowspan="1" colspan="1">0.308</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.336</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkt574-TF1">
<p>The rank correlations were computed between the median intensities of the positive probes and the binding preferences predicted by each motif model. The performance values of all the methods except kmerHMM are adopted from
<xref ref-type="table" rid="gkt574-T1">Table 1</xref>
in the RankMotif++ manuscript (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). The highest values (except the 8-mer gold standard) are highlighted in bold. The 8-mer gold standard is the method in which the maximum of the median binding intensities of the 8-mers on a testing probe (60 bp) is used as the predicted binding preference of the testing probe (60 bp).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="gkt574-T2" position="float">
<label>Table 2.</label>
<caption>
<p>True positive rates at 1% false positive rate</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">Array</th>
<th rowspan="1" colspan="1">8-mer</th>
<th rowspan="1" colspan="1">MatrixREDUCE</th>
<th rowspan="1" colspan="1">MDScan</th>
<th rowspan="1" colspan="1">PREGO</th>
<th rowspan="1" colspan="1">RankMotif++</th>
<th rowspan="1" colspan="1">Seed and Wobble</th>
<th rowspan="1" colspan="1">kmerHMM</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="2" colspan="1">Cbf1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.515</td>
<td rowspan="1" colspan="1">0.39</td>
<td rowspan="1" colspan="1">0.231</td>
<td rowspan="1" colspan="1">0.362</td>
<td rowspan="1" colspan="1">0.493</td>
<td rowspan="1" colspan="1">0.383</td>
<td rowspan="1" colspan="1">
<bold>0.515</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.459</td>
<td rowspan="1" colspan="1">0.348</td>
<td rowspan="1" colspan="1">0.202</td>
<td rowspan="1" colspan="1">0.336</td>
<td rowspan="1" colspan="1">0.424</td>
<td rowspan="1" colspan="1">0.284</td>
<td rowspan="1" colspan="1">
<bold>0.462</bold>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Ceh-22</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.37</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">0.316</td>
<td rowspan="1" colspan="1">0.225</td>
<td rowspan="1" colspan="1">
<bold>0.427</bold>
</td>
<td rowspan="1" colspan="1">0.254</td>
<td rowspan="1" colspan="1">0.380</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.257</td>
<td rowspan="1" colspan="1">0.226</td>
<td rowspan="1" colspan="1">0.293</td>
<td rowspan="1" colspan="1">0.2</td>
<td rowspan="1" colspan="1">
<bold>0.332</bold>
</td>
<td rowspan="1" colspan="1">0.251</td>
<td rowspan="1" colspan="1">0.317</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Oct-1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.474</td>
<td rowspan="1" colspan="1">0.365</td>
<td rowspan="1" colspan="1">0.274</td>
<td rowspan="1" colspan="1">0.339</td>
<td rowspan="1" colspan="1">0.315</td>
<td rowspan="1" colspan="1">0.239</td>
<td rowspan="1" colspan="1">
<bold>0.440</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.382</td>
<td rowspan="1" colspan="1">0.31</td>
<td rowspan="1" colspan="1">0.213</td>
<td rowspan="1" colspan="1">0.274</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">0.202</td>
<td rowspan="1" colspan="1">
<bold>0.314</bold>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Rap1</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.257</td>
<td rowspan="1" colspan="1">0.197</td>
<td rowspan="1" colspan="1">0.213</td>
<td rowspan="1" colspan="1">0.197</td>
<td rowspan="1" colspan="1">0.247</td>
<td rowspan="1" colspan="1">0.226</td>
<td rowspan="1" colspan="1">
<bold>0.274</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.277</td>
<td rowspan="1" colspan="1">0.171</td>
<td rowspan="1" colspan="1">0.32</td>
<td rowspan="1" colspan="1">0.179</td>
<td rowspan="1" colspan="1">
<bold>0.325</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">0.243</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Zif268</td>
<td rowspan="1" colspan="1">#1</td>
<td rowspan="1" colspan="1">0.449</td>
<td rowspan="1" colspan="1">0.332</td>
<td rowspan="1" colspan="1">0.335</td>
<td rowspan="1" colspan="1">0.328</td>
<td rowspan="1" colspan="1">0.33</td>
<td rowspan="1" colspan="1">0.336</td>
<td rowspan="1" colspan="1">
<bold>0.439</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">#2</td>
<td rowspan="1" colspan="1">0.431</td>
<td rowspan="1" colspan="1">0.297</td>
<td rowspan="1" colspan="1">0.314</td>
<td rowspan="1" colspan="1">0.301</td>
<td rowspan="1" colspan="1">0.389</td>
<td rowspan="1" colspan="1">0.313</td>
<td rowspan="1" colspan="1">
<bold>0.413</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkt574-TF2">
<p>Given the binding preferences predicted by each method on the different data sets, their sensitivities (true-positive rates) were computed at the 99% specificity level (i.e. a 1% false-positive rate). The performance values of all the methods except kmerHMM are adopted from
<xref ref-type="table" rid="gkt574-T1">Table 1</xref>
in the RankMotif++ manuscript (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
). The highest values (except the 8-mer gold standard) are highlighted in bold. The 8-mer gold standard is the method in which the maximum of the median binding intensities of the 8-mers on a testing probe (60 bp) is used as the predicted binding preference of the testing probe (60 bp).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
</sec>
<sec>
<title>Sensitivity analysis</title>
<p>To be comparable with the past results, the positive k-mers are defined as the k-mers
<italic>y</italic>
whose median signal intensity
<inline-formula>
<inline-graphic xlink:href="gkt574i21.jpg"></inline-graphic>
</inline-formula>
where
<italic>mi</italic>
and σ are the median and the MAD of the normalized intensities
<inline-formula>
<inline-graphic xlink:href="gkt574i22.jpg"></inline-graphic>
</inline-formula>
divided by 0.6745 (the MAD of the unit normal distribution), respectively. This is also the threshold used to define the positive probes in this study. Nonetheless, the condition may be too stringent; therefore, we conducted a sensitivity analysis, varying the condition from
<inline-formula>
<inline-graphic xlink:href="gkt574i23.jpg"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="gkt574i24.jpg"></inline-graphic>
</inline-formula>
to better understand the behavior of kmerHMM. The results are depicted in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figures S1</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">S2</ext-link>
. It can be observed that the area under the curve (AUC) values decrease as the condition becomes more stringent, whereas the Spearman rank correlation appears to be fairly stable. Finally, the true positive rate also appears to be stable until the condition
<inline-formula>
<inline-graphic xlink:href="gkt574i25.jpg"></inline-graphic>
</inline-formula>
, after which the true positive rate drops sharply.</p>
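<p>The robust threshold used to define positives can be sketched as follows. The exact condition is given by the inline formulas above, so the code makes a labeled assumption: an item is taken to be positive when its intensity exceeds the median plus a multiplier times σ, where σ is the MAD divided by 0.6745 as stated in the text. The multiplier <italic>c</italic> and the function names are hypothetical; varying <italic>c</italic> corresponds to making the condition more or less stringent.</p>
<preformat>
```python
import statistics


def robust_threshold(intensities, c):
    """Return median + c * sigma, with sigma = MAD / 0.6745.

    c is a hypothetical stringency multiplier, not a value from the paper.
    """
    values = list(intensities)
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    return med + c * (mad / 0.6745)


def positive_items(intensity_by_item, c):
    """Items (k-mers or probes) whose intensity exceeds the robust threshold."""
    threshold = robust_threshold(intensity_by_item.values(), c)
    return {item for item, value in intensity_by_item.items() if value > threshold}
```
</preformat>
<p>A sensitivity analysis in this spirit would call such a function for a range of multipliers and re-evaluate the model for each resulting set of positives.</p>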
</sec>
<sec>
<title>Positional bias</title>
<p>It has been reported that, in a PBM experiment, the k-mer position on a probe may affect the protein–DNA-binding efficiency (
<xref ref-type="bibr" rid="gkt574-B7">7</xref>
). Zhao and Stormo proposed a method to take into account the positional bias (
<xref ref-type="bibr" rid="gkt574-B60">60</xref>
). In kmerHMM, we have adopted the median intensities of the probes containing a k-mer to average out the positional bias. To examine how well such a strategy can deal with the positional bias, we have also implemented and calculated the positional bias coefficients
<inline-formula>
<inline-graphic xlink:href="gkt574i26.jpg"></inline-graphic>
</inline-formula>
where
<italic>j</italic>
is the position index (
<xref ref-type="bibr" rid="gkt574-B60">60</xref>
) for each array and incorporated them into the kmerHMM framework (as shown in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figures S3</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">S4</ext-link>
). It is straightforward to integrate them into our kmerHMM framework, as we only need to modify the original
<italic>B</italic>
(
<italic>D</italic>
) function to a new function
<inline-formula>
<inline-graphic xlink:href="gkt574i27.jpg"></inline-graphic>
</inline-formula>
as follows:
<disp-formula>
<graphic xlink:href="gkt574um6"></graphic>
</disp-formula>
Semantically, the function
<inline-formula>
<inline-graphic xlink:href="gkt574i28.jpg"></inline-graphic>
</inline-formula>
considers all the binding events across the probe and calculates the probability that at least one binding event occurs; each binding event is weighted by the corresponding positional bias coefficient
<inline-formula>
<inline-graphic xlink:href="gkt574i29.jpg"></inline-graphic>
</inline-formula>
.</p>
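<p>One way to read this description is as a "noisy-OR" combination of the per-window binding probabilities. The sketch below is an assumption about the general form, not a transcription of the displayed formula: the function name is hypothetical, each window probability is multiplied by its positional bias coefficient, and both quantities are assumed to lie between 0 and 1.</p>
<preformat>
```python
def weighted_binding_preference(window_probs, bias):
    """Probability that at least one weighted binding event occurs.

    Assumed form: 1 - product over j of (1 - bias[j] * window_probs[j]).
    """
    if len(window_probs) != len(bias):
        raise ValueError("one bias coefficient is needed per window position")
    none_bound = 1.0
    for p, w in zip(window_probs, bias):
        none_bound *= 1.0 - w * p
    return 1.0 - none_bound
```
</preformat>
<p>With all coefficients equal to 1, this reduces to the plain probability that at least one window is bound; a coefficient of 0 removes the corresponding position from consideration.</p>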
<p>After the positional bias has been taken into account explicitly, we ran kmerHMM on the data set again. The results are depicted in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figure S5</ext-link>
. It can be observed that there is a slight improvement on the Cbf1, Ceh-22 and Oct-1 data, whereas slight performance degradation can be seen on the Rap1 and Zif268 data. This is not surprising, because the Cbf1, Ceh-22 and Oct-1 data show clear and consistent trends in positional bias between Array #1 and #2, whereas the Rap1 and Zif268 data do not (as shown in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figures S3</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">S4</ext-link>
).</p>
</sec>
<sec>
<title>State transition path analysis</title>
<sec>
<title>Max-Product algorithm</title>
<p>In recent years, probabilistic graphical models have been successfully applied to biological problems such as gene clustering and alternative splicing (
<xref ref-type="bibr" rid="gkt574-B66 gkt574-B67 gkt574-B68">66–68</xref>
). In this work, our probabilistic graphical models (i.e. HMMs) are learned from the PBM data, which may contain valuable motif information. In particular, we are interested in the most probable paths encoded in each trained HMM, as such paths could represent multiple binding modes of a given DNA-binding protein. To find them, the max-product algorithm (belief propagation) can be used, as it relies on dynamic programming to avoid the exponential enumeration of all possible state paths. In addition, its optimality condition has been well studied (
<xref ref-type="bibr" rid="gkt574-B69">69</xref>
). In this work, we implemented the max-product algorithm and applied it to the discrete Markov chain of each trained HMM. In other words, the most probable state transition path is calculated for each trained HMM. The mathematical formulation can be found in the ‘Materials and Methods’ section.</p>
<p>After the most probable state transition path was calculated for each HMM, we mapped the corresponding emission distribution to each state in the path, resulting in a path model similar to a PWM for each HMM. Nonetheless, such a path model is not meant to be equivalent to a PWM, as it only represents the most probable emissions in an HMM, given a fixed path length. The resultant path models are depicted in
<xref ref-type="fig" rid="gkt574-F3">Figures 3</xref>
and
<xref ref-type="fig" rid="gkt574-F4">4</xref>
for Array #1 and #2, respectively. Comparing them with previously reported PWM models (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
), it can be observed that the most probable paths encoded in kmerHMM are similar to those discovered by the existing PWM-based methods (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figure S6</ext-link>
).
<fig id="gkt574-F3" position="float">
<label>Figure 3.</label>
<caption>
<p>The motif logos mapped and plotted from the most probable state transition paths of the HMMs trained by kmerHMM on Array #1. Those most probable state transition paths are found using the max-product algorithm. The trailing gaps are trimmed.</p>
</caption>
<graphic xlink:href="gkt574f3p"></graphic>
</fig>
<fig id="gkt574-F4" position="float">
<label>Figure 4.</label>
<caption>
<p>The motif logos mapped and plotted from the most probable state transition paths of the HMMs trained by kmerHMM on Array #2. Those most probable state transition paths are found using the max-product algorithm. The trailing gaps are trimmed.</p>
</caption>
<graphic xlink:href="gkt574f4p"></graphic>
</fig>
</p>
</sec>
<sec>
<title>N-Max-Product algorithm</title>
<p>It has been reported that some DNA-binding proteins can bind to more than one motif model (
<xref ref-type="bibr" rid="gkt574-B56">56</xref>
). To tackle this, an additional step is needed to elucidate the different motif models from each trained HMM. We herein propose the N-max-product algorithm to solve the problem. Although the most intuitive solution is to add a sequence clustering step that separates the set of k-mers before the multiple sequence alignment step, such a preprocessing clustering step may lose motif information if the motif models overlap with each other.</p>
<p>To extract the multimodal motif information from the HMMs trained, the N-max-product algorithm (
<xref ref-type="bibr" rid="gkt574-B70">70</xref>
) is implemented and applied to find the top N most probable state transition paths from the state transition Markov chain in each trained HMM. Using the same method described in the previous section, each state transition path can become an individual path model but, owing to the probabilistic locality, it is likely that those top paths may just be the ones with small variations from the most probable path. Thus, a large value of N needs to be chosen. A clustering step is also needed to summarize them. Specifically, after the top N most probable state transition paths are found, we can map the emission probability distribution to each state in each path. By doing so, we can apply a clustering method to cluster the paths and get a consensus state path model for each cluster.</p>
<p>As an illustrative example, such a procedure was implemented and applied to the Oct-1 Array #1 data. At the implementation level, we set the value of N such that the occurrence probability of the (N+1)-th path is numerically negligible (i.e.
<inline-formula>
<inline-graphic xlink:href="gkt574i30.jpg"></inline-graphic>
</inline-formula>
). Single-linkage hierarchical clustering was applied to the top N most probable state transition paths to build a cluster dendrogram. To determine the number of clusters, a dendrogram cutoff was chosen such that the mean silhouette value was the highest, resulting in two clusters. Their centroids were extracted and are depicted alongside the sequence logos obtained by the other methods in
<xref ref-type="fig" rid="gkt574-F5">Figure 5</xref>
. To quantify their similarities, STAMP was used to calculate the expected value for pair-wise comparisons (
<xref ref-type="bibr" rid="gkt574-B71">71</xref>
). It can be observed that the first centroid path model is most similar to the one obtained by Seed and Wobble, whereas the second centroid path model is most similar to the one obtained by MatrixREDUCE. Those two motifs have been confirmed by previous independent wet-lab experiments by Verrijzer
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gkt574-B72">72</xref>
).
<fig id="gkt574-F5" position="float">
<label>Figure 5.</label>
<caption>
<p>Comparison of PWMs of Oct-1 as predicted by different methods. For the left-most column, the top entry is the silhouette plot for cluster analysis, whereas the other two entries indicate the two centroid path models of the HMMs trained by kmerHMM on the Oct-1 Array #1 data set. The first row shows the sequence logos for the PWMs learned by different methods on the Oct-1 Array #1 data set [conducted by Chen
<italic>et al.</italic>
The figures are edited from (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
)]. The remaining numeric entries are the expected values for the pair-wise motif matrix comparisons by STAMP (
<xref ref-type="bibr" rid="gkt574-B71">71</xref>
). Those two centroids have been confirmed by independent wet-lab experiments by Verrijzer
<italic>et al.</italic>
(
<xref ref-type="bibr" rid="gkt574-B72">72</xref>
).</p>
</caption>
<graphic xlink:href="gkt574f5p"></graphic>
</fig>
</p>
<p>We used the two centroid path models to scan the probe sequences in the Oct-1 Array #2 again, following the evaluation procedure described above. The maximal probability under either path model was taken as the predicted binding preference for each probe sequence. The resultant correlation coefficient between the predicted binding preferences (i.e. ranks) and the measured binding preferences is 0.3264, which is the highest among all the PWM-based methods shown in
<xref ref-type="table" rid="gkt574-T1">Table 1</xref>
. This suggests that kmerHMM can capture multimodal motifs through HMM modeling and training, which explains why it can perform better than the other methods in some cases.</p>
<p>We have examined the modes of the state transition paths for the two clusters. Interestingly, we observe that they have different state transition paths, independent of each other. This suggests that HMM modeling is necessary for multimodal motifs, compared with other modeling approaches in which the state transition topology is manually restricted to a single principal state transition path.</p>
</sec>
</sec>
<sec>
<title>Further evaluation on mouse PBM data</title>
<p>To evaluate kmerHMM further, kmerHMM and RankMotif++ were run and tested on the PBM data provided in the comprehensive mouse data set (
<xref ref-type="bibr" rid="gkt574-B56">56</xref>
). The results are tabulated in
<xref ref-type="table" rid="gkt574-T3">Tables 3</xref>
and
<xref ref-type="table" rid="gkt574-T4">4</xref>
; array #1 is used as training data in
<xref ref-type="table" rid="gkt574-T3">Table 3</xref>
and array #2 is used as training data in
<xref ref-type="table" rid="gkt574-T4">Table 4</xref>
. Interestingly, it can be observed that kmerHMM can consistently achieve higher true positive rates than RankMotif++, which was specifically designed to analyze PBM data (
<xref ref-type="bibr" rid="gkt574-B57">57</xref>
).
<table-wrap id="gkt574-T3" position="float">
<label>Table 3.</label>
<caption>
<p>Comparisons between kmerHMM and RankMotif++ on the mouse data set (
<xref ref-type="bibr" rid="gkt574-B56">56</xref>
)</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">SR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">TPR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">AUC
<hr></hr>
</th>
<th rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">SR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">TPR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">AUC
<hr></hr>
</th>
</tr>
<tr>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Arid3a</td>
<td rowspan="1" colspan="1">
<bold>0.34</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Osr2</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.77</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ascl2</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.52</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
<td rowspan="1" colspan="1">Plagl1</td>
<td rowspan="1" colspan="1">0.36</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.51</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.89</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Bcl6b</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">−0.10</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.69</bold>
</td>
<td rowspan="1" colspan="1">0.63</td>
<td rowspan="1" colspan="1">Rfx3</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Bhlhb2</td>
<td rowspan="1" colspan="1">
<bold>0.60</bold>
</td>
<td rowspan="1" colspan="1">0.46</td>
<td rowspan="1" colspan="1">
<bold>0.57</bold>
</td>
<td rowspan="1" colspan="1">0.35</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.92</td>
<td rowspan="1" colspan="1">Rfx4</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.80</bold>
</td>
<td rowspan="1" colspan="1">0.77</td>
</tr>
<tr>
<td rowspan="1" colspan="1">E2F2</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
<td rowspan="1" colspan="1">Rfxdc2</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
</tr>
<tr>
<td rowspan="1" colspan="1">E2F3</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.58</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.98</bold>
</td>
<td rowspan="1" colspan="1">0.91</td>
<td rowspan="1" colspan="1">Rxra</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.72</bold>
</td>
<td rowspan="1" colspan="1">0.53</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Egr1</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.57</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Sfpi1</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ehf</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.79</bold>
</td>
<td rowspan="1" colspan="1">0.12</td>
<td rowspan="1" colspan="1">
<bold>0.99</bold>
</td>
<td rowspan="1" colspan="1">0.77</td>
<td rowspan="1" colspan="1">Sox11</td>
<td rowspan="1" colspan="1">−0.10</td>
<td rowspan="1" colspan="1">
<bold>0.12</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.71</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Elf3</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">0.85</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">Sox14</td>
<td rowspan="1" colspan="1">0.08</td>
<td rowspan="1" colspan="1">
<bold>0.08</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.84</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Eomes</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">Sox15</td>
<td rowspan="1" colspan="1">
<bold>0.15</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Esrra</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">Sox17</td>
<td rowspan="1" colspan="1">−0.24</td>
<td rowspan="1" colspan="1">
<bold>0.00</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxa2</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">−0.01</td>
<td rowspan="1" colspan="1">
<bold>0.74</bold>
</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">Sox18</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.85</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxj1</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.45</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">
<bold>0.84</bold>
</td>
<td rowspan="1" colspan="1">Sox21</td>
<td rowspan="1" colspan="1">
<bold>0.03</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.15</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxj3</td>
<td rowspan="1" colspan="1">
<bold>0.46</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.35</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Sox30</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.09</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxk1</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.48</bold>
</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">Sox4</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.11</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.35</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">0.81</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxl1</td>
<td rowspan="1" colspan="1">
<bold>0.47</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.57</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.87</td>
<td rowspan="1" colspan="1">Spdef</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gabpa</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Srf</td>
<td rowspan="1" colspan="1">
<bold>0.28</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.01</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">0.70</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata3</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.87</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">Sry</td>
<td rowspan="1" colspan="1">
<bold>0.15</bold>
</td>
<td rowspan="1" colspan="1">−0.02</td>
<td rowspan="1" colspan="1">
<bold>0.11</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.80</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata5</td>
<td rowspan="1" colspan="1">
<bold>−0.15</bold>
</td>
<td rowspan="1" colspan="1">−0.18</td>
<td rowspan="1" colspan="1">
<bold>0.44</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.73</td>
<td rowspan="1" colspan="1">Tbp</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.90</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata6</td>
<td rowspan="1" colspan="1">0.00</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Tcf1</td>
<td rowspan="1" colspan="1">
<bold>0.14</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gcm1</td>
<td rowspan="1" colspan="1">
<bold>0.35</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.35</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.73</td>
<td rowspan="1" colspan="1">Tcf3</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">−0.16</td>
<td rowspan="1" colspan="1">
<bold>0.69</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.66</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gm397</td>
<td rowspan="1" colspan="1">
<bold>0.34</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.62</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">Tcf7</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.64</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gmeb1</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.12</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Tcf7l2</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>1.00</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hic1</td>
<td rowspan="1" colspan="1">
<bold>0.46</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.08</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.75</td>
<td rowspan="1" colspan="1">Tcfap2a</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.31</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.90</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hnf4a</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Tcfap2b</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.58</bold>
</td>
<td rowspan="1" colspan="1">0.43</td>
<td rowspan="1" colspan="1">
<bold>0.98</bold>
</td>
<td rowspan="1" colspan="1">0.94</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hoxa3</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Tcfap2c</td>
<td rowspan="1" colspan="1">
<bold>0.49</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.28</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">0.87</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf3</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.84</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Tcfap2e</td>
<td rowspan="1" colspan="1">
<bold>0.54</bold>
</td>
<td rowspan="1" colspan="1">−0.08</td>
<td rowspan="1" colspan="1">
<bold>0.52</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.84</bold>
</td>
<td rowspan="1" colspan="1">0.68</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf4</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
<td rowspan="1" colspan="1">Tcfe2a</td>
<td rowspan="1" colspan="1">
<bold>0.69</bold>
</td>
<td rowspan="1" colspan="1">0.37</td>
<td rowspan="1" colspan="1">
<bold>0.72</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.90</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf5</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Zbtb12</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">−0.10</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.71</bold>
</td>
<td rowspan="1" colspan="1">0.64</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf6</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
<td rowspan="1" colspan="1">0.81</td>
<td rowspan="1" colspan="1">Zbtb3</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">−0.02</td>
<td rowspan="1" colspan="1">
<bold>0.65</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.72</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Isgf3g</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Zbtb7b</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">0.39</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.89</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Jundm2</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.73</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
<td rowspan="1" colspan="1">Zfp105</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Klf7</td>
<td rowspan="1" colspan="1">0.01</td>
<td rowspan="1" colspan="1">
<bold>0.08</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.77</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zfp128</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.74</bold>
</td>
<td rowspan="1" colspan="1">0.00</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mafb</td>
<td rowspan="1" colspan="1">
<bold>0.11</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.15</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.71</bold>
</td>
<td rowspan="1" colspan="1">0.70</td>
<td rowspan="1" colspan="1">Zfp161</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.29</td>
<td rowspan="1" colspan="1">
<bold>0.67</bold>
</td>
<td rowspan="1" colspan="1">0.38</td>
<td rowspan="1" colspan="1">
<bold>0.98</bold>
</td>
<td rowspan="1" colspan="1">0.94</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mafk</td>
<td rowspan="1" colspan="1">
<bold>−0.05</bold>
</td>
<td rowspan="1" colspan="1">−0.12</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">0.81</td>
<td rowspan="1" colspan="1">Zfp281</td>
<td rowspan="1" colspan="1">
<bold>0.47</bold>
</td>
<td rowspan="1" colspan="1">0.45</td>
<td rowspan="1" colspan="1">
<bold>0.70</bold>
</td>
<td rowspan="1" colspan="1">0.39</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Max</td>
<td rowspan="1" colspan="1">
<bold>0.55</bold>
</td>
<td rowspan="1" colspan="1">0.33</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.85</td>
<td rowspan="1" colspan="1">Zfp410</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">−0.05</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.72</bold>
</td>
<td rowspan="1" colspan="1">0.62</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Myb</td>
<td rowspan="1" colspan="1">−0.26</td>
<td rowspan="1" colspan="1">
<bold>0.13</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">Zfp691</td>
<td rowspan="1" colspan="1">
<bold>0.22</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.76</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mybl1</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zic1</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">0.81</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Myf6</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.64</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.98</bold>
</td>
<td rowspan="1" colspan="1">0.61</td>
<td rowspan="1" colspan="1">Zic2</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Nkx3-1</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zic3</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.77</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Nr2f2</td>
<td rowspan="1" colspan="1">
<bold>0.53</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.57</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">Zscan4</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.75</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Osr1</td>
<td rowspan="1" colspan="1">
<bold>0.13</bold>
</td>
<td rowspan="1" colspan="1">−0.02</td>
<td rowspan="1" colspan="1">
<bold>0.52</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.79</bold>
</td>
<td rowspan="1" colspan="1">0.67</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkt574-TF3">
<p>Both methods were trained on Array #1 and tested on Array #2. SR denotes Spearman rank correlation, TPR denotes true positive rate, AUC denotes area under the ROC curve, HMM denotes kmerHMM and RM denotes RankMotif++. Bold values indicate which method (HMM vs. RM) performs better on a particular test.</p>
</fn>
</table-wrap-foot>
</table-wrap>
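For readers who want to reproduce metrics of the kind reported in the table above, the following Python sketch computes an AUC and a true positive rate from predicted scores and binary bound/unbound labels. It is only an illustration under stated assumptions: the function names, the toy scores and labels, and the score threshold of 0.5 are hypothetical, and the text here does not specify how positive probes or the TPR cut-off were defined.
<preformat>
```python
# Illustrative sketch only; labels, scores and the threshold are hypothetical.


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p == n:
                wins += 0.5
            elif p > n:
                wins += 1.0
    return wins / (len(pos) * len(neg))


def true_positive_rate(scores, labels, threshold):
    """Fraction of positives whose score is at least the threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    called = [s for s in pos if s >= threshold]
    return len(called) / len(pos)


# Toy predictions for five probes; label 1 marks an assumed bound probe.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]

area = auc(scores, labels)
tpr = true_positive_rate(scores, labels, 0.5)
print(area, tpr)
```
</preformat>
The pairwise form of the AUC is quadratic in the number of probes. It is used here because it is easy to read, and a rank-based formula gives the same value on larger arrays.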
<table-wrap id="gkt574-T4" position="float">
<label>Table 4.</label>
<caption>
<p>Comparisons between kmerHMM and RankMotif++ on the mouse DNA-binding TF data set (
<xref ref-type="bibr" rid="gkt574-B56">56</xref>
)</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">SR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">TPR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">AUC
<hr></hr>
</th>
<th rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">SR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">TPR
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">AUC
<hr></hr>
</th>
</tr>
<tr>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">TF</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
<th rowspan="1" colspan="1">HMM</th>
<th rowspan="1" colspan="1">RM</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Arid3a</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
<td rowspan="1" colspan="1">Osr2</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.46</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ascl2</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.62</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.89</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
<td rowspan="1" colspan="1">Plagl1</td>
<td rowspan="1" colspan="1">
<bold>0.49</bold>
</td>
<td rowspan="1" colspan="1">0.39</td>
<td rowspan="1" colspan="1">
<bold>0.57</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.89</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Bcl6b</td>
<td rowspan="1" colspan="1">
<bold>0.12</bold>
</td>
<td rowspan="1" colspan="1">−0.10</td>
<td rowspan="1" colspan="1">
<bold>0.10</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.77</bold>
</td>
<td rowspan="1" colspan="1">0.63</td>
<td rowspan="1" colspan="1">Rfx3</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">0.90</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Bhlhb2</td>
<td rowspan="1" colspan="1">
<bold>0.55</bold>
</td>
<td rowspan="1" colspan="1">0.46</td>
<td rowspan="1" colspan="1">
<bold>0.58</bold>
</td>
<td rowspan="1" colspan="1">0.35</td>
<td rowspan="1" colspan="1">0.92</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">Rfx4</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.77</td>
</tr>
<tr>
<td rowspan="1" colspan="1">E2F2</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.23</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
<td rowspan="1" colspan="1">Rfxdc2</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
</tr>
<tr>
<td rowspan="1" colspan="1">E2F3</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.91</td>
<td rowspan="1" colspan="1">Rxra</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.81</bold>
</td>
<td rowspan="1" colspan="1">0.53</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Egr1</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">
<bold>0.84</bold>
</td>
<td rowspan="1" colspan="1">Sfpi1</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.80</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ehf</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">0.12</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.77</td>
<td rowspan="1" colspan="1">Sox11</td>
<td rowspan="1" colspan="1">−0.07</td>
<td rowspan="1" colspan="1">
<bold>0.12</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.28</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.74</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Elf3</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.55</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Sox14</td>
<td rowspan="1" colspan="1">−0.15</td>
<td rowspan="1" colspan="1">
<bold>0.08</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.16</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Eomes</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.67</bold>
</td>
<td rowspan="1" colspan="1">0.29</td>
<td rowspan="1" colspan="1">
<bold>0.98</bold>
</td>
<td rowspan="1" colspan="1">0.87</td>
<td rowspan="1" colspan="1">Sox15</td>
<td rowspan="1" colspan="1">
<bold>0.06</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.75</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Esrra</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">Sox17</td>
<td rowspan="1" colspan="1">−0.05</td>
<td rowspan="1" colspan="1">
<bold>0.00</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.13</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">0.74</td>
<td rowspan="1" colspan="1">
<bold>0.74</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxa2</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">−0.01</td>
<td rowspan="1" colspan="1">
<bold>0.52</bold>
</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">Sox18</td>
<td rowspan="1" colspan="1">−0.04</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.14</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxj1</td>
<td rowspan="1" colspan="1">0.01</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.22</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Sox21</td>
<td rowspan="1" colspan="1">−0.01</td>
<td rowspan="1" colspan="1">
<bold>0.02</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.14</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">0.74</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxj3</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Sox30</td>
<td rowspan="1" colspan="1">−0.13</td>
<td rowspan="1" colspan="1">
<bold>0.09</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.18</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">0.74</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxk1</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.02</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.09</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">Sox4</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.45</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Foxl1</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.55</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.87</td>
<td rowspan="1" colspan="1">Spdef</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">0.85</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gabpa</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.47</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Srf</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.02</bold>
</td>
<td rowspan="1" colspan="1">0.01</td>
<td rowspan="1" colspan="1">
<bold>0.73</bold>
</td>
<td rowspan="1" colspan="1">0.70</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata3</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.89</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
<td rowspan="1" colspan="1">Sry</td>
<td rowspan="1" colspan="1">−0.21</td>
<td rowspan="1" colspan="1">
<bold>−0.02</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.15</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">0.70</td>
<td rowspan="1" colspan="1">
<bold>0.80</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata5</td>
<td rowspan="1" colspan="1">−0.28</td>
<td rowspan="1" colspan="1">
<bold>−0.18</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.62</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.73</td>
<td rowspan="1" colspan="1">Tbp</td>
<td rowspan="1" colspan="1">−0.12</td>
<td rowspan="1" colspan="1">
<bold>0.31</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.94</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gata6</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.31</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Tcf1</td>
<td rowspan="1" colspan="1">−0.08</td>
<td rowspan="1" colspan="1">
<bold>0.07</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gcm1</td>
<td rowspan="1" colspan="1">
<bold>0.51</bold>
</td>
<td rowspan="1" colspan="1">0.26</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.73</td>
<td rowspan="1" colspan="1">Tcf3</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">−0.16</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.76</bold>
</td>
<td rowspan="1" colspan="1">0.66</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gm397</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.49</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.86</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">Tcf7</td>
<td rowspan="1" colspan="1">
<bold>0.16</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.66</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.89</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gmeb1</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.12</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Tcf7l2</td>
<td rowspan="1" colspan="1">
<bold>0.32</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hic1</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.34</bold>
</td>
<td rowspan="1" colspan="1">0.08</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.75</td>
<td rowspan="1" colspan="1">Tcfap2a</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.31</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.90</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hnf4a</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Tcfap2b</td>
<td rowspan="1" colspan="1">
<bold>0.30</bold>
</td>
<td rowspan="1" colspan="1">0.25</td>
<td rowspan="1" colspan="1">
<bold>0.54</bold>
</td>
<td rowspan="1" colspan="1">0.43</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.94</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hoxa3</td>
<td rowspan="1" colspan="1">
<bold>0.48</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.93</bold>
</td>
<td rowspan="1" colspan="1">0.86</td>
<td rowspan="1" colspan="1">Tcfap2c</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.45</bold>
</td>
<td rowspan="1" colspan="1">0.27</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.90</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf3</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">
<bold>0.14</bold>
</td>
<td rowspan="1" colspan="1">0.80</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
<td rowspan="1" colspan="1">Tcfap2e</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">−0.08</td>
<td rowspan="1" colspan="1">
<bold>0.59</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.68</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf4</td>
<td rowspan="1" colspan="1">
<bold>0.38</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Tcfe2a</td>
<td rowspan="1" colspan="1">
<bold>0.53</bold>
</td>
<td rowspan="1" colspan="1">0.37</td>
<td rowspan="1" colspan="1">
<bold>0.61</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.90</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf5</td>
<td rowspan="1" colspan="1">
<bold>0.45</bold>
</td>
<td rowspan="1" colspan="1">0.10</td>
<td rowspan="1" colspan="1">
<bold>0.33</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
<td rowspan="1" colspan="1">Zbtb12</td>
<td rowspan="1" colspan="1">
<bold>0.35</bold>
</td>
<td rowspan="1" colspan="1">−0.10</td>
<td rowspan="1" colspan="1">
<bold>0.50</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.64</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Irf6</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.90</bold>
</td>
<td rowspan="1" colspan="1">0.81</td>
<td rowspan="1" colspan="1">Zbtb3</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">−0.02</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
<td rowspan="1" colspan="1">0.72</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Isgf3g</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.45</bold>
</td>
<td rowspan="1" colspan="1">0.11</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
<td rowspan="1" colspan="1">Zbtb7b</td>
<td rowspan="1" colspan="1">
<bold>0.19</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.46</bold>
</td>
<td rowspan="1" colspan="1">0.42</td>
<td rowspan="1" colspan="1">0.92</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Jundm2</td>
<td rowspan="1" colspan="1">
<bold>0.42</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.58</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">
<bold>0.96</bold>
</td>
<td rowspan="1" colspan="1">0.71</td>
<td rowspan="1" colspan="1">Zfp105</td>
<td rowspan="1" colspan="1">
<bold>0.29</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.85</bold>
</td>
<td rowspan="1" colspan="1">0.84</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Klf7</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.08</td>
<td rowspan="1" colspan="1">
<bold>0.56</bold>
</td>
<td rowspan="1" colspan="1">0.24</td>
<td rowspan="1" colspan="1">
<bold>0.94</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zfp128</td>
<td rowspan="1" colspan="1">−0.20</td>
<td rowspan="1" colspan="1">
<bold>0.10</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.60</bold>
</td>
<td rowspan="1" colspan="1">0.00</td>
<td rowspan="1" colspan="1">
<bold>0.80</bold>
</td>
<td rowspan="1" colspan="1">0.74</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mafb</td>
<td rowspan="1" colspan="1">−0.03</td>
<td rowspan="1" colspan="1">
<bold>0.07</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.08</bold>
</td>
<td rowspan="1" colspan="1">0.06</td>
<td rowspan="1" colspan="1">0.56</td>
<td rowspan="1" colspan="1">
<bold>0.70</bold>
</td>
<td rowspan="1" colspan="1">Zfp161</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.29</td>
<td rowspan="1" colspan="1">
<bold>0.54</bold>
</td>
<td rowspan="1" colspan="1">0.38</td>
<td rowspan="1" colspan="1">
<bold>0.97</bold>
</td>
<td rowspan="1" colspan="1">0.94</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mafk</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">−0.12</td>
<td rowspan="1" colspan="1">
<bold>0.37</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.87</bold>
</td>
<td rowspan="1" colspan="1">0.81</td>
<td rowspan="1" colspan="1">Zfp281</td>
<td rowspan="1" colspan="1">
<bold>0.65</bold>
</td>
<td rowspan="1" colspan="1">0.45</td>
<td rowspan="1" colspan="1">
<bold>0.60</bold>
</td>
<td rowspan="1" colspan="1">0.39</td>
<td rowspan="1" colspan="1">
<bold>0.91</bold>
</td>
<td rowspan="1" colspan="1">0.88</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Max</td>
<td rowspan="1" colspan="1">
<bold>0.53</bold>
</td>
<td rowspan="1" colspan="1">0.33</td>
<td rowspan="1" colspan="1">
<bold>0.52</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.95</bold>
</td>
<td rowspan="1" colspan="1">0.85</td>
<td rowspan="1" colspan="1">Zfp410</td>
<td rowspan="1" colspan="1">
<bold>0.26</bold>
</td>
<td rowspan="1" colspan="1">−0.05</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.04</td>
<td rowspan="1" colspan="1">
<bold>0.75</bold>
</td>
<td rowspan="1" colspan="1">0.62</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Myb</td>
<td rowspan="1" colspan="1">
<bold>0.21</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.15</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
<td rowspan="1" colspan="1">0.79</td>
<td rowspan="1" colspan="1">Zfp691</td>
<td rowspan="1" colspan="1">
<bold>0.39</bold>
</td>
<td rowspan="1" colspan="1">0.13</td>
<td rowspan="1" colspan="1">
<bold>0.41</bold>
</td>
<td rowspan="1" colspan="1">0.14</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">
<bold>0.83</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Mybl1</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.36</bold>
</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.92</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zic1</td>
<td rowspan="1" colspan="1">
<bold>0.23</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.16</td>
<td rowspan="1" colspan="1">0.80</td>
<td rowspan="1" colspan="1">
<bold>0.81</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Myf6</td>
<td rowspan="1" colspan="1">
<bold>0.40</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">
<bold>0.20</bold>
</td>
<td rowspan="1" colspan="1">0.03</td>
<td rowspan="1" colspan="1">
<bold>0.74</bold>
</td>
<td rowspan="1" colspan="1">0.61</td>
<td rowspan="1" colspan="1">Zic2</td>
<td rowspan="1" colspan="1">
<bold>0.24</bold>
</td>
<td rowspan="1" colspan="1">0.22</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.17</bold>
</td>
<td rowspan="1" colspan="1">0.78</td>
<td rowspan="1" colspan="1">
<bold>0.82</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Nkx3-1</td>
<td rowspan="1" colspan="1">0.17</td>
<td rowspan="1" colspan="1">
<bold>0.18</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.65</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.99</bold>
</td>
<td rowspan="1" colspan="1">0.82</td>
<td rowspan="1" colspan="1">Zic3</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.18</td>
<td rowspan="1" colspan="1">
<bold>0.25</bold>
</td>
<td rowspan="1" colspan="1">0.21</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.85</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Nr2f2</td>
<td rowspan="1" colspan="1">
<bold>0.44</bold>
</td>
<td rowspan="1" colspan="1">0.28</td>
<td rowspan="1" colspan="1">
<bold>0.43</bold>
</td>
<td rowspan="1" colspan="1">0.19</td>
<td rowspan="1" colspan="1">
<bold>0.88</bold>
</td>
<td rowspan="1" colspan="1">0.76</td>
<td rowspan="1" colspan="1">Zscan4</td>
<td rowspan="1" colspan="1">0.08</td>
<td rowspan="1" colspan="1">
<bold>0.18</bold>
</td>
<td rowspan="1" colspan="1">
<bold>0.80</bold>
</td>
<td rowspan="1" colspan="1">0.20</td>
<td rowspan="1" colspan="1">
<bold>0.99</bold>
</td>
<td rowspan="1" colspan="1">0.83</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Osr1</td>
<td rowspan="1" colspan="1">
<bold>0.47</bold>
</td>
<td rowspan="1" colspan="1">−0.02</td>
<td rowspan="1" colspan="1">
<bold>0.27</bold>
</td>
<td rowspan="1" colspan="1">0.07</td>
<td rowspan="1" colspan="1">
<bold>0.81</bold>
</td>
<td rowspan="1" colspan="1">0.67</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkt574-TF4">
<p>Both methods were trained on Array #2 and tested on Array #1. SR denotes Spearman rank correlation, TPR denotes true positive rate, AUC denotes area under the ROC curve, HMM denotes kmerHMM and RM denotes RankMotif++. Bold values indicate which method (HMM vs. RM) performs better on a particular test.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>Motivated by the good performance of kmerHMM in discovering multimodal binding of DNA-binding proteins, we next applied it to all the mouse PBM data to determine how frequently a mouse TF can bind to multiple motifs. To this end, we repeated the previous state transition path analysis using the N-Max-Product algorithm on all the DNA-binding proteins we studied. After removing similar matrix models at different thresholds using
<inline-formula>
<inline-graphic xlink:href="gkt574i31.jpg"></inline-graphic>
</inline-formula>
(mathematical details can be found in the
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Data</ext-link>
), we have obtained the results depicted in
<xref ref-type="fig" rid="gkt574-F6">Figure 6</xref>
. To quantify the statistical significance, we generated 2000 random motif matrix models with widths drawn uniformly from 5 to 15. Nearly 2 million random pairwise distances were calculated to estimate the empirical
<italic>P</italic>
-value distribution for each distance threshold
<inline-formula>
<inline-graphic xlink:href="gkt574i33.jpg"></inline-graphic>
</inline-formula>
as depicted in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Figure S7</ext-link>
. It can be observed that the distance threshold
<inline-formula>
<inline-graphic xlink:href="gkt574i34.jpg"></inline-graphic>
</inline-formula>
becomes statistically significant at 0.5. At the threshold
<inline-formula>
<inline-graphic xlink:href="gkt574i35.jpg"></inline-graphic>
</inline-formula>
(
<italic>P</italic>
= 0.003), ∼17% of the mouse DNA-binding proteins we studied have more than one motif matrix model. Interestingly, this is similar to the estimate for the yeast DNA-binding protein collection, in which 26% (39 of 150) of the proteins have more than one motif matrix model (
<xref ref-type="bibr" rid="gkt574-B73">73</xref>
).
<fig id="gkt574-F6" position="float">
<label>Figure 6.</label>
<caption>
<p>Percentage of multimodal DNA-binding proteins at different distance thresholds
<inline-formula>
<inline-graphic xlink:href="gkt574i32.jpg"></inline-graphic>
</inline-formula>
on Array #1 and #2. Blue/Green/Red denote the DNA-binding proteins that have one/two/three motif matrix model(s), respectively.</p>
</caption>
<graphic xlink:href="gkt574f6p"></graphic>
</fig>
</p>
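<p>The null-distribution step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual distance measure is defined in the Supplementary Data and is not reproduced here, so <italic>motif_distance</italic> below is a hypothetical stand-in (smallest mean per-column Euclidean distance over ungapped alignments). The sketch also assumes that a smaller distance means more similar models and that the empirical P-value of a threshold is the fraction of random pairs at or below it. All function names are illustrative.</p>
<preformat>
```python
import bisect
import itertools
import math
import random


def random_motif(width, rng):
    """Random motif matrix: `width` columns, each a distribution over A, C, G, T."""
    columns = []
    for _ in range(width):
        weights = [rng.random() for _ in range(4)]
        total = sum(weights)
        columns.append([w / total for w in weights])
    return columns


def motif_distance(m1, m2):
    """Hypothetical distance, NOT the measure from the Supplementary Data.

    Slides the shorter motif along the longer one without gaps and returns
    the smallest mean per-column Euclidean distance.
    """
    short, long_ = sorted((m1, m2), key=len)
    best = math.inf
    for offset in range(len(long_) - len(short) + 1):
        total = sum(math.dist(a, b) for a, b in zip(short, long_[offset:]))
        best = min(best, total / len(short))
    return best


def null_distances(n_motifs, rng, min_width=5, max_width=15):
    """Sorted pairwise distances between random motifs of uniformly drawn width."""
    motifs = [random_motif(rng.randint(min_width, max_width), rng)
              for _ in range(n_motifs)]
    return sorted(motif_distance(a, b)
                  for a, b in itertools.combinations(motifs, 2))


def empirical_p_value(sorted_null, threshold):
    """Fraction of random pairs whose distance is at or below `threshold`."""
    return bisect.bisect_right(sorted_null, threshold) / len(sorted_null)


rng = random.Random(0)
# 200 motifs give 19,900 pairs; 2000 motifs would give 1,999,000 pairs.
null = null_distances(200, rng)
print(len(null), empirical_p_value(null, 0.2))
```
</preformat>
<p>The demonstration uses 200 random matrices to keep the run short; with 2000 matrices the same code evaluates 1,999,000 pairs, which matches the "nearly 2 million" distances mentioned above.</p>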
</sec>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>In this study, we proposed a computational pipeline for PBM motif discovery in which HMMs are trained to model DNA motifs, and belief propagation is used to elucidate multiple motif models from each trained HMM. We compared it with other existing methods on benchmark PBM data sets and demonstrated its effectiveness and uniqueness (
<xref ref-type="table" rid="gkt574-T1 gkt574-T2 gkt574-T3 gkt574-T4">Tables 1–4</xref>
and
<xref ref-type="fig" rid="gkt574-F5">Figure 5</xref>
). The novelty of the method lies in two aspects. First, it outperforms the existing method by training an HMM to represent PBM data. To our knowledge, this is the first time that an HMM has been used to represent PBM data. Second, kmerHMM incorporates the N-Max-Product algorithm and can derive multiple motif matrix models to represent PBM data.</p>
<p>In particular, we implemented a belief propagation method (the max-product algorithm) and applied it to the trained HMMs. It finds the most probable state transition paths in the HMMs, which represent the DNA-binding preferences of the proteins under study. Moreover, the generalized method (the N-Max-Product algorithm) has also been implemented and applied. The resulting case study gave us insights into the multimodal pattern recognition ability of the proposed method. To the best of the authors’ knowledge, this work is the first study to incorporate HMMs into the PBM motif discovery problem. In a broader sense, this work is also the first study to explicitly incorporate max-product algorithms (belief propagation) into the general motif discovery problem.</p>
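<p>As a generic illustration of the idea, the following Python sketch finds the N most probable state transition paths of a fixed length by dynamic programming, keeping the N best partial paths per state at every step. It is a minimal sketch and not the kmerHMM implementation: it assumes that a path is scored only by its initial and transition probabilities, and the function name, the argument layout and the four-state toy model are all hypothetical.</p>
<preformat>
```python
import heapq
import math


def n_max_product(initial, transition, length, n_best):
    """Return the n_best most probable state paths containing `length` states.

    initial[s] is P(first state = s); transition[r][s] is P(next = s | current = r).
    Each result entry is (log_probability, path), best first.
    """
    states = range(len(initial))
    # beams[s]: up to n_best (log_prob, path) pairs for partial paths ending in s.
    beams = [[(math.log(initial[s]), (s,))] if initial[s] > 0 else []
             for s in states]
    for _ in range(length - 1):
        new_beams = []
        for s in states:
            candidates = [
                (lp + math.log(transition[r][s]), path + (s,))
                for r in states if transition[r][s] > 0
                for lp, path in beams[r]
            ]
            new_beams.append(heapq.nlargest(n_best, candidates))
        beams = new_beams
    finals = [item for beam in beams for item in beam]
    return heapq.nlargest(n_best, finals)


# Toy model with two branches: state 0 moves to state 1 (0.6) or state 2 (0.4),
# and both branches then move to state 3.
initial = [1.0, 0.0, 0.0, 0.0]
transition = [
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
top_paths = n_max_product(initial, transition, length=3, n_best=2)
for log_prob, path in top_paths:
    print(path, round(math.exp(log_prob), 3))
```
</preformat>
<p>With n_best set to 1 this reduces to the ordinary max-product (Viterbi-style) recursion over state paths. In the toy model, the two returned paths correspond to the two branches, which is the sense in which several high-probability paths can expose several binding modes.</p>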
<p>The implications of this study are not limited to motif discovery. From the state transition path analysis, we can observe that HMM training is effective in handling multimodal pattern recognition, which other modeling methods may not be able to handle. We believe that HMMs should be examined further in other multimodal signal recognition domains. A potential drawback of the proposed approach is that it relies on a sliding window to segment DNA probe sequences into individual k-mers, which may lose sequence context information. We expect this limitation to be alleviated when improved PBM technology can measure binding affinity for longer probes (i.e. higher k values).</p>
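<p>The sliding-window segmentation mentioned above can be sketched as follows. This is a simplified illustration rather than the kmerHMM preprocessing code: it assumes that each probe is given as a (sequence, intensity) pair, it does not merge reverse complements, and it counts a probe once for each distinct k-mer it contains. The function name and the toy probes are hypothetical.</p>
<preformat>
```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes that contain it.

    `probes` is an iterable of (sequence, intensity) pairs.
    """
    by_kmer = defaultdict(list)
    for sequence, intensity in probes:
        # A set, so a k-mer repeated within one probe counts that probe once.
        windows = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        for kmer in windows:
            by_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


# Toy probes with made-up intensities, for illustration only.
probes = [("ACGTA", 10.0), ("CGTAC", 30.0), ("TTACG", 20.0)]
medians = kmer_median_intensities(probes, k=3)
for kmer in sorted(medians):
    print(kmer, medians[kmer])
```
</preformat>
<p>Because every window is treated independently, the position of a k-mer within its probe and its flanking bases are discarded, which is the loss of sequence context noted above.</p>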
<p>It has recently been intensively debated in the literature whether, in light of the availability of high-throughput protein–DNA-binding affinity data, more sophisticated models are needed or simpler position weight matrices are sufficient to capture the binding landscape (
<xref ref-type="bibr" rid="gkt574-B60">60</xref>
,
<xref ref-type="bibr" rid="gkt574-B74">74</xref>
). In this work, we demonstrated that kmerHMM can capture multiple binding modes of a DNA-binding protein, which a single position weight matrix model cannot do. Nevertheless, we showed that decomposing the trained HMM into two distinct position weight matrices gave performance comparable to that of the trained HMM itself on the Oct-1 data set (Spearman rank correlation 0.326 versus 0.359), suggesting that a more sophisticated model such as kmerHMM can achieve better performance, but that the overall improvement is likely subtle. However, the strength of kmerHMM is that it can distinguish distinct binding modes between a DNA-binding protein and its target sequence, which could provide biological insights into the subtleties of gene regulation. We foresee that a method like kmerHMM will be useful in this arena.</p>
</sec>
<sec sec-type="supplementary-material">
<title>SUPPLEMENTARY DATA</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/lookup/suppl/doi:10.1093/nar/gkt574/-/DC1">Supplementary Data</ext-link>
are available at NAR Online: Supplementary Figures 1–10 and Supplementary Methods.</p>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_41_16_e153__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gkt574_nar-00905-met-n-2013-File002.pdf"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>The authors thank the anonymous reviewers and Brendan Frey for their positive comments. The authors also thank the Quaid Morris Lab for making RankMotif++ publicly available and the Martha L. Bulyk Lab for making their PBM data publicly available.</p>
</ack>
<sec>
<title>FUNDING</title>
<p>Discovery Grant from
<funding-source>Natural Sciences and Engineering Research Council, Canada</funding-source>
(NSERC), grant number [
<award-id>327612-2009</award-id>
RGPIN to Z.Z.]; Acres Inc. - Joseph Yonan Memorial Fellowship, Kwok Sau Po Scholarship, and
<funding-source>International Research and Teaching Assistantship</funding-source>
from
<funding-source>University of Toronto</funding-source>
(to K.W.); Cofunded by
<funding-source>NSERC Canada Graduate Scholarship and Ontario Graduate Scholarship</funding-source>
(to Y.L.). Funding for open access charge:
<funding-source>Discovery Grant from NSERC</funding-source>
[
<award-id>327612-2009 RGPIN</award-id>
].</p>
<p>
<italic>Conflict of interest statement.</italic>
None declared.</p>
</sec>
<ref-list>
<title>REFERENCES</title>
<ref id="gkt574-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Eskin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Favorov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Assessing computational tools for the discovery of transcription factor binding sites</article-title>
<source>Nat. Biotechnol.</source>
<year>2005</year>
<volume>23</volume>
<fpage>137</fpage>
<lpage>144</lpage>
</element-citation>
</ref>
<ref id="gkt574-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galas</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Schmitz</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>DNAse footprinting: a simple method for the detection of protein-DNA binding specificity</article-title>
<source>Nucleic Acids Res.</source>
<year>1978</year>
<volume>5</volume>
<fpage>3157</fpage>
<lpage>3170</lpage>
<pub-id pub-id-type="pmid">212715</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garner</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Revzin</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system</article-title>
<source>Nucleic Acids Res.</source>
<year>1981</year>
<volume>9</volume>
<fpage>3047</fpage>
<lpage>3060</lpage>
<pub-id pub-id-type="pmid">6269071</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Robert</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wyrick</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Aparicio</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Jennings</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Simon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zeitlinger</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hannett</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kanin</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide location and function of DNA binding proteins</article-title>
<source>Science</source>
<year>2000</year>
<volume>290</volume>
<fpage>2306</fpage>
<lpage>2309</lpage>
<pub-id pub-id-type="pmid">11125145</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Genome-wide mapping of in vivo protein-DNA interactions</article-title>
<source>Science</source>
<year>2007</year>
<volume>316</volume>
<fpage>1497</fpage>
<lpage>1502</lpage>
<pub-id pub-id-type="pmid">17540862</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>XS</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments</article-title>
<source>Nat. Biotechnol.</source>
<year>2002</year>
<volume>20</volume>
<fpage>835</fpage>
<lpage>839</lpage>
<pub-id pub-id-type="pmid">12101404</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Qureshi</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>He</surname>
<given-names>FS</given-names>
</name>
<name>
<surname>Estep</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities</article-title>
<source>Nat. Biotechnol.</source>
<year>2006</year>
<volume>24</volume>
<fpage>1429</fpage>
<lpage>1435</lpage>
<pub-id pub-id-type="pmid">16998473</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fordyce</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Gerber</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>DeRisi</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Quake</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis</article-title>
<source>Nat. Biotechnol.</source>
<year>2010</year>
<volume>28</volume>
<fpage>970</fpage>
<lpage>975</lpage>
<pub-id pub-id-type="pmid">20802496</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Onishi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rho</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Woodard</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jeong</surname>
<given-names>JS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Profiling the human protein-DNA interactome reveals ERK2 as a transcriptional repressor of interferon signaling</article-title>
<source>Cell</source>
<year>2009</year>
<volume>139</volume>
<fpage>610</fpage>
<lpage>622</lpage>
<pub-id pub-id-type="pmid">19879846</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ho</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Jona</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Linking DNA-binding proteins to their recognition sequences by using protein microarrays</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2006</year>
<volume>103</volume>
<fpage>9940</fpage>
<lpage>9945</lpage>
<pub-id pub-id-type="pmid">16785442</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matys</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Kel-Margoulis</surname>
<given-names>OV</given-names>
</name>
<name>
<surname>Fricke</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Liebich</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Land</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Barre-Dirrie</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reuter</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Chekmenev</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Krull</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hornischer</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>D108</fpage>
<lpage>D110</lpage>
</element-citation>
</ref>
<ref id="gkt574-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Portales-Casamar</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Thongjuea</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kwon</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Arenillas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Valen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Yusuf</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lenhard</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Sandelin</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D105</fpage>
<lpage>D110</lpage>
<pub-id pub-id-type="pmid">19906716</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Coin</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Finn</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Hollich</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Khanna</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Moxon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sonnhammer</surname>
<given-names>ELL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Pfam protein families database</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>D138</fpage>
<lpage>D141</lpage>
<pub-id pub-id-type="pmid">14681378</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robasky</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D124</fpage>
<lpage>D128</lpage>
<pub-id pub-id-type="pmid">21037262</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spivak</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>ScerTF: a comprehensive database of benchmarked position weight matrices for
<italic>Saccharomyces</italic>
species</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D162</fpage>
<lpage>D168</lpage>
<pub-id pub-id-type="pmid">22140105</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pfreundt</surname>
<given-names>U</given-names>
</name>
<name>
<surname>James</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Tweedie</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Teichmann</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Adryan</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>FlyTF: improved annotation and enhanced functionality of the Drosophila transcription factor database</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D443</fpage>
<lpage>D447</lpage>
<pub-id pub-id-type="pmid">19884132</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>deBoer</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
</person-group>
<article-title>YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D169</fpage>
<lpage>D179</lpage>
<pub-id pub-id-type="pmid">22102575</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Blackshaw</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>hPDI: a database of experimental human protein-DNA interactions</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>287</fpage>
<lpage>289</lpage>
<pub-id pub-id-type="pmid">19900953</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fulton</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Sundararajan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Roach</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Sladek</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>TFCat: the curated catalog of mouse and human transcription factors</article-title>
<source>Genome Biol.</source>
<year>2009</year>
<volume>10</volume>
<fpage>R29</fpage>
<pub-id pub-id-type="pmid">19284633</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Austin</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Berman</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>An overview of the structures of protein-DNA complexes</article-title>
<source>Genome Biol.</source>
<year>2000</year>
<volume>1</volume>
<fpage>REVIEWS001</fpage>
<pub-id pub-id-type="pmid">11104519</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Laskowski</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level</article-title>
<source>Nucleic Acids Res.</source>
<year>2001</year>
<volume>29</volume>
<fpage>2860</fpage>
<lpage>2874</lpage>
<pub-id pub-id-type="pmid">11433033</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krishna</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Majumdar</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Grishin</surname>
<given-names>NV</given-names>
</name>
</person-group>
<article-title>Structural classification of zinc fingers: survey and summary</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>532</fpage>
<lpage>550</lpage>
<pub-id pub-id-type="pmid">12527760</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity</article-title>
<source>J. Mol. Biol.</source>
<year>2002</year>
<volume>320</volume>
<fpage>991</fpage>
<lpage>1009</lpage>
<pub-id pub-id-type="pmid">12126620</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>van Heyningen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Berman</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Protein-DNA interactions: a structural analysis</article-title>
<source>J. Mol. Biol.</source>
<year>1999</year>
<volume>287</volume>
<fpage>877</fpage>
<lpage>896</lpage>
<pub-id pub-id-type="pmid">10222198</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shanahan</surname>
<given-names>HP</given-names>
</name>
<name>
<surname>Berman</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>7189</fpage>
<lpage>7198</lpage>
<pub-id pub-id-type="pmid">14654694</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gunewardena</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jeavons</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Enhancing the prediction of transcription factor binding sites by incorporating structural properties and nucleotide covariations</article-title>
<source>J. Comput. Biol.</source>
<year>2006</year>
<volume>13</volume>
<fpage>929</fpage>
<lpage>945</lpage>
<pub-id pub-id-type="pmid">16761919</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarai</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kono</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Protein-DNA recognition patterns and predictions</article-title>
<source>Annu. Rev. Biophys. Biomol. Struct.</source>
<year>2005</year>
<volume>34</volume>
<fpage>379</fpage>
<lpage>398</lpage>
<pub-id pub-id-type="pmid">15869395</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>Extracting sequence features to predict protein-DNA interactions: a comparative study</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>4137</fpage>
<lpage>4148</lpage>
<pub-id pub-id-type="pmid">18556756</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahmad</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gromiha</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Sarai</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>477</fpage>
<lpage>486</lpage>
<pub-id pub-id-type="pmid">14990443</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahmad</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Keskin</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Sarai</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nussinov</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Protein-DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>5922</fpage>
<lpage>5932</lpage>
<pub-id pub-id-type="pmid">18801847</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pham</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Clemente</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Satou</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>TB</given-names>
</name>
</person-group>
<article-title>Computational discovery of transcriptional regulatory rules</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>101</fpage>
<lpage>107</lpage>
</element-citation>
</ref>
<ref id="gkt574-B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ofran</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Mysore</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Rost</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Prediction of DNA-binding residues from sequence</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>i347</fpage>
<lpage>i353</lpage>
<pub-id pub-id-type="pmid">17646316</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wong</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>KS</given-names>
</name>
</person-group>
<article-title>Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm</article-title>
<source>Soft. Comput.</source>
<year>2011</year>
<volume>15</volume>
<fpage>1631</fpage>
<lpage>1642</lpage>
</element-citation>
</ref>
<ref id="gkt574-B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leung</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>CK</given-names>
</name>
<name>
<surname>Tsui</surname>
<given-names>SK</given-names>
</name>
</person-group>
<article-title>Discovering protein-DNA binding sequence patterns using association rule mining</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>6324</fpage>
<lpage>6337</lpage>
<pub-id pub-id-type="pmid">20529874</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>CK</given-names>
</name>
<name>
<surname>Tsui</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>KS</given-names>
</name>
</person-group>
<article-title>Discovering approximate-associated sequence patterns for protein-DNA interactions</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>471</fpage>
<lpage>478</lpage>
<pub-id pub-id-type="pmid">21193520</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>MacIsaac</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Fraenkel</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Practical strategies for discovering regulatory DNA sequence motifs</article-title>
<source>PLoS Comput. Biol.</source>
<year>2006</year>
<volume>2</volume>
<fpage>e36</fpage>
<pub-id pub-id-type="pmid">16683017</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kel</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Goessling</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Reuter</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Cheremushkin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kel-Margoulis</surname>
<given-names>OV</given-names>
</name>
<name>
<surname>Wingender</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>MATCH: a tool for searching transcription factor binding sites in DNA sequences</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>3576</fpage>
<lpage>3579</lpage>
<pub-id pub-id-type="pmid">12824369</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Computer methods for analyzing sequence recognition of nucleic acids</article-title>
<source>Annu. Rev. Biophys. Biophys. Chem.</source>
<year>1988</year>
<volume>17</volume>
<fpage>241</fpage>
<lpage>263</lpage>
</element-citation>
</ref>
<ref id="gkt574-B39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>XS</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>Computational discovery of gene regulatory binding motifs: a Bayesian perspective</article-title>
<source>Stat. Sci.</source>
<year>2004</year>
<volume>19</volume>
<fpage>188</fpage>
<lpage>204</lpage>
</element-citation>
</ref>
<ref id="gkt574-B40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sandve</surname>
<given-names>GK</given-names>
</name>
<name>
<surname>Abul</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Walseng</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Drablos</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Improved benchmarks for computational motif discovery</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<fpage>193</fpage>
<pub-id pub-id-type="pmid">17559676</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hughes</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Estep</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Tavazoie</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Computational identification of cis-regulatory elements associated with groups of functionally related genes in
<italic>Saccharomyces cerevisiae</italic>
</article-title>
<source>J. Mol. Biol.</source>
<year>2000</year>
<volume>296</volume>
<fpage>1205</fpage>
<lpage>1214</lpage>
<pub-id pub-id-type="pmid">10698627</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thijs</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lescot</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marchal</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Rombauts</surname>
<given-names>S</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rouze</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Moreau</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>1113</fpage>
<lpage>1122</lpage>
<pub-id pub-id-type="pmid">11751219</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ao</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Gaudet</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Muttumu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mango</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR</article-title>
<source>Science</source>
<year>2004</year>
<volume>305</volume>
<fpage>1743</fpage>
<lpage>1746</lpage>
<pub-id pub-id-type="pmid">15375261</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Elkan</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>The value of prior knowledge in discovering motifs with MEME</article-title>
<source>Proc. Int. Conf. Intell. Syst. Mol. Biol.</source>
<year>1995</year>
<volume>3</volume>
<fpage>21</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">7584439</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Workman</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>ANN-Spec: a method for discovering transcription factor binding sites with improved specificity</article-title>
<source>Pac. Symp. Biocomput.</source>
<year>2000</year>
<fpage>467</fpage>
<lpage>478</lpage>
<pub-id pub-id-type="pmid">10902194</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Favorov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Gelfand</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Gerasimova</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Ravcheev</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Mironov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Makeev</surname>
<given-names>VJ</given-names>
</name>
</person-group>
<article-title>A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>2240</fpage>
<lpage>2245</lpage>
<pub-id pub-id-type="pmid">15728117</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B47">
<label>47</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>TFBS identification based on genetic algorithm with combined representations and adaptive post-processing</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>341</fpage>
<lpage>349</lpage>
<pub-id pub-id-type="pmid">18065426</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hertz</surname>
<given-names>GZ</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Identifying DNA and protein patterns with statistically significant alignments of multiple sequences</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<fpage>563</fpage>
<lpage>577</lpage>
<pub-id pub-id-type="pmid">10487864</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Hansen</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Spouge</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>Finding functional sequence elements by multiple local alignment</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>189</fpage>
<lpage>200</lpage>
<pub-id pub-id-type="pmid">14704356</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eskin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
</person-group>
<article-title>Finding composite regulatory patterns in DNA sequences</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<issue>Suppl. 1</issue>
<fpage>S354</fpage>
<lpage>S363</lpage>
<pub-id pub-id-type="pmid">12169566</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B51">
<label>51</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Helden</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Andre</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Collado-Vides</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies</article-title>
<source>J. Mol. Biol.</source>
<year>1998</year>
<volume>281</volume>
<fpage>827</fpage>
<lpage>842</lpage>
<pub-id pub-id-type="pmid">9719638</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B52">
<label>52</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gunewardena</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>A hybrid model for robust detection of transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>484</fpage>
<lpage>491</lpage>
<pub-id pub-id-type="pmid">18184687</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B53">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Régnier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Denise</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Rare events and conditional events on random strings</article-title>
<source>Discrete Math. Theor. Comput. Sci.</source>
<year>2004</year>
<volume>6</volume>
<fpage>191</fpage>
<lpage>214</lpage>
</element-citation>
</ref>
<ref id="gkt574-B54">
<label>54</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pavesi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Mereghetti</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mauri</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pesole</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>W199</fpage>
<lpage>W203</lpage>
</element-citation>
</ref>
<ref id="gkt574-B55">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinha</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>3586</fpage>
<lpage>3588</lpage>
<pub-id pub-id-type="pmid">12824371</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B56">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Jaeger</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Metzler</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Vedenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Diversity and complexity in DNA recognition by transcription factors</article-title>
<source>Science</source>
<year>2009</year>
<volume>324</volume>
<fpage>1720</fpage>
<lpage>1723</lpage>
<pub-id pub-id-type="pmid">19443739</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B57">
<label>57</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>Q</given-names>
</name>
</person-group>
<article-title>RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>i72</fpage>
<lpage>i79</lpage>
<pub-id pub-id-type="pmid">17646348</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B58">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Foat</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Houshmandi</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Olivas</surname>
<given-names>WM</given-names>
</name>
<name>
<surname>Bussemaker</surname>
<given-names>HJ</given-names>
</name>
</person-group>
<article-title>Profiling condition-specific, genome-wide regulation of mRNA stability in yeast</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>17675</fpage>
<lpage>17680</lpage>
<pub-id pub-id-type="pmid">16317069</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B59">
<label>59</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tanay</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Extensive low-affinity transcriptional interactions in the yeast genome</article-title>
<source>Genome Res.</source>
<year>2006</year>
<volume>16</volume>
<fpage>962</fpage>
<lpage>972</lpage>
<pub-id pub-id-type="pmid">16809671</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B60">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Quantitative analysis demonstrates most transcription factors require only simple models of specificity</article-title>
<source>Nat. Biotechnol.</source>
<year>2011</year>
<volume>29</volume>
<fpage>480</fpage>
<lpage>483</lpage>
<pub-id pub-id-type="pmid">21654662</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B61">
<label>61</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weirauch</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Cote</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Norel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Annala</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Saez-Rodriguez</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cokelaer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Vedenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Talukder</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evaluation of methods for modeling transcription factor sequence specificity</article-title>
<source>Nat. Biotechnol.</source>
<year>2013</year>
<volume>31</volume>
<fpage>126</fpage>
<lpage>134</lpage>
<pub-id pub-id-type="pmid">23354101</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B62">
<label>62</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
<name>
<surname>von Hippel</surname>
<given-names>PH</given-names>
</name>
</person-group>
<article-title>Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters</article-title>
<source>J. Mol. Biol.</source>
<year>1987</year>
<volume>193</volume>
<fpage>723</fpage>
<lpage>750</lpage>
<pub-id pub-id-type="pmid">3612791</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B63">
<label>63</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Maximally efficient modeling of DNA sequence motifs at all levels of complexity</article-title>
<source>Genetics</source>
<year>2011</year>
<volume>187</volume>
<fpage>1219</fpage>
<lpage>1224</lpage>
<pub-id pub-id-type="pmid">21300846</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B64">
<label>64</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Krogh</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mitchison</surname>
<given-names>G</given-names>
</name>
</person-group>
<source>Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids</source>
<year>1998</year>
<publisher-name>Cambridge University Press</publisher-name>
</element-citation>
</ref>
<ref id="gkt574-B65">
<label>65</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Rabiner</surname>
<given-names>LR</given-names>
</name>
</person-group>
<source>Readings in Speech Recognition. Chapter A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition</source>
<year>1990</year>
<publisher-loc>San Francisco, CA, USA</publisher-loc>
<publisher-name>Morgan Kaufmann Publishers Inc</publisher-name>
<fpage>267</fpage>
<lpage>296</lpage>
</element-citation>
</ref>
<ref id="gkt574-B66">
<label>66</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frey</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Mohammad</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>QD</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Robinson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Mnaimneh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Sat</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Rossant</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs</article-title>
<source>Nat. Genet.</source>
<year>2005</year>
<volume>37</volume>
<fpage>991</fpage>
<lpage>996</lpage>
<pub-id pub-id-type="pmid">16127451</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B67">
<label>67</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frey</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Dueck</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Clustering by passing messages between data points</article-title>
<source>Science</source>
<year>2007</year>
<volume>315</volume>
<fpage>972</fpage>
<lpage>976</lpage>
<pub-id pub-id-type="pmid">17218491</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B68">
<label>68</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barash</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Calarco</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Shai</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Blencowe</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Frey</surname>
<given-names>BJ</given-names>
</name>
</person-group>
<article-title>Deciphering the splicing code</article-title>
<source>Nature</source>
<year>2010</year>
<volume>465</volume>
<fpage>53</fpage>
<lpage>59</lpage>
<pub-id pub-id-type="pmid">20445623</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B69">
<label>69</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Freeman</surname>
<given-names>WT</given-names>
</name>
</person-group>
<article-title>On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs</article-title>
<source>IEEE Trans. Inf. Theory.</source>
<year>2001</year>
<volume>47</volume>
<fpage>736</fpage>
<lpage>744</lpage>
</element-citation>
</ref>
<ref id="gkt574-B70">
<label>70</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Barber</surname>
<given-names>D</given-names>
</name>
</person-group>
<source>Bayesian Reasoning and Machine Learning</source>
<year>2011</year>
<publisher-name>Cambridge University Press</publisher-name>
</element-citation>
</ref>
<ref id="gkt574-B71">
<label>71</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Benos</surname>
<given-names>PV</given-names>
</name>
</person-group>
<article-title>STAMP: a web tool for exploring DNA-binding motif similarities</article-title>
<source>Nucleic Acids Res.</source>
<year>2007</year>
<volume>35</volume>
<fpage>W253</fpage>
<lpage>W258</lpage>
<pub-id pub-id-type="pmid">17478497</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B72">
<label>72</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Verrijzer</surname>
<given-names>CP</given-names>
</name>
<name>
<surname>Alkema</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>van Weperen</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Van Leeuwen</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Strating</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>van der Vliet</surname>
<given-names>PC</given-names>
</name>
</person-group>
<article-title>The DNA binding specificity of the bipartite POU domain and its subdomains</article-title>
<source>EMBO J.</source>
<year>1992</year>
<volume>11</volume>
<fpage>4993</fpage>
<lpage>5003</lpage>
<pub-id pub-id-type="pmid">1361172</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B73">
<label>73</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gordan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>KF</given-names>
</name>
<name>
<surname>McCord</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Vedenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights</article-title>
<source>Genome Biol.</source>
<year>2011</year>
<volume>12</volume>
<fpage>R125</fpage>
<pub-id pub-id-type="pmid">22189060</pub-id>
</element-citation>
</ref>
<ref id="gkt574-B74">
<label>74</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morris</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
</person-group>
<article-title>Jury remains out on simple models of transcription factor specificity</article-title>
<source>Nat. Biotechnol.</source>
<year>2011</year>
<volume>29</volume>
<fpage>483</fpage>
<lpage>484</lpage>
<pub-id pub-id-type="pmid">21654663</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F52 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F52 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3763557
   |texte=   DNA motif elucidation using belief propagation
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23814189" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021