MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000F49 (Pmc/Corpus); previous: 000F489; next: 000F500


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information</title>
<author>
<name sortKey="Ma, Xiaotu" sort="Ma, Xiaotu" uniqKey="Ma X" first="Xiaotu" last="Ma">Xiaotu Ma</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Ashwinikumar" sort="Kulkarni, Ashwinikumar" uniqKey="Kulkarni A" first="Ashwinikumar" last="Kulkarni">Ashwinikumar Kulkarni</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Zhihua" sort="Zhang, Zhihua" uniqKey="Zhang Z" first="Zhihua" last="Zhang">Zhihua Zhang</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xuan, Zhenyu" sort="Xuan, Zhenyu" uniqKey="Xuan Z" first="Zhenyu" last="Xuan">Zhenyu Xuan</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Serfling, Robert" sort="Serfling, Robert" uniqKey="Serfling R" first="Robert" last="Serfling">Robert Serfling</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gkr1135-AFF1">Department of Mathematics, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Michael Q" sort="Zhang, Michael Q" uniqKey="Zhang M" first="Michael Q." last="Zhang">Michael Q. Zhang</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22228832</idno>
<idno type="pmc">3326300</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326300</idno>
<idno type="RBID">PMC:3326300</idno>
<idno type="doi">10.1093/nar/gkr1135</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">000F49</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F49</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information</title>
<author>
<name sortKey="Ma, Xiaotu" sort="Ma, Xiaotu" uniqKey="Ma X" first="Xiaotu" last="Ma">Xiaotu Ma</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Ashwinikumar" sort="Kulkarni, Ashwinikumar" uniqKey="Kulkarni A" first="Ashwinikumar" last="Kulkarni">Ashwinikumar Kulkarni</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Zhihua" sort="Zhang, Zhihua" uniqKey="Zhang Z" first="Zhihua" last="Zhang">Zhihua Zhang</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xuan, Zhenyu" sort="Xuan, Zhenyu" uniqKey="Xuan Z" first="Zhenyu" last="Xuan">Zhenyu Xuan</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Serfling, Robert" sort="Serfling, Robert" uniqKey="Serfling R" first="Robert" last="Serfling">Robert Serfling</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="gkr1135-AFF1">Department of Mathematics, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Michael Q" sort="Zhang, Michael Q" uniqKey="Zhang M" first="Michael Q." last="Zhang">Michael Q. Zhang</name>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Department of Molecular and Cell Biology, Center for Systems Biology,</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="gkr1135-AFF1">Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new <italic>k</italic>-mer occurrence model to reflect the fact that functional DNA <italic>k</italic>-mers often cluster around ChIP peak summits. With this model, we introduced a new measure to discover functional <italic>k</italic>-mers. Using simulation, we demonstrated that our method is more robust to noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar <italic>k</italic>-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
<author>
<name sortKey="Li, N" uniqKey="Li N">N Li</name>
</author>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vaquerizas, Jm" uniqKey="Vaquerizas J">JM Vaquerizas</name>
</author>
<author>
<name sortKey="Kummerfeld, Sk" uniqKey="Kummerfeld S">SK Kummerfeld</name>
</author>
<author>
<name sortKey="Teichmann, Sa" uniqKey="Teichmann S">SA Teichmann</name>
</author>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Portales Casamar, E" uniqKey="Portales Casamar E">E Portales-Casamar</name>
</author>
<author>
<name sortKey="Thongjuea, S" uniqKey="Thongjuea S">S Thongjuea</name>
</author>
<author>
<name sortKey="Kwon, At" uniqKey="Kwon A">AT Kwon</name>
</author>
<author>
<name sortKey="Arenillas, D" uniqKey="Arenillas D">D Arenillas</name>
</author>
<author>
<name sortKey="Zhao, X" uniqKey="Zhao X">X Zhao</name>
</author>
<author>
<name sortKey="Valen, E" uniqKey="Valen E">E Valen</name>
</author>
<author>
<name sortKey="Yusuf, D" uniqKey="Yusuf D">D Yusuf</name>
</author>
<author>
<name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
<author>
<name sortKey="Wasserman, Ww" uniqKey="Wasserman W">WW Wasserman</name>
</author>
<author>
<name sortKey="Sandelin, A" uniqKey="Sandelin A">A Sandelin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
<author>
<name sortKey="Lu, Zj" uniqKey="Lu Z">ZJ Lu</name>
</author>
<author>
<name sortKey="Van Nostrand, El" uniqKey="Van Nostrand E">EL Van Nostrand</name>
</author>
<author>
<name sortKey="Cheng, C" uniqKey="Cheng C">C Cheng</name>
</author>
<author>
<name sortKey="Arshinoff, Bi" uniqKey="Arshinoff B">BI Arshinoff</name>
</author>
<author>
<name sortKey="Liu, T" uniqKey="Liu T">T Liu</name>
</author>
<author>
<name sortKey="Yip, Ky" uniqKey="Yip K">KY Yip</name>
</author>
<author>
<name sortKey="Robilotto, R" uniqKey="Robilotto R">R Robilotto</name>
</author>
<author>
<name sortKey="Rechtsteiner, A" uniqKey="Rechtsteiner A">A Rechtsteiner</name>
</author>
<author>
<name sortKey="Ikegami, K" uniqKey="Ikegami K">K Ikegami</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roy, S" uniqKey="Roy S">S Roy</name>
</author>
<author>
<name sortKey="Ernst, J" uniqKey="Ernst J">J Ernst</name>
</author>
<author>
<name sortKey="Kharchenko, Pv" uniqKey="Kharchenko P">PV Kharchenko</name>
</author>
<author>
<name sortKey="Kheradpour, P" uniqKey="Kheradpour P">P Kheradpour</name>
</author>
<author>
<name sortKey="Negre, N" uniqKey="Negre N">N Negre</name>
</author>
<author>
<name sortKey="Eaton, Ml" uniqKey="Eaton M">ML Eaton</name>
</author>
<author>
<name sortKey="Landolin, Jm" uniqKey="Landolin J">JM Landolin</name>
</author>
<author>
<name sortKey="Bristow, Ca" uniqKey="Bristow C">CA Bristow</name>
</author>
<author>
<name sortKey="Ma, L" uniqKey="Ma L">L Ma</name>
</author>
<author>
<name sortKey="Lin, Mf" uniqKey="Lin M">MF Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jothi, R" uniqKey="Jothi R">R Jothi</name>
</author>
<author>
<name sortKey="Cuddapah, S" uniqKey="Cuddapah S">S Cuddapah</name>
</author>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buhler, J" uniqKey="Buhler J">J Buhler</name>
</author>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ettwiller, L" uniqKey="Ettwiller L">L Ettwiller</name>
</author>
<author>
<name sortKey="Paten, B" uniqKey="Paten B">B Paten</name>
</author>
<author>
<name sortKey="Ramialison, M" uniqKey="Ramialison M">M Ramialison</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
<author>
<name sortKey="Wittbrodt, J" uniqKey="Wittbrodt J">J Wittbrodt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fratkin, E" uniqKey="Fratkin E">E Fratkin</name>
</author>
<author>
<name sortKey="Naughton, Bt" uniqKey="Naughton B">BT Naughton</name>
</author>
<author>
<name sortKey="Brutlag, Dl" uniqKey="Brutlag D">DL Brutlag</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lawrence, Ce" uniqKey="Lawrence C">CE Lawrence</name>
</author>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Boguski, Ms" uniqKey="Boguski M">MS Boguski</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
<author>
<name sortKey="Neuwald, Af" uniqKey="Neuwald A">AF Neuwald</name>
</author>
<author>
<name sortKey="Wootton, Jc" uniqKey="Wootton J">JC Wootton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Xs" uniqKey="Liu X">XS Liu</name>
</author>
<author>
<name sortKey="Brutlag, Dl" uniqKey="Brutlag D">DL Brutlag</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marsan, L" uniqKey="Marsan L">L Marsan</name>
</author>
<author>
<name sortKey="Sagot, Mf" uniqKey="Sagot M">MF Sagot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavesi, G" uniqKey="Pavesi G">G Pavesi</name>
</author>
<author>
<name sortKey="Mereghetti, P" uniqKey="Mereghetti P">P Mereghetti</name>
</author>
<author>
<name sortKey="Mauri, G" uniqKey="Mauri G">G Mauri</name>
</author>
<author>
<name sortKey="Pesole, G" uniqKey="Pesole G">G Pesole</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roth, Fp" uniqKey="Roth F">FP Roth</name>
</author>
<author>
<name sortKey="Hughes, Jd" uniqKey="Hughes J">JD Hughes</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vardhanabhuti, S" uniqKey="Vardhanabhuti S">S Vardhanabhuti</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Hannenhalli, S" uniqKey="Hannenhalli S">S Hannenhalli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Linhart, C" uniqKey="Linhart C">C Linhart</name>
</author>
<author>
<name sortKey="Halperin, Y" uniqKey="Halperin Y">Y Halperin</name>
</author>
<author>
<name sortKey="Shamir, R" uniqKey="Shamir R">R Shamir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Nk" uniqKey="Kim N">NK Kim</name>
</author>
<author>
<name sortKey="Tharakaraman, K" uniqKey="Tharakaraman K">K Tharakaraman</name>
</author>
<author>
<name sortKey="Marino Ramirez, L" uniqKey="Marino Ramirez L">L Marino-Ramirez</name>
</author>
<author>
<name sortKey="Spouge, Jl" uniqKey="Spouge J">JL Spouge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narang, V" uniqKey="Narang V">V Narang</name>
</author>
<author>
<name sortKey="Mittal, A" uniqKey="Mittal A">A Mittal</name>
</author>
<author>
<name sortKey="Sung, Wk" uniqKey="Sung W">WK Sung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keilwagen, J" uniqKey="Keilwagen J">J Keilwagen</name>
</author>
<author>
<name sortKey="Grau, J" uniqKey="Grau J">J Grau</name>
</author>
<author>
<name sortKey="Paponov, Ia" uniqKey="Paponov I">IA Paponov</name>
</author>
<author>
<name sortKey="Posch, S" uniqKey="Posch S">S Posch</name>
</author>
<author>
<name sortKey="Strickert, M" uniqKey="Strickert M">M Strickert</name>
</author>
<author>
<name sortKey="Grosse, I" uniqKey="Grosse I">I Grosse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, M" uniqKey="Hu M">M Hu</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Taylor, Jm" uniqKey="Taylor J">JM Taylor</name>
</author>
<author>
<name sortKey="Chinnaiyan, Am" uniqKey="Chinnaiyan A">AM Chinnaiyan</name>
</author>
<author>
<name sortKey="Qin, Zs" uniqKey="Qin Z">ZS Qin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kulakovskiy, Iv" uniqKey="Kulakovskiy I">IV Kulakovskiy</name>
</author>
<author>
<name sortKey="Boeva, Va" uniqKey="Boeva V">VA Boeva</name>
</author>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Makeev, Vj" uniqKey="Makeev V">VJ Makeev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmid, Cd" uniqKey="Schmid C">CD Schmid</name>
</author>
<author>
<name sortKey="Bucher, P" uniqKey="Bucher P">P Bucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ji, H" uniqKey="Ji H">H Ji</name>
</author>
<author>
<name sortKey="Jiang, H" uniqKey="Jiang H">H Jiang</name>
</author>
<author>
<name sortKey="Ma, W" uniqKey="Ma W">W Ma</name>
</author>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wong, Wh" uniqKey="Wong W">WH Wong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corbo, Jc" uniqKey="Corbo J">JC Corbo</name>
</author>
<author>
<name sortKey="Lawrence, Ka" uniqKey="Lawrence K">KA Lawrence</name>
</author>
<author>
<name sortKey="Karlstetter, M" uniqKey="Karlstetter M">M Karlstetter</name>
</author>
<author>
<name sortKey="Myers, Ca" uniqKey="Myers C">CA Myers</name>
</author>
<author>
<name sortKey="Abdelaziz, M" uniqKey="Abdelaziz M">M Abdelaziz</name>
</author>
<author>
<name sortKey="Dirkes, W" uniqKey="Dirkes W">W Dirkes</name>
</author>
<author>
<name sortKey="Weigelt, K" uniqKey="Weigelt K">K Weigelt</name>
</author>
<author>
<name sortKey="Seifert, M" uniqKey="Seifert M">M Seifert</name>
</author>
<author>
<name sortKey="Benes, V" uniqKey="Benes V">V Benes</name>
</author>
<author>
<name sortKey="Fritsche, Lg" uniqKey="Fritsche L">LG Fritsche</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robertson, G" uniqKey="Robertson G">G Robertson</name>
</author>
<author>
<name sortKey="Hirst, M" uniqKey="Hirst M">M Hirst</name>
</author>
<author>
<name sortKey="Bainbridge, M" uniqKey="Bainbridge M">M Bainbridge</name>
</author>
<author>
<name sortKey="Bilenky, M" uniqKey="Bilenky M">M Bilenky</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Zeng, T" uniqKey="Zeng T">T Zeng</name>
</author>
<author>
<name sortKey="Euskirchen, G" uniqKey="Euskirchen G">G Euskirchen</name>
</author>
<author>
<name sortKey="Bernier, B" uniqKey="Bernier B">B Bernier</name>
</author>
<author>
<name sortKey="Varhol, R" uniqKey="Varhol R">R Varhol</name>
</author>
<author>
<name sortKey="Delaney, A" uniqKey="Delaney A">A Delaney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Cuddapah, S" uniqKey="Cuddapah S">S Cuddapah</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Roh, Ty" uniqKey="Roh T">TY Roh</name>
</author>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
<author>
<name sortKey="Chepelev, I" uniqKey="Chepelev I">I Chepelev</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wederell, Ed" uniqKey="Wederell E">ED Wederell</name>
</author>
<author>
<name sortKey="Bilenky, M" uniqKey="Bilenky M">M Bilenky</name>
</author>
<author>
<name sortKey="Cullum, R" uniqKey="Cullum R">R Cullum</name>
</author>
<author>
<name sortKey="Thiessen, N" uniqKey="Thiessen N">N Thiessen</name>
</author>
<author>
<name sortKey="Dagpinar, M" uniqKey="Dagpinar M">M Dagpinar</name>
</author>
<author>
<name sortKey="Delaney, A" uniqKey="Delaney A">A Delaney</name>
</author>
<author>
<name sortKey="Varhol, R" uniqKey="Varhol R">R Varhol</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Zeng, T" uniqKey="Zeng T">T Zeng</name>
</author>
<author>
<name sortKey="Bernier, B" uniqKey="Bernier B">B Bernier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Th" uniqKey="Kim T">TH Kim</name>
</author>
<author>
<name sortKey="Abdullaev, Zk" uniqKey="Abdullaev Z">ZK Abdullaev</name>
</author>
<author>
<name sortKey="Smith, Ad" uniqKey="Smith A">AD Smith</name>
</author>
<author>
<name sortKey="Ching, Ka" uniqKey="Ching K">KA Ching</name>
</author>
<author>
<name sortKey="Loukinov, Di" uniqKey="Loukinov D">DI Loukinov</name>
</author>
<author>
<name sortKey="Green, Rd" uniqKey="Green R">RD Green</name>
</author>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
<author>
<name sortKey="Lobanenkov, Vv" uniqKey="Lobanenkov V">VV Lobanenkov</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bradley, Rk" uniqKey="Bradley R">RK Bradley</name>
</author>
<author>
<name sortKey="Li, Xy" uniqKey="Li X">XY Li</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Davidson, S" uniqKey="Davidson S">S Davidson</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
<author>
<name sortKey="Chu, Hc" uniqKey="Chu H">HC Chu</name>
</author>
<author>
<name sortKey="Tonkin, La" uniqKey="Tonkin L">LA Tonkin</name>
</author>
<author>
<name sortKey="Biggin, Md" uniqKey="Biggin M">MD Biggin</name>
</author>
<author>
<name sortKey="Eisen, Mb" uniqKey="Eisen M">MB Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
<author>
<name sortKey="Xu, H" uniqKey="Xu H">H Xu</name>
</author>
<author>
<name sortKey="Yuan, P" uniqKey="Yuan P">P Yuan</name>
</author>
<author>
<name sortKey="Fang, F" uniqKey="Fang F">F Fang</name>
</author>
<author>
<name sortKey="Huss, M" uniqKey="Huss M">M Huss</name>
</author>
<author>
<name sortKey="Vega, Vb" uniqKey="Vega V">VB Vega</name>
</author>
<author>
<name sortKey="Wong, E" uniqKey="Wong E">E Wong</name>
</author>
<author>
<name sortKey="Orlov, Yl" uniqKey="Orlov Y">YL Orlov</name>
</author>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author>
<name sortKey="Jiang, J" uniqKey="Jiang J">J Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Liu, T" uniqKey="Liu T">T Liu</name>
</author>
<author>
<name sortKey="Meyer, Ca" uniqKey="Meyer C">CA Meyer</name>
</author>
<author>
<name sortKey="Eeckhoute, J" uniqKey="Eeckhoute J">J Eeckhoute</name>
</author>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Bernstein, Be" uniqKey="Bernstein B">BE Bernstein</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Brown, M" uniqKey="Brown M">M Brown</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilbanks, Eg" uniqKey="Wilbanks E">EG Wilbanks</name>
</author>
<author>
<name sortKey="Facciotti, Mt" uniqKey="Facciotti M">MT Facciotti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dean, N" uniqKey="Dean N">N Dean</name>
</author>
<author>
<name sortKey="Raftery, Ae" uniqKey="Raftery A">AE Raftery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
<author>
<name sortKey="Sumazin, P" uniqKey="Sumazin P">P Sumazin</name>
</author>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahony, S" uniqKey="Mahony S">S Mahony</name>
</author>
<author>
<name sortKey="Auron, Pe" uniqKey="Auron P">PE Auron</name>
</author>
<author>
<name sortKey="Benos, Pv" uniqKey="Benos P">PV Benos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Ad" uniqKey="Smith A">AD Smith</name>
</author>
<author>
<name sortKey="Sumazin, P" uniqKey="Sumazin P">P Sumazin</name>
</author>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sinha, S" uniqKey="Sinha S">S Sinha</name>
</author>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sumazin, P" uniqKey="Sumazin P">P Sumazin</name>
</author>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
<author>
<name sortKey="Hata, N" uniqKey="Hata N">N Hata</name>
</author>
<author>
<name sortKey="Smith, Ad" uniqKey="Smith A">AD Smith</name>
</author>
<author>
<name sortKey="Zhang, T" uniqKey="Zhang T">T Zhang</name>
</author>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Valouev, A" uniqKey="Valouev A">A Valouev</name>
</author>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Sundquist, A" uniqKey="Sundquist A">A Sundquist</name>
</author>
<author>
<name sortKey="Medina, C" uniqKey="Medina C">C Medina</name>
</author>
<author>
<name sortKey="Anton, E" uniqKey="Anton E">E Anton</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Sidow, A" uniqKey="Sidow A">A Sidow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cao, Ar" uniqKey="Cao A">AR Cao</name>
</author>
<author>
<name sortKey="Rabinovich, R" uniqKey="Rabinovich R">R Rabinovich</name>
</author>
<author>
<name sortKey="Xu, M" uniqKey="Xu M">M Xu</name>
</author>
<author>
<name sortKey="Xu, X" uniqKey="Xu X">X Xu</name>
</author>
<author>
<name sortKey="Jin, Vx" uniqKey="Jin V">VX Jin</name>
</author>
<author>
<name sortKey="Farnham, Pj" uniqKey="Farnham P">PJ Farnham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tuteja, G" uniqKey="Tuteja G">G Tuteja</name>
</author>
<author>
<name sortKey="White, P" uniqKey="White P">P White</name>
</author>
<author>
<name sortKey="Schug, J" uniqKey="Schug J">J Schug</name>
</author>
<author>
<name sortKey="Kaestner, Kh" uniqKey="Kaestner K">KH Kaestner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liang, Hl" uniqKey="Liang H">HL Liang</name>
</author>
<author>
<name sortKey="Nien, Cy" uniqKey="Nien C">CY Nien</name>
</author>
<author>
<name sortKey="Liu, Hy" uniqKey="Liu H">HY Liu</name>
</author>
<author>
<name sortKey="Metzstein, Mm" uniqKey="Metzstein M">MM Metzstein</name>
</author>
<author>
<name sortKey="Kirov, N" uniqKey="Kirov N">N Kirov</name>
</author>
<author>
<name sortKey="Rushlow, C" uniqKey="Rushlow C">C Rushlow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, Gh" uniqKey="Wei G">GH Wei</name>
</author>
<author>
<name sortKey="Badis, G" uniqKey="Badis G">G Badis</name>
</author>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Kivioja, T" uniqKey="Kivioja T">T Kivioja</name>
</author>
<author>
<name sortKey="Palin, K" uniqKey="Palin K">K Palin</name>
</author>
<author>
<name sortKey="Enge, M" uniqKey="Enge M">M Enge</name>
</author>
<author>
<name sortKey="Bonke, M" uniqKey="Bonke M">M Bonke</name>
</author>
<author>
<name sortKey="Jolma, A" uniqKey="Jolma A">A Jolma</name>
</author>
<author>
<name sortKey="Varjosalo, M" uniqKey="Varjosalo M">M Varjosalo</name>
</author>
<author>
<name sortKey="Gehrke, Ar" uniqKey="Gehrke A">AR Gehrke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, Mf" uniqKey="Berger M">MF Berger</name>
</author>
<author>
<name sortKey="Philippakis, Aa" uniqKey="Philippakis A">AA Philippakis</name>
</author>
<author>
<name sortKey="Qureshi, Am" uniqKey="Qureshi A">AM Qureshi</name>
</author>
<author>
<name sortKey="He, Fs" uniqKey="He F">FS He</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitington, T" uniqKey="Whitington T">T Whitington</name>
</author>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Johnson, J" uniqKey="Johnson J">J Johnson</name>
</author>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22228832</article-id>
<article-id pub-id-type="pmc">3326300</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkr1135</article-id>
<article-id pub-id-type="publisher-id">gkr1135</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ma</surname>
<given-names>Xiaotu</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kulkarni</surname>
<given-names>Ashwinikumar</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Zhihua</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xuan</surname>
<given-names>Zhenyu</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Serfling</surname>
<given-names>Robert</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Michael Q.</given-names>
</name>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="gkr1135-AFF1">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="gkr1135-COR1">*</xref>
</contrib>
</contrib-group>
<aff id="gkr1135-AFF1">
<sup>1</sup>
Department of Molecular and Cell Biology, Center for Systems Biology,
<sup>2</sup>
Department of Mathematics, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA and
<sup>3</sup>
Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China</aff>
<author-notes>
<corresp id="gkr1135-COR1">*To whom correspondence should be addressed. Tel:
<phone>516 367 8393</phone>
; Fax:
<fax>516 367 8461</fax>
; Email:
<email>michael.zhang@utdallas.edu</email>
;
<email>mzhang@cshl.edu</email>
</corresp>
</author-notes>
<pmc-comment>For NAR both ppub and collection dates generated for PMC processing 1/27/05 beck</pmc-comment>
<pub-date pub-type="collection">
<month>4</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="ppub">
<month>4</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>6</day>
<month>1</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>6</day>
<month>1</month>
<year>2012</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>40</volume>
<issue>7</issue>
<fpage>e50</fpage>
<lpage>e50</lpage>
<history>
<date date-type="received">
<day>16</day>
<month>8</month>
<year>2011</year>
</date>
<date date-type="rev-recd">
<day>28</day>
<month>10</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>8</day>
<month>11</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">http://creativecommons.org/licenses/by-nc/3.0</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new
<italic>k-</italic>
mer occurrence model to reflect the fact that functional DNA
<italic>k</italic>
-mers often cluster around ChIP peak summits. With this model, we introduced a new measure to discover functional
<italic>k-</italic>
mers. Using simulation, we demonstrated that our method is more robust against noises in ChIP data than available methods. A novel word clustering method is also implemented to group similar
<italic>k</italic>
-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.</p>
</abstract>
<counts>
<page-count count="11"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>Decoding the transcriptional regulatory network is a challenging task in molecular biology (
<xref ref-type="bibr" rid="gkr1135-B1">1</xref>
,
<xref ref-type="bibr" rid="gkr1135-B2">2</xref>
). In humans, of the estimated 1391 sequence-specific DNA-binding transcription factors, only ∼60 have been experimentally verified for both DNA-binding and regulatory functions (
<xref ref-type="bibr" rid="gkr1135-B3">3</xref>
). As of January 2011, there were only 75 matrix models describing the binding motifs of human transcription factors in the JASPAR database (
<xref ref-type="bibr" rid="gkr1135-B4">4</xref>
). With the rapid development of high-throughput DNA sequencing technology, it is now popular to experimentally map the genome-wide binding regions of transcription factors using chromatin immunoprecipitation (ChIP) coupled with massively parallel sequencing technology (ChIP-seq) or microarray (ChIP-chip) (
<xref ref-type="bibr" rid="gkr1135-B1">1</xref>
,
<xref ref-type="bibr" rid="gkr1135-B2">2</xref>
). For example, binding regions of 23 worm transcription factors (
<xref ref-type="bibr" rid="gkr1135-B5">5</xref>
) and 103 fly transcription factors (
<xref ref-type="bibr" rid="gkr1135-B6">6</xref>
) have been studied in single projects. Identification of functional DNA-motifs from such data may provide valuable resources for modeling the transcription regulatory networks. Although the resolution of binding regions identified from ChIP-seq can be a few hundred base pairs (
<xref ref-type="bibr" rid="gkr1135-B2">2</xref>
), it has been found (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
) that existing iterative motif discovery methods, e.g. MEME (
<xref ref-type="bibr" rid="gkr1135-B8">8</xref>
), do not have the computational efficiency required to process the huge amount of data from ChIP-seq/ChIP-chip experiments.</p>
<p>On the other hand, modeling and discovery of DNA motifs from a set of DNA sequences have been a major research focus in computational biology (
<xref ref-type="bibr" rid="gkr1135-B1">1</xref>
,
<xref ref-type="bibr" rid="gkr1135-B2">2</xref>
,
<xref ref-type="bibr" rid="gkr1135-B9">9</xref>
). In the earlier works (
<xref ref-type="bibr" rid="gkr1135-B8 gkr1135-B9 gkr1135-B10 gkr1135-B11 gkr1135-B12 gkr1135-B13 gkr1135-B14 gkr1135-B15 gkr1135-B16 gkr1135-B17 gkr1135-B18">8–18</xref>
), it is generally assumed that the underlying DNA motifs to be discovered are enriched in certain regions (e.g. promoters of co-expressed genes) without any positional preference. Since some transcription factors are known to bind DNA regions close to the 5′ transcription start site (TSS) of their target genes (
<xref ref-type="bibr" rid="gkr1135-B19">19</xref>
), Linhart
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B20">20</xref>
) introduced a binomial test to determine if a given DNA motif tends to appear in certain bins of the 5′TSS regions of genes. Kim
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B21">21</xref>
) introduced a Bayesian model to incorporate positional bias of transcription factor binding sites (TFBSs) in promoter regions to discover DNA motifs. Narang
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B22">22</xref>
) introduced a spatial confinement score combined with an overrepresentation score and relative entropy score to discover DNA motifs. Using positional preference for DNA motif discovery was most recently revisited by Keilwagen
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B23">23</xref>
).</p>
<p>With the ChIP-seq/ChIP-chip technique, positional information is more evident in such data sets. For example, peak intensity profiles were used a priori (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
) to accelerate the optimization process. Such intensity profiles were also used to score PWMs (
<xref ref-type="bibr" rid="gkr1135-B25">25</xref>
). Although the above-mentioned motif discovery tools have achieved success in many scenarios, positional information has not been fully exploited for motif discovery. For example, it was found that the underlying DNA motifs are distributed more frequently around the summits of peaks than in the flanking regions of the peaks (
<xref ref-type="bibr" rid="gkr1135-B26">26</xref>
). While the intensity profiles used by Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
) and Kulakovskiy
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B25">25</xref>
) contain positional information, estimating the intensity profiles at peaks with low read coverage is error prone. In addition, both fragment length and distribution of the underlying DNA motif may affect the peak intensity profile (
<xref ref-type="bibr" rid="gkr1135-B27">27</xref>
), which often renders the determination of ‘peak segments’ for motif discovery
<italic>ad hoc</italic>
. Also, it is unknown how to optimally specify the start and end points of the detected ChIP peaks for most currently available motif discovery software. As a result, ‘foreground’ sequences are often determined using arbitrary thresholds. For example, in Jothi
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
) and Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
), a 200-bp region centered on the peak summit is used, while a region having a 1000-bp length is used in Corbo
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B28">28</xref>
). It is thus interesting to ask how the positional information can be best utilized for DNA motif discovery.</p>
<p>In this work, we first noticed that in a typical ChIP experiment for a sequence-specific transcription factor, the functional DNA motif interacting with the studied protein tends to cluster around the peak summit (
<xref ref-type="bibr" rid="gkr1135-B26">26</xref>
). Based on this observation, we propose a Gaussian-uniform mixture model to describe the positional patterns of
<italic>k</italic>
-mers relative to the peak summit. A scoring method is also proposed to quickly rank and discover
<italic>k</italic>
-mers from ChIP data. A positional-information-guided motif discovery program, termed POSMO, is then implemented. In the following, using both simulated and real data sets, we demonstrate that POSMO is more effective and more efficient than available software tools.</p>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<p>In the following, we assume that the ChIP-seq/ChIP-chip experiments are for sequence-specific DNA-binding transcription factors. Our method may not be suitable for ChIP-seq/ChIP-chip experiments on non-specific DNA-binding proteins, such as histones.</p>
<sec>
<title>Data sets used in this work</title>
<p>To demonstrate the practical use of our method, we obtained ChIP-seq data for STAT1 (
<xref ref-type="bibr" rid="gkr1135-B29">29</xref>
), CRX (
<xref ref-type="bibr" rid="gkr1135-B28">28</xref>
), CTCF (
<xref ref-type="bibr" rid="gkr1135-B30">30</xref>
), NRSF (
<xref ref-type="bibr" rid="gkr1135-B31">31</xref>
) and FOXA2 (
<xref ref-type="bibr" rid="gkr1135-B32">32</xref>
). To validate the performance of our method for ChIP-chip data, we obtained data for CTCF in human (
<xref ref-type="bibr" rid="gkr1135-B33">33</xref>
). To validate the performance of our method on other species, we obtained ChIP-seq data for CAD, KNI, KR1, KR2, BCD, HB1 and HB2 of
<italic>Drosophila melanogaster</italic>
(
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
). We also demonstrated the usefulness of our method on a large cohort of core transcription factors involved in mouse embryonic stem cells (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
). Binding motifs of the above transcription factors are either documented in the JASPAR database (
<xref ref-type="bibr" rid="gkr1135-B4">4</xref>
) or in the original publications, facilitating the comparison between our method and other available methods.</p>
</sec>
<sec>
<title>Model for motif discovery</title>
<p>After peak calling on ChIP-seq/ChIP-chip data, we have a set of chromosomal positions marking the potential binding events of a given transcription factor. Typically, peak-calling software also reports a ‘summit’ for each peak [e.g. MACS (
<xref ref-type="bibr" rid="gkr1135-B36">36</xref>
), SISSRs (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
), for a recent comparison see (
<xref ref-type="bibr" rid="gkr1135-B37">37</xref>
)]. Therefore, these peaks can be uniquely aligned according to the location (
<italic>µ</italic>
<sub>0</sub>
) of their respective summits. Next, based on the nature of the ChIP-seq/ChIP-chip experiment (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
,
<xref ref-type="bibr" rid="gkr1135-B29">29</xref>
,
<xref ref-type="bibr" rid="gkr1135-B36">36</xref>
), we assume that a functional DNA motif exists in the vicinity of peak summits. The DNA motif is a
<italic>k</italic>
-mer with unknown
<italic>k</italic>
. In practice, many such
<italic>k</italic>
-mers may exist as a result of degeneracy and a word-clustering algorithm is employed to group them together. Location (
<italic>X</italic>
) of such motifs is assumed to follow a Gaussian distribution:
<italic>X</italic>
∼
<italic>N</italic>
(
<italic>µ</italic>
<sub>0</sub>
, 
<italic>σ</italic>
<sup>2</sup>
), where
<italic>µ</italic>
<sub>0</sub>
is the peak summit and
<italic>σ</italic>
is an unknown parameter related to the binding nature of the transcription factor being studied, the noise level (e.g. antibody specificity) of the ChIP experiment as well as the noise in the sequencing step. In addition,
<italic>σ</italic>
is small compared to the flanking regions of each candidate peak. This assumption is quite reasonable since the underlying functional DNA motifs are generally enriched in peak summit regions with length of a few hundred base pairs or less (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
,
<xref ref-type="bibr" rid="gkr1135-B29">29</xref>
,
<xref ref-type="bibr" rid="gkr1135-B36">36</xref>
), and we can freely increase the length (e.g. ±5000 bp was used in this work; denoted as
<italic>m</italic>
) of flanking sequences of each peak. Finally, as a background model, the given
<italic>k</italic>
-mer is assumed to be uniformly distributed in the flanking regions of the identified peaks. With these assumptions, we have
<disp-formula id="gkr1135-M1">
<label>(1)</label>
<graphic xlink:href="gkr1135m1"></graphic>
</disp-formula>
where α is an unknown enrichment parameter between 0 and 1 and is specific to the
<italic>k</italic>
-mer. Inferences on α and
<italic>σ</italic>
for a given
<italic>k</italic>
-mer can be made when there is a sufficient number of observations. In principle, a maximum likelihood estimation of α and
<italic>σ</italic>
can be obtained by optimization methods (
<xref ref-type="bibr" rid="gkr1135-B38">38</xref>
). However, since a majority of
<italic>k</italic>
-mers are usually unrelated to the transcription factor, directly solving the above mixture model is not computationally efficient. Instead, we introduce a novel statistic to score and rank each
<italic>k</italic>
-mer as follows.</p>
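<p>As an illustration only, the mixture described above can be written as a density and a log-likelihood. Equation (1) is available here only as an image, so the form used below (a Gaussian centred on the summit plus a uniform background over the ±m window, ignoring truncation of the Gaussian at the window edges) is an assumption, and the function names are hypothetical rather than part of POSMO.</p>
<preformat>
```python
import math


def mixture_density(x, alpha, sigma, mu0=0.0, m=5000.0):
    """Assumed Gaussian-uniform mixture density at offset x from the summit.

    alpha : enrichment weight of the Gaussian component (between 0 and 1)
    sigma : spread of the Gaussian component, in bp
    mu0   : summit location (0 when offsets are summit-relative)
    m     : half-width of the flanking window, in bp
    """
    gauss = math.exp(-0.5 * ((x - mu0) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    uniform = 1.0 / (2.0 * m)  # flat background over the whole window
    return alpha * gauss + (1.0 - alpha) * uniform


def log_likelihood(offsets, alpha, sigma, mu0=0.0, m=5000.0):
    """Log-likelihood of observed k-mer offsets under the assumed mixture."""
    return sum(math.log(mixture_density(x, alpha, sigma, mu0, m)) for x in offsets)
```
</preformat>
<p>Maximizing such a log-likelihood over α and σ for every candidate k-mer is the costly step that the scoring statistic introduced next is designed to avoid.</p>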
<p>Since the peak sequences with exactly the same lengths are aligned by the summit (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1A</ext-link>
), we can count the appearance frequency of each
<italic>k-</italic>
mer at a particular position relative to the peak summit across all aligned peak sequences. In other words, we have an appearance frequency profile (
<italic>A</italic>
<sub>1</sub>
,
<italic>A</italic>
<sub>2</sub>
, … ,
<italic>A</italic>
<sub>2m</sub>
) for each
<italic>k-</italic>
mer (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1B</ext-link>
), where
<italic>A</italic>
<sub>i</sub>
is the total number of times a given
<italic>k-</italic>
mer is observed at position
<italic>i</italic>
across all peaks. According to the above mixture model, the appearance frequency profile will be relatively higher when the position index
<italic>i</italic>
is close to peak summit
<italic>µ</italic>
<sub>0</sub>
, provided that the corresponding
<italic>k-</italic>
mer is the binding motif of the investigated transcription factor. Consequently, a pronounced jump is expected around the summit region (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1C</ext-link>
) when the appearance frequency profile of a
<italic>k-</italic>
mer related to the studied transcription factor is converted into a cumulative appearance frequency profile (CAFP). We therefore score this jump. Since a higher jump corresponds to a larger area between the observed CAFP and the diagonal line (corresponding to α = 0), we used the area (
<italic>R</italic>
) between the CAFP and the diagonal (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Text 1</ext-link>
for detailed definition) to score each
<italic>k-</italic>
mer, and the obtained score is hereinafter referred to as POSMO
<italic>R</italic>
score. We allow this area (
<italic>R</italic>
) to have a negative value, which corresponds to cases where a given
<italic>k-</italic>
mer is depleted around the summit region, but enriched in flanking regions. As shown in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Text 1</ext-link>
, we proved that the POSMO
<italic>R</italic>
score asymptotically follows a normal distribution
<inline-formula>
<inline-graphic xlink:href="gkr1135i1.jpg"></inline-graphic>
</inline-formula>
for large
<italic>T</italic>
(say >60), where
<italic>T</italic>
represents total occurrences of a given
<italic>k-</italic>
mer. With this distribution, we can evaluate the statistical significance of each
<italic>k-</italic>
mer by POSMO
<italic>R</italic>
score. Such significance scores, termed POSMO
<italic>Z</italic>
scores, are then used to rank
<italic>k-</italic>
mers (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1D</ext-link>
).</p>
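<p>The construction of the CAFP and of an area-type score can be sketched as follows. The exact definition of R is given in Supplementary Text 1 and is not reproduced here, so the sign convention below is an assumption: the gap between the CAFP and the diagonal is counted negatively left of the summit and positively right of it, which yields a positive score for summit enrichment and a negative one for summit depletion. The function names are hypothetical, and the profile is assumed to contain at least one occurrence.</p>
<preformat>
```python
def cafp(profile):
    """Cumulative appearance frequency profile, normalised to end at 1."""
    total = float(sum(profile))  # T, the total number of occurrences
    running, curve = 0.0, []
    for count in profile:
        running += count
        curve.append(running / total)
    return curve


def signed_area(profile):
    """Assumed area-type score between the CAFP and the diagonal.

    profile holds the counts A_1 .. A_2m; the summit sits between
    positions m and m + 1.
    """
    n = len(profile)  # n = 2m positions
    half = n // 2
    area = 0.0
    for i, value in enumerate(cafp(profile), start=1):
        gap = value - i / n  # vertical distance to the diagonal
        area += gap if i > half else -gap
    return area / n
```
</preformat>
<p>Under this convention a flat profile scores 0, a profile concentrated just right of the summit scores 0.25, and a profile concentrated at the two window edges scores below 0.</p>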
<p>Although the above POSMO
<italic>Z</italic>
score is highly effective in ranking the true
<italic>k</italic>
-mers on top, it does not explicitly account for
<italic>σ</italic>
in
<xref ref-type="disp-formula" rid="gkr1135-M1">Equation (1)</xref>
. Thus, for the purpose of efficiency, we proposed an approximate solution to estimate
<italic>σ</italic>
using linear models. As can be seen in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1C</ext-link>
, the CAFP has two linear components of the same slope in the flanking regions. In other words, we can model the linear components in the flanking regions by:
<disp-formula>
<graphic xlink:href="gkr1135um1"></graphic>
</disp-formula>
<disp-formula>
<graphic xlink:href="gkr1135um2"></graphic>
</disp-formula>
Estimators
<inline-formula>
<inline-graphic xlink:href="gkr1135i2.jpg"></inline-graphic>
</inline-formula>
can be derived using the least-squares method for the above model. We then approximately estimate
<italic>σ</italic>
of the Gaussian component by checking the residual of the linear fitting of the two flanking regions. Since the profile in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S1C</ext-link>
is monotonic, we can calculate the residual at each position using the linear model on left flanking region and right flanking region. The start (
<italic>s</italic>
) and end (
<italic>e</italic>
) positions where the residual first exceeds
<inline-formula>
<inline-graphic xlink:href="gkr1135i3.jpg"></inline-graphic>
</inline-formula>
are estimated by:
<disp-formula>
<graphic xlink:href="gkr1135um3"></graphic>
</disp-formula>
<disp-formula>
<graphic xlink:href="gkr1135um4"></graphic>
</disp-formula>
</p>
<p>The distance between these two positions (
<inline-formula>
<inline-graphic xlink:href="gkr1135i4.jpg"></inline-graphic>
</inline-formula>
; hereinafter termed
<italic>D</italic>
score) is a second filter in addition to the above POSMO
<italic>Z</italic>
score. A true
<italic>k</italic>
-mer will have a relatively smaller positive
<italic>D</italic>
score than other
<italic>k</italic>
-mers. The complexity of calculating the above POSMO
<italic>R</italic>
score and
<italic>D</italic>
score is linear in the length of the flanking regions.</p>
</sec>
<sec>
<title>Thresholds</title>
<p>Obviously, a majority of the 4
<sup>
<italic>k</italic>
</sup>
<italic>k</italic>
-mers are unrelated to the transcription factor being studied. With the above POSMO
<italic>Z</italic>
score and
<italic>D</italic>
score for each
<italic>k</italic>
-mer, we next want to detect ‘significant
<italic>k</italic>
-mers’ for further analysis. For this purpose, we only consider positive
<italic>D</italic>
scores that correspond to
<italic>k</italic>
-mers enriched in peak summit regions. A
<italic>k</italic>
-mer will be retained if its
<italic>D</italic>
score is small, but non-negative:
<disp-formula>
<graphic xlink:href="gkr1135um5"></graphic>
</disp-formula>
where
<italic>t</italic>
<sub>D</sub>
is set to 1.645 (corresponding to one-sided
<italic>P</italic>
-value of 0.05) in this work.</p>
<p>Next, we checked the population mean (
<italic>µ
<sub>Z</sub>
</italic>
) and standard deviation (
<italic>σ
<sub>Z</sub>
</italic>
) of the POSMO
<italic>Z</italic>
scores across all
<italic>k</italic>
-mers. A
<italic>k</italic>
-mer will be filtered out if its POSMO
<italic>Z</italic>
score is small:
<disp-formula>
<graphic xlink:href="gkr1135um6"></graphic>
</disp-formula>
where
<italic>t</italic>
<sub>Z</sub>
is set to 2.33 (corresponding to one-sided
<italic>P</italic>
-value of 0.01) in this work.
<italic>k</italic>
-mers satisfying the above two filters are called ‘significant
<italic>k</italic>
-mers’ and will be subject to word clustering, as described in the next section. We note that the above thresholds (
<italic>t</italic>
<sub>D</sub>
and
<italic>t</italic>
<sub>Z</sub>
) are quite arbitrary and could be fine-tuned for specific data sets. However, in the present work, we found that our method is robust to these parameters (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S3</ext-link>
).</p>
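<p>A minimal sketch of the two-filter selection is given below. The threshold formulas themselves are only available as images, so two assumptions are made and should be read as such: the Z cutoff is taken to be the population mean plus t_Z population standard deviations, and the D cutoff is passed in directly as a hypothetical parameter (d_cutoff) instead of being derived from t_D. The function name and the input layout are also hypothetical.</p>
<preformat>
```python
import statistics


def select_significant(scores, d_cutoff, t_z=2.33):
    """Return the k-mers that pass both filters, in sorted order.

    scores   : dict mapping k-mer to a (z_score, d_score) pair
    d_cutoff : largest D score still accepted (assumed to be given)
    t_z      : multiplier on the standard deviation of the Z scores
    """
    z_values = [z for z, _ in scores.values()]
    mu_z = statistics.mean(z_values)
    sigma_z = statistics.pstdev(z_values)  # population standard deviation
    z_cutoff = mu_z + t_z * sigma_z        # assumed form of the Z filter

    kept = []
    for kmer, (z, d) in scores.items():
        small_nonnegative_d = d >= 0 and d_cutoff >= d
        if z >= z_cutoff and small_nonnegative_d:
            kept.append(kmer)
    return sorted(kept)
```
</preformat>
<p>Only k-mers with an unusually high Z score and a small, non-negative D score survive and are passed on to word clustering.</p>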
</sec>
<sec>
<title>Word clustering</title>
<p>Although the desired
<italic>k-</italic>
mers are generally within the significant
<italic>k</italic>
-mer list, it is difficult to manually inspect the whole list of significant
<italic>k</italic>
-mers. In addition, different variations of the same binding motif exist in the list of significant
<italic>k-</italic>
mers due to degeneracy. Thus, a traditional PWM representation may be more informative. This raises the question of
<italic>k-</italic>
mer clustering, which is still an open problem in bioinformatics. Different methods for clustering
<italic>k-</italic>
mers have been proposed. For example, Schones
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B39">39</xref>
) found that the chi-square statistic and the Fisher-Irwin test are good measures of PWM similarity. Mahony
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B40">40</xref>
) studied the effectiveness of different similarity metrics and tree-building methods on grouping
<italic>k-</italic>
mers from an analysis of variance (ANOVA) perspective. They found that the Pearson correlation coefficient is a good similarity measure between PWMs. However, we found that their metric on automatic determination of the number of clusters is generally not satisfactory for our PWMs, i.e. the CH
<sub>log</sub>
statistic by Mahony
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B40">40</xref>
) does not always generate reasonable global minima to determine optimal cluster numbers (data not shown). Thus, we developed a simpler, yet effective, method to group the significant
<italic>k-</italic>
mers by considering context, as described below.</p>
<p>Specifically, with the significant
<italic>k</italic>
-mers, we rescan the whole data set of peak sequences, e.g. (−5, 5 kb) flanking region of the peak summits. Each significant
<italic>k-</italic>
mer as defined in section ‘Thresholds’ is assigned its POSMO
<italic>Z</italic>
score, while insignificant
<italic>k-</italic>
mers are set to zero. For each continuous sequence segment with score >0, we extract its flanking regions (e.g. ±20 bp) surrounding the
<italic>k-</italic>
mer (
<italic>w</italic>
<sub>0</sub>
) with highest score and label this sequence segment using corresponding
<italic>k-</italic>
mer (
<italic>w</italic>
<sub>0</sub>
). This step is particularly useful when our
<italic>k</italic>
is smaller than the optimal
<italic>k</italic>
<sub>0</sub>
, in which case several
<italic>k-</italic>
mers are highly significant because they are simple shifts of the same DNA motif. DNA segments are then grouped according to their labeling
<italic>k-</italic>
mers, respectively. For each of these labeling
<italic>k-</italic>
mers, a position-specific frequency matrix (referred to here as a PWM) is obtained. Next, each PWM is converted into a 4 × 
<italic>w</italic>
vector where
<italic>w</italic>
(>
<italic>k</italic>
) is a predetermined length (here we used
<italic>w</italic>
 = 
<italic>k</italic>
 + 2). Different PWMs are compared using Pearson's correlation coefficient over its region with maximum information content, where the information content is defined as:
<disp-formula>
<graphic xlink:href="gkr1135um7"></graphic>
</disp-formula>
where
<italic>p</italic>
<sub>i,k</sub>
is the frequency of observing nucleotide
<italic>k</italic>
at position
<italic>i</italic>
 = 1, 2, … ,
<italic>w</italic>
. Different offsets are tried to best match a pair of PWMs. At each step of our hierarchical clustering, the pair of PWMs with the highest similarity score is joined, and a new PWM is constructed in the clustering tree. We iterate this process until only one node is left. We found that the similarity score between joined nodes in each step decreases as the tree level approaches the root node. With this observation, we determine the final number of clusters by the tree level that is closest to the root node and has a similarity score above a predefined threshold
<italic>T</italic>
<sub>0</sub>
. We found that setting
<italic>T</italic>
<sub>0</sub>
between 0.8 and 0.9 generally gives reasonably good results. A sequence logo was then generated using the seqLogo package from the Bioconductor open-source software project, as was done in Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
).</p>
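<p>The offset search used when comparing two PWMs can be sketched as below. This is a simplified illustration, not the POSMO implementation: it correlates the full overlapping region at each offset instead of restricting the comparison to the region of maximum information content, and the names (pearson, best_offset_similarity, one_hot, min_overlap) are hypothetical. A PWM is represented as a list of columns, each holding the frequencies of A, C, G and T.</p>
<preformat>
```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_u * var_v)


def best_offset_similarity(pwm_a, pwm_b, min_overlap=4):
    """Best (score, offset); column j of pwm_b aligns with column j + offset of pwm_a."""
    best = (-1.0, 0)
    low = -(len(pwm_b) - min_overlap)
    high = len(pwm_a) - min_overlap
    for offset in range(low, high + 1):
        start_a, start_b = max(0, offset), max(0, -offset)
        overlap = min(len(pwm_a) - start_a, len(pwm_b) - start_b)
        if min_overlap > overlap:
            continue
        flat_a = [p for col in pwm_a[start_a:start_a + overlap] for p in col]
        flat_b = [p for col in pwm_b[start_b:start_b + overlap] for p in col]
        score = pearson(flat_a, flat_b)
        if score > best[0]:
            best = (score, offset)
    return best


def one_hot(kmer):
    """Turn a single k-mer into a degenerate PWM (one column per base)."""
    return [[1.0 if base == letter else 0.0 for letter in "ACGT"] for base in kmer]
```
</preformat>
<p>In a hierarchical clustering loop, the pair of PWMs with the highest such score would be merged at each step, and merging would stop being accepted once the score drops below the chosen threshold.</p>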
</sec>
<sec>
<title>Comparison with other algorithms on ranking k-mers by POSMO
<italic>Z</italic>
score</title>
<p>We first set out to compare our algorithm with other motif-discovery methods. For this purpose, we compared our algorithm with DME (
<xref ref-type="bibr" rid="gkr1135-B41">41</xref>
), as a representative of enumerative methods, and MEME (
<xref ref-type="bibr" rid="gkr1135-B8">8</xref>
), as a representative of iterative optimization methods. The comparison was carried out with simulated data using the following procedure:
<list list-type="order">
<list-item>
<p>simulate 2000 sequences each with length
<italic>L</italic>
(e.g. 10 000 bp);</p>
</list-item>
<list-item>
<p>half (here 1000) of the sequences are randomly chosen as foreground data and another half (1000) as control data;</p>
</list-item>
<list-item>
<p>a
<italic>k</italic>
-mer
<italic>w</italic>
<sub>0</sub>
is generated to be planted into 50 of the foreground sequences in step 4 [corresponding to α = 0.24 in
<xref ref-type="disp-formula" rid="gkr1135-M1">Equation (1)</xref>
, as the expected number of a given 8-mer in background will be 1000 × 10 000/4
<sup>8 </sup>
≈ 153 and 50/(153 + 50) = 0.24];</p>
</list-item>
<list-item>
<p>for each of the 50 sequences (see above) selected as foreground, a random integer
<italic>x</italic>
(greater than 0 and less than 10 000 − 
<italic>k</italic>
) is sampled from the Gaussian distribution
<italic>N</italic>
(
<italic>µ</italic>
,
<italic>σ</italic>
<sup>2</sup>
), where
<italic>µ</italic>
(here 5000) controls the location of the real peaks (to be discovered by ChIP experiments), and
<italic>σ</italic>
controls the spread of the peaks. The
<italic>k-</italic>
mer at position
<italic>x</italic>
is replaced by the
<italic>k-</italic>
mer from step (3);</p>
</list-item>
<list-item>
<p>all the foreground sequences are processed by POSMO;</p>
</list-item>
<list-item>
<p>for each foreground and background sequence, the substrings from position 5000 − 
<italic>l</italic>
to 5000 + 
<italic>l</italic>
are extracted to compile a new foreground data set and a new background data set, where
<italic>l</italic>
is set as 100 bp. These two data sets are input to DME and MEME as foreground and background, respectively;</p>
</list-item>
<list-item>
<p>the rank of the known
<italic>k-</italic>
mer in step (3) by POSMO, DME and MEME is recorded; and</p>
</list-item>
<list-item>
<p>steps (1) through (7) are repeated many (e.g. 500) times to summarize the rank distribution of the known
<italic>k-</italic>
mers. A lower rank of the target
<italic>k</italic>
-mer is better.</p>
</list-item>
</list>
</p>
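<p>Steps (1) to (4) of the procedure above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: bases are drawn uniformly at random, out-of-range planting positions are simply redrawn, and the function names, the seed argument and the example 8-mer are hypothetical. The helper expected_background_count reproduces the background estimate quoted in step (3), which works out to roughly 153.</p>
<preformat>
```python
import random


def expected_background_count(n_seq, length, k):
    """Expected occurrences of one fixed k-mer in uniform random sequences."""
    return n_seq * length / 4 ** k


def simulate_foreground(n_seq=1000, length=10000, kmer="ACGTTGCA",
                        n_planted=50, mu=5000, sigma=50.0, seed=0):
    """Generate random foreground sequences and plant kmer near position mu.

    Returns the sequences and a dict mapping the index of each planted
    sequence to the planting position x.
    """
    rng = random.Random(seed)
    seqs = ["".join(rng.choices("ACGT", k=length)) for _ in range(n_seq)]
    k = len(kmer)
    planted = {}
    for idx in rng.sample(range(n_seq), n_planted):
        while True:
            x = int(round(rng.gauss(mu, sigma)))
            if x > 0 and length - k > x:  # keep x strictly inside the sequence
                break
        # Overwrite the k bases starting at x with the planted k-mer.
        seqs[idx] = seqs[idx][:x] + kmer + seqs[idx][x + k:]
        planted[idx] = x
    return seqs, planted
```
</preformat>
<p>The remaining steps would extract the central ±l window for the foreground/background tools, run each method, and record the rank of the planted k-mer over repeated simulations.</p>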
</sec>
<sec>
<title>Implementation</title>
<p>We have implemented our algorithm using C++. The POSMO program can be freely downloaded from
<ext-link ext-link-type="uri" xlink:href="http://cb.utdallas.edu/Posmo/index.html">http://cb.utdallas.edu/Posmo/index.html</ext-link>
.</p>
</sec>
</sec>
<sec>
<title>RESULTS</title>
<sec>
<title>Simulation study</title>
<p>We first used simulation to compare our POSMO algorithm with established methods, such as exhaustive enumeration [e.g. YMF (
<xref ref-type="bibr" rid="gkr1135-B42">42</xref>
), DWE (
<xref ref-type="bibr" rid="gkr1135-B43">43</xref>
) and DME (
<xref ref-type="bibr" rid="gkr1135-B41">41</xref>
)] and iterative optimization [e.g. MEME (
<xref ref-type="bibr" rid="gkr1135-B8">8</xref>
)]. We decided to compare our algorithm with DME and MEME as representatives of the above two categories. As it turned out, our method performs in a manner similar to DME and MEME in ranking the target
<italic>k</italic>
-mers on top when the underlying distribution of the target motif is well correlated with the ChIP peak (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S2A</ext-link>
). In a typical ChIP experiment, the cross-linked DNA sequences are sheared to a desired length, and these smaller DNA segments are then sequenced. However, some transcription factors may interact with co-factors, which, in turn, can lead to sharper or flatter peaks. These unknown factors will affect the spread of the motif distribution under the ChIP peaks, as modeled by
<italic>σ</italic>
in
<xref ref-type="disp-formula" rid="gkr1135-M1">Equation (1)</xref>
. As can be seen from
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S2B</ext-link>
, performance of POSMO is more robust to larger
<italic>σ</italic>
than that of MEME and DME. In addition, the ChIP peaks may be systematically shifted from the true binding site (
<xref ref-type="bibr" rid="gkr1135-B36">36</xref>
,
<xref ref-type="bibr" rid="gkr1135-B44">44</xref>
). As can be seen from
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Figure S2C</ext-link>
, POSMO is much more robust against such a systematic error than either MEME or DME. This is not surprising, since the performance of both MEME and DME depends on the accuracy of the foreground sequences. In this sense, our simulation design favors POSMO, since MEME and DME do not consider positional preferences at all. On the other hand, POSMO is able to find the target motif without explicitly specified foreground and background sequences, since it implicitly contrasts the ‘peak center’ with the flanking region. Based on the above simulation results, we proceeded to apply our method to real ChIP data sets in the next sections.</p>
</sec>
<sec>
<title>Application on real data</title>
<sec>
<title>ChIP-seq on STAT1, NRSF, CTCF, CRX and FOXA2</title>
<p>We first applied our POSMO algorithm on ChIP peaks identified by Jothi
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
) on STAT1 (
<xref ref-type="bibr" rid="gkr1135-B29">29</xref>
). A refined motif of STAT1 was recently reported by Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
). We therefore studied the performance of POSMO in ranking
<italic>k</italic>
-mers for STAT1-binding motif, using the PWM of STAT1 by Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
) as the gold standard. By focusing on the 4741 top peaks of STAT1 ChIP data [NumTags >50 in Jothi
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
); see
<xref ref-type="table" rid="gkr1135-T4">Table 4</xref>
for robustness of our method against the number of top peaks used], we found that 8-mers directly related to STAT1 binding are indeed ranked on top (
<xref ref-type="table" rid="gkr1135-T1">Table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
).
<table-wrap id="gkr1135-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Top five
<italic>k</italic>
-mers ranked by POSMO are related to the underlying transcription factor-DNA interaction</p>
</caption>
<table frame="hsides" rules="groups">
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<inline-graphic xlink:href="gkr1135t1.jpg"></inline-graphic>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkr1135-TF1">
<p>Dark shaded cells represent either significant
<italic>k</italic>
-mers called by our POSMO algorithm (
<italic>Z</italic>
columns), or a
<italic>k</italic>
-mer with a significant PWM score (PWM columns; >
<italic>µ</italic>
+ 3
<italic>σ</italic>
criterion over all 4<sup><italic>k</italic></sup>
<italic>k</italic>
-mers). Light-gray shaded cells represent
<italic>k</italic>
-mers which are shifts of the genuine motif, thus having an insignificant PWM score. POSMO
<italic>Z</italic>
score is the average POSMO
<italic>Z</italic>
score of a
<italic>k</italic>
-mer and its reverse complementary
<italic>k</italic>
-mer.
<italic>k-</italic>
mers for each transcription factor are sorted according to POSMO
<italic>Z</italic>
score (Z columns).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
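<p>The two quantities described in the footnote of Table 1, the strand-averaged Z score and the µ + 3σ significance criterion on PWM scores, can be illustrated with a minimal sketch. The function names and the input dictionaries are hypothetical, and the use of the population standard deviation is an assumption; the sketch does not reproduce how POSMO computes its Z scores.</p>
<preformat preformat-type="code">
```python
from statistics import mean, pstdev

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA word."""
    return kmer.translate(COMPLEMENT)[::-1]


def strand_averaged_z(z_scores, kmer):
    """Average the Z score of a k-mer with that of its reverse complement.

    `z_scores` is a hypothetical dict mapping every k-mer to its Z score.
    """
    return (z_scores[kmer] + z_scores[reverse_complement(kmer)]) / 2.0


def significant_by_pwm(pwm_scores, n_sd=3.0):
    """Return k-mers whose PWM score exceeds mean + n_sd * SD.

    The mean and SD are taken over all scored k-mers, which is assumed
    here to be the full set of 4**k words.
    """
    values = list(pwm_scores.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return {kmer for kmer, score in pwm_scores.items() if score > cutoff}


print(reverse_complement("CCGGAAG"))
```
</preformat>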
<p>We next asked whether POSMO could generally rank
<italic>k-</italic>
mers related to studied transcription factors on top. To accomplish this, we collected ChIP-seq data for CRX (
<xref ref-type="bibr" rid="gkr1135-B28">28</xref>
), CTCF (
<xref ref-type="bibr" rid="gkr1135-B30">30</xref>
), NRSF (
<xref ref-type="bibr" rid="gkr1135-B31">31</xref>
) and FOXA2 (
<xref ref-type="bibr" rid="gkr1135-B32">32</xref>
). As can be seen from
<xref ref-type="table" rid="gkr1135-T1">Table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
, POSMO successfully ranked the desired
<italic>k</italic>
-mers on top for all studied factors, indicating the effectiveness of our method in ranking functional
<italic>k</italic>
-mers for ChIP-seq data.</p>
</sec>
<sec>
<title>ChIP-chip on CTCF</title>
<p>We also asked if POSMO could process ChIP-chip data, which have lower resolution than ChIP-seq data (
<xref ref-type="bibr" rid="gkr1135-B2">2</xref>
). For this purpose, we obtained the 13 720 peaks of CTCF binding determined using ChIP-chip (
<xref ref-type="bibr" rid="gkr1135-B33">33</xref>
). As shown in
<xref ref-type="table" rid="gkr1135-T1">Table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
, the top five 8-mers found by POSMO are all related to the CTCF binding motif. This result suggests that our method is applicable to ChIP-chip data.</p>
</sec>
<sec>
<title>ChIP-seq data from
<italic>D. melanogaster</italic>
</title>
<p>We next asked if our method could process ChIP-seq data from species other than human. To address this question, we obtained ChIP peaks for transcription factors BCD, HB1, HB2, CAD, KNI, GT, KR1 and KR2 of
<italic>D. melanogaster</italic>
(
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
).
<xref ref-type="table" rid="gkr1135-T1">Table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
clearly show that POSMO is able to rank the target
<italic>k</italic>
-mers on top for GT, KR1, KR2, BCD, HB1 and HB2. POSMO failed to rank the known motifs of CAD and KNI on top. However, we found that both MEME and DME also failed to discover the correct DNA motifs for CAD and KNI (
<xref ref-type="table" rid="gkr1135-T5">Table 5</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
), indicating that the underlying nature of this data set may not permit us to find the desired motifs.</p>
</sec>
</sec>
<sec>
<title>ChIP-seq data from core transcriptional networks in mouse ES cells</title>
<p>We also investigated the performance of our algorithm on the 13 sequence-specific transcription factors involved in the core transcriptional networks in mouse ES cells (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
). As shown in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
, POSMO successfully ranked the desired
<italic>k</italic>
-mers on top for 9 of the 13 factors (c-MYC, n-MYC, CTCF, ESRRB, KLF4, OCT4, SOX2, STAT3, ZFX and E2F1). Although the
<italic>k</italic>
-mer with highest PWM score is ranked 34th for TCFCP2L1, most of the top-ranked
<italic>k</italic>
-mers are just shifts of the known motif (light gray cells in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
). For E2F1, no motifs were found in the original work by Chen
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
). In Bailey (
<xref ref-type="bibr" rid="gkr1135-B45">45</xref>
), the top two motifs found for E2F1 are in the form of GGAA and ATGGCG. On the other hand, the top
<italic>k</italic>
-mers found by POSMO contain the sequence TTCCGG, which is partly similar to the
<italic>in vitro</italic>
E2F1 motif documented in JASPAR. As a confirmation, we found that our top
<italic>k</italic>
-mers, especially motif TTCCGG (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S1</ext-link>
), also appeared in independent E2F1 ChIP-seq data (
<xref ref-type="bibr" rid="gkr1135-B46">46</xref>
). Interestingly, in this independent E2F1 ChIP-seq data (
<xref ref-type="bibr" rid="gkr1135-B46">46</xref>
), a motif TTGGCGC with rank 14 is partly similar to the E2F1 motif documented in JASPAR. For SMAD1, our algorithm did not find any significant motifs. However, POSMO identified a motif (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
) weakly similar to the known SMAD1 motif when using ±200 bp flanking region of the peak summits, indicating that the length of flanking regions may be further optimized for POSMO.</p>
<p>In summary, the above results indicate that POSMO is highly effective for both ChIP-chip and ChIP-seq data from human, mouse and other species, such as fly. In the next section, we group these significant
<italic>k</italic>
-mers into PWM representations.</p>
</sec>
<sec>
<title>Word clustering on real data</title>
<p>Even though our POSMO software is generally able to rank the desired DNA
<italic>k</italic>
-mers on top, it is difficult to manually inspect them. Consequently, we next applied our novel word clustering algorithm on the sequence contexts of significant
<italic>k</italic>
-mers in the genome to obtain PWM representations (see ‘Materials and Methods’ section). As can be seen in
<xref ref-type="table" rid="gkr1135-T2">Table 2</xref>
, motifs reported for CTCF, STAT1, NRSF, CRX and FOXA2 by our word clustering method are highly similar to those reported in the literature.
<table-wrap id="gkr1135-T2" position="float">
<label>Table 2.</label>
<caption>
<p>Sequence motifs discovered by POSMO</p>
</caption>
<table frame="hsides" rules="groups">
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<inline-graphic xlink:href="gkr1135t2.jpg"></inline-graphic>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkr1135-TF2">
<p>The DNA motifs after word clustering are listed. As a comparison, the DNA motifs from the literature are also listed. For NRSF, two motifs are reported by POSMO, which correspond to the left and right half-sites reported by Hu
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>In the case of
<italic>D. melanogaster</italic>
, the motifs found by our method for BCD, HB1, HB2, KR1, KR2 and GT (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
) are highly similar to the motifs cataloged in JASPAR. For transcription factors CAD and KNI, the known motifs were not found by either our algorithm or the other algorithms (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
). Interestingly, for CAD, our algorithm reported a motif with consensus CAGGTA, which is implicated in the regulation of early transcribed genes during
<italic>Drosophila</italic>
development (
<xref ref-type="bibr" rid="gkr1135-B48">48</xref>
).</p>
<p>For the core transcription factors involved in mouse ES cells, motifs reported by POSMO are also highly similar to the known motifs for CTCF, n-MYC, c-MYC, STAT3, KLF4, SOX2, OCT4, ZFX, TCFCP2L1 and ESRRB (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
). Similar to a recent study using an improved version of MEME by Bailey (
<xref ref-type="bibr" rid="gkr1135-B45">45</xref>
), our algorithm did not report any motif for SMAD1. However, POSMO identified a motif (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
) weakly similar to the known SMAD1 motif when using ±200 bp flanking regions around the peak summits, indicating that the length of flanking regions may be further optimized for POSMO. The motif for E2F1 was not found by either Chen
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
) or other methods including POSMO. However, our algorithm found a motif with the pattern CCGGAAG (reverse complement is CTTCCGG; see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S5</ext-link>
), which is partly similar to the known motif TTT[G/C][G/C]CGC documented in JASPAR. Interestingly, we note that CCGGAAG is highly similar to the binding motifs of ETS transcription factors (
<xref ref-type="bibr" rid="gkr1135-B49">49</xref>
), which might indicate an interaction between E2F1 and ETS transcription factors. Notably, this motif is also found in an independent E2F1 ChIP-seq data set (
<xref ref-type="bibr" rid="gkr1135-B46">46</xref>
) (data not shown).</p>
</sec>
<sec>
<title>POSMO is robust to input parameters</title>
<p>One general concern for motif finders using the
<italic>k</italic>
-mer enumeration method is the determination of
<italic>k</italic>
. We thus asked how
<italic>k</italic>
affects the performance of POSMO. For this purpose, we ran our POSMO algorithm for 7-, 8- and 9-mers on the STAT1, NRSF, CTCF, CRX and FOXA2 data. As can be seen from
<xref ref-type="table" rid="gkr1135-T3">Table 3</xref>
, different
<italic>k</italic>
values do not greatly affect the obtained DNA motifs. Naturally, the closer the specified word pattern is to the true motif, the better the results will be. Since it is difficult to know this parameter
<italic>a priori</italic>
, it may be helpful to try several values in real applications. However, our results on the long motifs of CTCF and NRSF suggest that, in general, we do not need a very large
<italic>k</italic>
to discover long motifs. This is due to our heuristic word clustering method, which considers sequence context, as described in the ‘Materials and Methods’ section.
<table-wrap id="gkr1135-T3" position="float">
<label>Table 3.</label>
<caption>
<p>Input pattern lengths of 7, 8 and 9 are compared (POSMO is robust to input parameter
<italic>k</italic>
)</p>
</caption>
<table frame="hsides" rules="groups">
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<inline-graphic xlink:href="gkr1135t3.jpg"></inline-graphic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>We also asked how input parameters
<italic>t</italic>
<sub>D</sub>
and
<italic>t</italic>
<sub>Z</sub>
(see ‘Materials and Methods’ section) affect the performance of POSMO. We tried a series of values for
<italic>t</italic>
<sub>D</sub>
and
<italic>t</italic>
<sub>Z</sub>
and found that our method is not sensitive to them (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S3</ext-link>
). This insensitivity makes our method easy to use in practice.</p>
</sec>
<sec>
<title>POSMO performs well for large sample sizes</title>
<p>As discussed in Schmid and Bucher (
<xref ref-type="bibr" rid="gkr1135-B26">26</xref>
), the percentage of top peak sequences containing the known motif generally decreases as the threshold relaxes. Therefore, we asked how the total number of peak sequences would affect our motif finding results. We ranked the identified peaks for STAT1, CTCF, NRSF and FOXA2 according to the peak height, and we used the top 1000, 2000, 5000 and 10 000 peaks as input for POSMO. As can be seen in
<xref ref-type="table" rid="gkr1135-T4">Table 4</xref>
, POSMO is effective for all tested parameters. In particular, POSMO was found to be effective for input data sets containing 10 000 peaks for all studied transcription factors, suggesting the superior performance of POSMO for large sample sizes. Since ChIP-seq experiments typically produce thousands of peak sequences, we conclude that POSMO has broad applicability for ChIP experiments.
<table-wrap id="gkr1135-T4" position="float">
<label>Table 4.</label>
<caption>
<p>POSMO is robust to large sample sizes</p>
</caption>
<table frame="hsides" rules="groups">
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<inline-graphic xlink:href="gkr1135t4.jpg"></inline-graphic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec>
<title>POSMO is more effective than available methods for ChIP data</title>
<p>We next compared the overall performance of POSMO with that of DME (
<xref ref-type="bibr" rid="gkr1135-B41">41</xref>
), MEME (
<xref ref-type="bibr" rid="gkr1135-B8">8</xref>
), ChIPMunk (
<xref ref-type="bibr" rid="gkr1135-B25">25</xref>
), HMS (
<xref ref-type="bibr" rid="gkr1135-B24">24</xref>
) and DREME (
<xref ref-type="bibr" rid="gkr1135-B45">45</xref>
) on motif discovery. For this purpose, we checked the rank of the known motifs among all discoveries in DME (5 motifs to be reported), MEME (only the top 500 peaks were used because of running-speed constraints, with 5 motifs to be reported), ChIPMunk (1 motif to be reported), HMS (ChIP-seq intensity profiles under the peaks were also compiled for applicable transcription factors), DREME and POSMO (details on the motifs found by each method can be found in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
). For POSMO, our algorithm reported the total number of occurrences of each motif in the ChIP-seq/ChIP-chip data, which could be used to rank all the discovered motifs. As can be seen in
<xref ref-type="table" rid="gkr1135-T5">Table 5</xref>
, POSMO performs in a manner similar to DME, MEME and DREME, though with a better average rank of the discovered motifs. Motifs discovered by POSMO are similar to those discovered by ChIPMunk. However, the top 3 extremely high-scoring peaks must be removed for ChIPMunk to discover the correct motif for STAT1 (
<xref ref-type="table" rid="gkr1135-T5">Table 5</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
). In addition, POSMO is better than HMS for fly transcription factors and several mammalian transcription factors including CRX, STAT3 and ZFX, suggesting the high effectiveness of our method. POSMO did not find a motif for SMAD1 using our default settings; however, we noted that POSMO correctly identified the motif for SMAD1 when shorter flanking regions (±200 bp) were used (
<xref ref-type="table" rid="gkr1135-T5">Table 5</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S2</ext-link>
). This result suggests that flanking length may be further optimized for motif finding. Interestingly, the true motif found by POSMO is always ranked in first place, again indicating that POSMO is more effective than other tools. This property is particularly useful for assigning DNA motifs to a newly investigated transcription factor for which no prior motif information is available.
<table-wrap id="gkr1135-T5" position="float">
<label>Table 5.</label>
<caption>
<p>Performance comparison of POSMO, MEME, DME, ChIPMunk, HMS and DREME on ChIP data</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Transcription factor</th>
<th rowspan="1" colspan="1">Rank by POSMO</th>
<th rowspan="1" colspan="1">Rank by MEME</th>
<th rowspan="1" colspan="1">Rank by DME</th>
<th rowspan="1" colspan="1">Rank by ChIPMunk</th>
<th rowspan="1" colspan="1">Rank by HMS</th>
<th rowspan="1" colspan="1">Rank by DREME</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">STAT1 (
<xref ref-type="bibr" rid="gkr1135-B29">29</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1
<xref ref-type="table-fn" rid="gkr1135-TF4">
<sup>a</sup>
</xref>
</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">NRSF (
<xref ref-type="bibr" rid="gkr1135-B31">31</xref>
)</td>
<td rowspan="1" colspan="1">1, 2</td>
<td rowspan="1" colspan="1">1, 2</td>
<td rowspan="1" colspan="1">1, 2</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CTCF (
<xref ref-type="bibr" rid="gkr1135-B30">30</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CTCF (
<xref ref-type="bibr" rid="gkr1135-B33">33</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1
<xref ref-type="table-fn" rid="gkr1135-TF5">
<sup>b</sup>
</xref>
</td>
<td rowspan="1" colspan="1">NA</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">FOXA2 (
<xref ref-type="bibr" rid="gkr1135-B32">32</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CRX (
<xref ref-type="bibr" rid="gkr1135-B28">28</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">BCD (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CAD (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
</tr>
<tr>
<td rowspan="1" colspan="1">HB1 (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td rowspan="1" colspan="1">HB2 (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KR1 (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KR2 (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KNI (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GT (
<xref ref-type="bibr" rid="gkr1135-B34">34</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">7</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
</tr>
<tr>
<td rowspan="1" colspan="1">c-MYC (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">n-MYC (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CTCF (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ESRRB (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">STAT3 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">OCT4 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">SOX2 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KLF4 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">E2F1 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TCFCP2L1 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ZFX (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">NANOG (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1
<xref ref-type="table-fn" rid="gkr1135-TF6">
<sup>c</sup>
</xref>
</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">SMAD1 (
<xref ref-type="bibr" rid="gkr1135-B35">35</xref>
)</td>
<td rowspan="1" colspan="1">1
<xref ref-type="table-fn" rid="gkr1135-TF7">
<sup>d</sup>
</xref>
</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">No match</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Total successes</td>
<td rowspan="1" colspan="1">24/27</td>
<td rowspan="1" colspan="1">23/27</td>
<td rowspan="1" colspan="1">23/27</td>
<td rowspan="1" colspan="1">24/27</td>
<td rowspan="1" colspan="1">12/26</td>
<td rowspan="1" colspan="1">23/27</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Average rank</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1.30</td>
<td rowspan="1" colspan="1">1.47</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1.43</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="gkr1135-TF3">
<p>Among the Top 5 motifs found by MEME, DREME and DME, the rank (per
<italic>P</italic>
-values) of the known binding motif is listed. For NRSF, there are two known motifs and their ranks are counted separately. No match: the software did not report any motif similar to the known motif.</p>
</fn>
<fn id="gkr1135-TF4">
<p>
<sup>a</sup>
Top 3 peaks removed to get the correct motif.</p>
</fn>
<fn id="gkr1135-TF5">
<p>
<sup>b</sup>
Triangle intensity profile used.</p>
</fn>
<fn id="gkr1135-TF6">
<p>
<sup>c</sup>
Motif length of 20 used.</p>
</fn>
<fn id="gkr1135-TF7">
<p>
<sup>d</sup>
±200 bases flanking peak summit.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
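<p>The two summary rows of Table 5 can be recomputed from a rank column as sketched below, using the ‘Rank by MEME’ column as an example. The helper name is hypothetical, and entering NRSF (listed as ‘1, 2’) by its best rank of 1 is an assumption made here; under that assumption the sketch reproduces the reported 23/27 successes and average rank of 1.30.</p>
<preformat preformat-type="code">
```python
def summarize_column(ranks):
    """Summarize one rank column of a comparison table.

    `ranks` holds one entry per data set: an integer rank, or None
    where the table reports 'No match'. Returns the number of
    successes, the number of data sets and the mean rank over successes.
    """
    found = [r for r in ranks if r is not None]
    return len(found), len(ranks), sum(found) / len(found)


# 'Rank by MEME' column of Table 5, top to bottom.
# Assumption: NRSF ('1, 2') is entered as its best rank, 1.
meme = [1, 1, 1, 3, 1, 1, 4, None, 1, 1, 1, 1, None, 3,
        1, 1, 1, 1, 1, 1, 1, 1, None, 1, 1, 1, None]

successes, total, average = summarize_column(meme)
print(f"{successes}/{total}, average rank {average:.2f}")
```
</preformat>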
</sec>
<sec>
<title>POSMO is more efficient than available methods</title>
<p>We also compared the running time of POSMO with other established methods. As established in Keilwagen
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B23">23</xref>
), DME by Smith
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="gkr1135-B41">41</xref>
) is one of the fastest algorithms for large sample sizes. We therefore compared the running time of our method with that of DME using various numbers of peaks ranging from 500 to 10 000. As can be seen in
<xref ref-type="fig" rid="gkr1135-F1">Figure 1</xref>
, our method is significantly faster than DME for large sample sizes, where the running time of our method scales linearly with the number of peak sequences, with a typical running time of only a few minutes. In addition, a comparison on the real ChIP data sets revealed that POSMO is significantly more efficient than ChIPMunk, HMS and DREME (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S4</ext-link>
). Thus, we conclude that our method is highly efficient for motif discovery from large sample sizes of ChIP experiments. Clearly, the efficiency of our method will quickly decrease with large
<italic>k</italic>
. However, as was demonstrated in (
<xref ref-type="bibr" rid="gkr1135-B50">50</xref>
), 77% of the transcription factor binding motifs have <11 informative positions. Most importantly, as can be seen in
<xref ref-type="table" rid="gkr1135-T3">Table 3</xref>
, our method is robust to different values of
<italic>k</italic>
for the tested transcription factors. In particular, our method works well for the CTCF and NRSF motifs, which are longer than 13 bp. This result indicates that we do not need a very large
<italic>k</italic>
to find a long motif, partly due to our heuristic word clustering method. Thus, the efficiency of our algorithm is generally guaranteed.
<fig id="gkr1135-F1" position="float">
<label>Figure 1.</label>
<caption>
<p>POSMO is more efficient than DME for large sample sizes. Shown in the
<italic>y</italic>
-axis is the time spent for a given number of top peaks shown in the
<italic>x</italic>
-axis. Results for POSMO (dashed line with boxes) and DME (dashed line with triangles for a smaller peak window and solid line with circles for a larger peak window) are shown. Here
<italic>k</italic>
 = 8 for both POSMO and DME.</p>
</caption>
<graphic xlink:href="gkr1135f1"></graphic>
</fig>
</p>
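<p>The scaling behaviour described above can be illustrated with a minimal sketch of plain
<italic>k</italic>
-mer counting. This is not the POSMO implementation; the function names and the toy sequences are hypothetical. It only shows that one pass over the peak sequences costs time proportional to their total length, while the number of possible words grows as 4 to the power
<italic>k</italic>
.</p>
<preformat>
```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer over A/C/G/T in one pass through the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence of length n has n - k + 1 windows (none if n is shorter than k).
        for start in range(len(seq) - k + 1):
            word = seq[start:start + k]
            if set(word).issubset("ACGT"):  # skip windows with N or other codes
                counts[word] += 1
    return counts


def kmer_space_size(k):
    """Number of distinct DNA words of length k."""
    return 4 ** k


# Toy input (hypothetical): two short "peak" sequences.
peaks = ["ACGTACGTAC", "TTACGTACGG"]
counts = count_kmers(peaks, 4)
print(counts["ACGT"])       # 3
print(kmer_space_size(8))   # 65536
print(kmer_space_size(12))  # 16777216
```
</preformat>
<p>Doubling the number of peaks doubles the work of the counting loop, whereas increasing the word length from 8 to 12 multiplies the number of candidate words by 256.</p>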
</sec>
<sec>
<title>Co-motif finding</title>
<p>In principle, different DNA motifs may co-localize to perform regulatory functions by forming protein complexes. Therefore, it is of interest to see whether co-motifs can be discovered from such high-throughput ChIP-seq data. In fact, a sophisticated method targeting this question, SpaMo, has already been proposed by Bailey and colleagues (
<xref ref-type="bibr" rid="gkr1135-B51">51</xref>
). Although POSMO is not specifically designed to find co-motifs, we asked whether it can recover some of the known ones. For a few transcription factors such as STAT1, CRX, E2F1 and n-Myc (see
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Table S5</ext-link>
for details), POSMO reported some of the known co-motifs as identified by Bailey and colleagues (
<xref ref-type="bibr" rid="gkr1135-B51">51</xref>
). Interestingly, POSMO found the co-motif CAGGTA for many fly transcription factors. However, because POSMO is not designed to find co-motifs, it reported far fewer co-motifs than SpaMo did. An extension of POSMO specifically designed to detect co-motifs is under development.</p>
</sec>
</sec>
<sec>
<title>DISCUSSION</title>
<p>ChIP-seq/ChIP-chip is a popular experimental method to map
<italic>in vivo</italic>
binding sites of transcription factors. DNA motif discovery from such data is a necessary step toward understanding gene regulation. However, available motif finding tools are mostly designed to find DNA motifs in sequence segments by optimizing alignments, which renders the optimization process inefficient for large sample sizes. Recently, a few methods have been developed to utilize signal intensity to accelerate the discovery process. In this work, we have introduced a new
<italic>k</italic>
-mer enumeration method, POSMO, to predict transcription factor binding motifs. Using simulation, we found that our method is more robust against the information spread and systematic errors in peak locations than available methods in terms of ranking the target
<italic>k</italic>
-mer. The high prediction accuracy is further confirmed using a diverse set of real ChIP-seq/ChIP-chip data sets on human, mouse and fly. We also developed a novel word clustering algorithm by checking the sequence context of each significant
<italic>k</italic>
-mer. We found that our word clustering method can generate motif representations consistent with reports in the literature. The motifs discovered by POSMO are consistent with those discovered by DME and MEME, though our method gave the true motif the highest rank in all tested data sets. This property could be very important when there is no prior knowledge of the binding motifs of a newly investigated transcription factor. Thus, our method is more effective for motif discovery.</p>
<p>On the other hand, since estimation of peak summits is more accurate than estimation of the exact ‘peak regions’ from ChIP-chip/ChIP-seq data, our method provides better usability. In addition, since POSMO essentially contrasts far flanking sequences with sequences under the peak summit, our method does not require explicit ‘background’ to normalize the
<italic>k</italic>
-mer appearance frequency, whereas many motif discovery methods recommend constructing a separate background data set. This property also better mimics the biology of transcription factor-DNA interactions: instead of optimizing the binding affinity between the target DNA motif and many other genome-wide ‘background’ sequences, a transcription factor is actually searching for its target DNA motifs among the pool of surrounding local DNA sequences. Our results suggest that these local sequences can be better approximated by flanking sequences of ChIP-peak regions than by other ‘control’ sequences.</p>
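<p>The contrast between sequences under the peak summit and far flanking sequences can be sketched as follows. This is an illustration only, not the POSMO code: the function name, the window half-width and the assumption that every input sequence is centred on its summit are hypothetical choices made for the example.</p>
<preformat>
```python
from collections import Counter


def split_counts(sequences, k, center_halfwidth):
    """Count k-mers near the sequence midpoint separately from the flanks.

    Each sequence is assumed to be centred on its peak summit (an assumption
    of this sketch). A k-mer is assigned to the centre if its midpoint lies
    within center_halfwidth bases of the sequence midpoint, else to the flank.
    """
    center, flank = Counter(), Counter()
    for seq in sequences:
        seq = seq.upper()
        summit = len(seq) / 2
        for start in range(len(seq) - k + 1):
            word = seq[start:start + k]
            offset = abs(start + k / 2 - summit)
            target = flank if offset > center_halfwidth else center
            target[word] += 1
    return center, flank


# Toy input (hypothetical): one sequence with ACGT planted at the midpoint.
seqs = ["GGGGGGACGTGGGGGG"]
center, flank = split_counts(seqs, k=4, center_halfwidth=2)
print(center["ACGT"], flank["ACGT"])  # 1 0
print(flank["GGGG"])                  # 6
```
</preformat>
<p>In this sketch the flank counts play the role that a separate background set plays in other methods: a word that is frequent everywhere appears in both counters, whereas a word concentrated at the summit appears mainly in the centre counter.</p>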
<p>Most importantly, since our method is essentially a
<italic>k</italic>
-mer enumeration method in which hypothesis testing procedures are used extensively, it is very efficient, with a typical running time of only a few minutes for thousands of ChIP-seq peaks or more at word lengths below 10. This is in clear contrast to most established methods, such as MEME, which utilize extensive optimization techniques that can take up to hours for a few hundred ChIP-seq peaks (
<xref ref-type="bibr" rid="gkr1135-B7">7</xref>
). Thus, we believe our method will be a useful alternative to quickly study the binding sites of transcription factors.</p>
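<p>To make the idea of testing each enumerated word concrete, the sketch below computes an exact one-sided binomial tail probability. It is a generic enrichment test, not the specific statistic used by POSMO: the null model (every occurrence of a word falls in the central window with a fixed probability) and all names and numbers are assumptions of this example.</p>
<preformat>
```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, i) * p_null ** i * (1 - p_null) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Hypothetical word: 10 occurrences in total, 8 of them in a central window
# that covers 20% of all positions, so p_null = 0.2 under a uniform null.
p_value = binomial_upper_tail(8, 10, 0.2)
print(f"{p_value:.3e}")  # 7.793e-05
```
</preformat>
<p>Each test is a closed-form sum, so its cost depends on the occurrence count of the word rather than on the number of peaks, which is one reason why enumeration followed by per-word testing can be fast. When many words are tested, the resulting P-values would need a multiple-testing correction.</p>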
</sec>
<sec>
<title>SUPPLEMENTARY DATA</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkr1135/DC1">Supplementary Data</ext-link>
are available at NAR online: Supplementary Text 1, Supplementary Figures 1 and 2, Supplementary Tables 1–5.</p>
</sec>
<sec>
<title>FUNDING</title>
<p>
<funding-source>National Institutes of Health</funding-source>
(
<award-id>HG001696</award-id>
to M.Q.Z.);
<funding-source>National Basic Research Program of China</funding-source>
(
<award-id>2012CB316503</award-id>
to M.Q.Z.);
<funding-source>National Natural Science Foundation of China</funding-source>
(
<award-id>91019016</award-id>
,
<award-id>31061160497</award-id>
to M.Q.Z.);
<funding-source>National Science Foundation</funding-source>
(
<award-id>DMS-1106091</award-id>
to R.S.) and
<funding-source>UTD Startup Fund</funding-source>
(to Z.X.). Funding for open access charge:
<funding-source>NIH</funding-source>
.</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_40_7_e50__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_NAR-gkr1135-Table_S2-comparing_with_DME_MEME_DREME_HMS_ChIPMunk_New_Version.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File003.xls"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File002.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File004.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File005.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File006.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File007.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File008.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="supp_gkr1135_nar-02027-met-k-2011-File009.doc"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>We thank Joe Corbo for providing the PWM of CRX. We also thank Hongyu Zhao for valuable suggestions. The authors are grateful to the anonymous reviewers for their excellent suggestions.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="gkr1135-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Eskin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Favorov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Assessing computational tools for the discovery of transcription factor binding sites</article-title>
<source>Nat. Biotechnol.</source>
<year>2005</year>
<volume>23</volume>
<fpage>137</fpage>
<lpage>144</lpage>
<pub-id pub-id-type="pmid">15637633</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Determining the specificity of protein-DNA interactions</article-title>
<source>Nat. Rev. Genet.</source>
<year>2010</year>
<volume>11</volume>
<fpage>751</fpage>
<lpage>760</lpage>
<pub-id pub-id-type="pmid">20877328</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vaquerizas</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Kummerfeld</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Teichmann</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
</person-group>
<article-title>A census of human transcription factors: function, expression and evolution</article-title>
<source>Nat. Rev. Genet.</source>
<year>2009</year>
<volume>10</volume>
<fpage>252</fpage>
<lpage>263</lpage>
<pub-id pub-id-type="pmid">19274049</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Portales-Casamar</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Thongjuea</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kwon</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Arenillas</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Valen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Yusuf</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lenhard</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wasserman</surname>
<given-names>WW</given-names>
</name>
<name>
<surname>Sandelin</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D105</fpage>
<lpage>D110</lpage>
<pub-id pub-id-type="pmid">19906716</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>ZJ</given-names>
</name>
<name>
<surname>Van Nostrand</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Arshinoff</surname>
<given-names>BI</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Yip</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Robilotto</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rechtsteiner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ikegami</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integrative analysis of the
<italic>Caenorhabditis elegans</italic>
genome by the modENCODE project</article-title>
<source>Science</source>
<year>2010</year>
<volume>330</volume>
<fpage>1775</fpage>
<lpage>1787</lpage>
<pub-id pub-id-type="pmid">21177976</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ernst</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kharchenko</surname>
<given-names>PV</given-names>
</name>
<name>
<surname>Kheradpour</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Negre</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Eaton</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Landolin</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Bristow</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>MF</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Identification of functional elements and regulatory circuits by Drosophila modENCODE</article-title>
<source>Science</source>
<year>2010</year>
<volume>330</volume>
<fpage>1787</fpage>
<lpage>1797</lpage>
<pub-id pub-id-type="pmid">21177974</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jothi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cuddapah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>5221</fpage>
<lpage>5231</lpage>
<pub-id pub-id-type="pmid">18684996</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Elkan</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Fitting a mixture model by expectation maximization to discover motifs in biopolymers</article-title>
<source>Proc. Int. Conf. Intell. Syst. Mol. Biol.</source>
<year>1994</year>
<volume>2</volume>
<fpage>28</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="pmid">7584402</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B9">
<label>9</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Lengauer</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Inferring Gene Regulatory Networks</article-title>
<source>Bioinformatics - From Genomes to Therapies</source>
<year>2007</year>
<publisher-loc>Weinheim, Germany</publisher-loc>
<publisher-name>Wiley-VCH GmbH</publisher-name>
<fpage>807</fpage>
<lpage>828</lpage>
</element-citation>
</ref>
<ref id="gkr1135-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buhler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Finding motifs using random projections</article-title>
<source>J. Comput. Biol.</source>
<year>2002</year>
<volume>9</volume>
<fpage>225</fpage>
<lpage>242</lpage>
<pub-id pub-id-type="pmid">12015879</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eskin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
</person-group>
<article-title>Finding composite regulatory patterns in DNA sequences</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<issue>Suppl. 1</issue>
<fpage>S354</fpage>
<lpage>S363</lpage>
<pub-id pub-id-type="pmid">12169566</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ettwiller</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Paten</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ramialison</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Wittbrodt</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation</article-title>
<source>Nat. Methods</source>
<year>2007</year>
<volume>4</volume>
<fpage>563</fpage>
<lpage>565</lpage>
<pub-id pub-id-type="pmid">17589518</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fratkin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Naughton</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>MotifCut: regulatory motifs finding with maximum density subgraphs</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>e150</fpage>
<lpage>e157</lpage>
<pub-id pub-id-type="pmid">16873465</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lawrence</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Boguski</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Neuwald</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Wootton</surname>
<given-names>JC</given-names>
</name>
</person-group>
<article-title>Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment</article-title>
<source>Science</source>
<year>1993</year>
<volume>262</volume>
<fpage>208</fpage>
<lpage>214</lpage>
<pub-id pub-id-type="pmid">8211139</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>XS</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments</article-title>
<source>Nat. Biotechnol.</source>
<year>2002</year>
<volume>20</volume>
<fpage>835</fpage>
<lpage>839</lpage>
<pub-id pub-id-type="pmid">12101404</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marsan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Sagot</surname>
<given-names>MF</given-names>
</name>
</person-group>
<article-title>Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification</article-title>
<source>J. Comput. Biol.</source>
<year>2000</year>
<volume>7</volume>
<fpage>345</fpage>
<lpage>362</lpage>
<pub-id pub-id-type="pmid">11108467</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pavesi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Mereghetti</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mauri</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pesole</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>W199</fpage>
<lpage>W203</lpage>
<pub-id pub-id-type="pmid">15215380</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roth</surname>
<given-names>FP</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Estep</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation</article-title>
<source>Nat. Biotechnol.</source>
<year>1998</year>
<volume>16</volume>
<fpage>939</fpage>
<lpage>945</lpage>
<pub-id pub-id-type="pmid">9788350</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vardhanabhuti</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hannenhalli</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Position and distance specificity are important determinants of
<italic>cis</italic>
-regulatory motifs in addition to evolutionary conservation</article-title>
<source>Nucleic Acids Res.</source>
<year>2007</year>
<volume>35</volume>
<fpage>3203</fpage>
<lpage>3213</lpage>
<pub-id pub-id-type="pmid">17452354</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Linhart</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Halperin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Shamir</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Transcription factor and microRNA motif discovery: the Amadeus platform and a compendium of metazoan target sets</article-title>
<source>Genome Res.</source>
<year>2008</year>
<volume>18</volume>
<fpage>1180</fpage>
<lpage>1189</lpage>
<pub-id pub-id-type="pmid">18411406</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>NK</given-names>
</name>
<name>
<surname>Tharakaraman</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Marino-Ramirez</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Spouge</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>262</fpage>
<pub-id pub-id-type="pmid">18533028</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Narang</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sung</surname>
<given-names>WK</given-names>
</name>
</person-group>
<article-title>Localized motif discovery in gene regulatory sequences</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>1152</fpage>
<lpage>1159</lpage>
<pub-id pub-id-type="pmid">20223835</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keilwagen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Grau</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Paponov</surname>
<given-names>IA</given-names>
</name>
<name>
<surname>Posch</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Strickert</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grosse</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>De-novo discovery of differentially abundant transcription factor binding sites including their positional preference</article-title>
<source>PLoS Comput. Biol.</source>
<year>2011</year>
<volume>7</volume>
<fpage>e1001070</fpage>
<pub-id pub-id-type="pmid">21347314</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Chinnaiyan</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>ZS</given-names>
</name>
</person-group>
<article-title>On the detection and refinement of transcription factor binding sites using ChIP-Seq data</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>2154</fpage>
<lpage>2167</lpage>
<pub-id pub-id-type="pmid">20056654</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kulakovskiy</surname>
<given-names>IV</given-names>
</name>
<name>
<surname>Boeva</surname>
<given-names>VA</given-names>
</name>
<name>
<surname>Favorov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Makeev</surname>
<given-names>VJ</given-names>
</name>
</person-group>
<article-title>Deep and wide digging for binding motifs in ChIP-Seq data</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>2622</fpage>
<lpage>2623</lpage>
<pub-id pub-id-type="pmid">20736340</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schmid</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>MER41 repeat sequences contain inducible STAT1 binding sites</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e11425</fpage>
<pub-id pub-id-type="pmid">20625510</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ji</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>WH</given-names>
</name>
</person-group>
<article-title>An integrated software system for analyzing ChIP-chip and ChIP-seq data</article-title>
<source>Nat. Biotechnol.</source>
<year>2008</year>
<volume>26</volume>
<fpage>1293</fpage>
<lpage>1300</lpage>
<pub-id pub-id-type="pmid">18978777</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corbo</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Karlstetter</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Abdelaziz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dirkes</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Weigelt</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Seifert</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Benes</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Fritsche</surname>
<given-names>LG</given-names>
</name>
<etal></etal>
</person-group>
<article-title>CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors</article-title>
<source>Genome Res.</source>
<year>2010</year>
<volume>20</volume>
<fpage>1512</fpage>
<lpage>1525</lpage>
<pub-id pub-id-type="pmid">20693478</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robertson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hirst</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bainbridge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bilenky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Euskirchen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bernier</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Varhol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Delaney</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing</article-title>
<source>Nat. Methods</source>
<year>2007</year>
<volume>4</volume>
<fpage>651</fpage>
<lpage>657</lpage>
<pub-id pub-id-type="pmid">17558387</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cuddapah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Roh</surname>
<given-names>TY</given-names>
</name>
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chepelev</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>High-resolution profiling of histone methylations in the human genome</article-title>
<source>Cell</source>
<year>2007</year>
<volume>129</volume>
<fpage>823</fpage>
<lpage>837</lpage>
<pub-id pub-id-type="pmid">17512414</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Genome-wide mapping of in vivo protein-DNA interactions</article-title>
<source>Science</source>
<year>2007</year>
<volume>316</volume>
<fpage>1497</fpage>
<lpage>1502</lpage>
<pub-id pub-id-type="pmid">17540862</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wederell</surname>
<given-names>ED</given-names>
</name>
<name>
<surname>Bilenky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cullum</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Thiessen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Dagpinar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Delaney</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Varhol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bernier</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>4549</fpage>
<lpage>4564</lpage>
<pub-id pub-id-type="pmid">18611952</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Abdullaev</surname>
<given-names>ZK</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Ching</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Loukinov</surname>
<given-names>DI</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
<name>
<surname>Lobanenkov</surname>
<given-names>VV</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome</article-title>
<source>Cell</source>
<year>2007</year>
<volume>128</volume>
<fpage>1231</fpage>
<lpage>1245</lpage>
<pub-id pub-id-type="pmid">17382889</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bradley</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>XY</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Tonkin</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Biggin</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species</article-title>
<source>PLoS Biol.</source>
<year>2010</year>
<volume>8</volume>
<fpage>e1000343</fpage>
<pub-id pub-id-type="pmid">20351773</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Huss</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vega</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Orlov</surname>
<given-names>YL</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integration of external signaling pathways with the core transcriptional network in embryonic stem cells</article-title>
<source>Cell</source>
<year>2008</year>
<volume>133</volume>
<fpage>1106</fpage>
<lpage>1117</lpage>
<pub-id pub-id-type="pmid">18555785</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Eeckhoute</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Model-based analysis of ChIP-Seq (MACS)</article-title>
<source>Genome Biol.</source>
<year>2008</year>
<volume>9</volume>
<fpage>R137</fpage>
<pub-id pub-id-type="pmid">18798982</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilbanks</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Facciotti</surname>
<given-names>MT</given-names>
</name>
</person-group>
<article-title>Evaluation of algorithm performance in ChIP-seq peak detection</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e11471</fpage>
<pub-id pub-id-type="pmid">20628599</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dean</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Raftery</surname>
<given-names>AE</given-names>
</name>
</person-group>
<article-title>Normal uniform mixture differential gene expression detection for cDNA microarrays</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>173</fpage>
<pub-id pub-id-type="pmid">16011807</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Sumazin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
</person-group>
<article-title>Similarity of position frequency matrices for transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>307</fpage>
<lpage>313</lpage>
<pub-id pub-id-type="pmid">15319260</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Auron</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Benos</surname>
<given-names>PV</given-names>
</name>
</person-group>
<article-title>DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies</article-title>
<source>PLoS Comput. Biol.</source>
<year>2007</year>
<volume>3</volume>
<fpage>e61</fpage>
<pub-id pub-id-type="pmid">17397256</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Sumazin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
</person-group>
<article-title>Identifying tissue-selective transcription factor binding sites in vertebrate promoters</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>1560</fpage>
<lpage>1565</lpage>
<pub-id pub-id-type="pmid">15668401</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinha</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>3586</fpage>
<lpage>3588</lpage>
<pub-id pub-id-type="pmid">12824371</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sumazin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hata</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
</person-group>
<article-title>DWE: discriminating word enumerator</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>31</fpage>
<lpage>38</lpage>
<pub-id pub-id-type="pmid">15333453</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valouev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Sundquist</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Medina</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Anton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Sidow</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data</article-title>
<source>Nat. Methods</source>
<year>2008</year>
<volume>5</volume>
<fpage>829</fpage>
<lpage>834</lpage>
<pub-id pub-id-type="pmid">19160518</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
</person-group>
<article-title>DREME: motif discovery in transcription factor ChIP-seq data</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>1653</fpage>
<lpage>1659</lpage>
<pub-id pub-id-type="pmid">21543442</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cao</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Rabinovich</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>VX</given-names>
</name>
<name>
<surname>Farnham</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>Genome-wide analysis of transcription factor E2F1 mutant proteins reveals that N- and C-terminal protein interaction domains do not participate in targeting E2F1 to the human genome</article-title>
<source>J. Biol. Chem.</source>
<year>2011</year>
<volume>286</volume>
<fpage>11985</fpage>
<lpage>11996</lpage>
<pub-id pub-id-type="pmid">21310950</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B47">
<label>47</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tuteja</surname>
<given-names>G</given-names>
</name>
<name>
<surname>White</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schug</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kaestner</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>Extracting transcription factor targets from ChIP-Seq data</article-title>
<source>Nucleic Acids Res.</source>
<year>2009</year>
<volume>37</volume>
<fpage>e113</fpage>
<pub-id pub-id-type="pmid">19553195</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname>
<given-names>HL</given-names>
</name>
<name>
<surname>Nien</surname>
<given-names>CY</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>HY</given-names>
</name>
<name>
<surname>Metzstein</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Kirov</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rushlow</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila</article-title>
<source>Nature</source>
<year>2008</year>
<volume>456</volume>
<fpage>400</fpage>
<lpage>403</lpage>
<pub-id pub-id-type="pmid">18931655</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Badis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Kivioja</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Palin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Enge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bonke</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jolma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Varjosalo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gehrke</surname>
<given-names>AR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo</article-title>
<source>EMBO J.</source>
<year>2010</year>
<volume>29</volume>
<fpage>2147</fpage>
<lpage>2160</lpage>
<pub-id pub-id-type="pmid">20517297</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Philippakis</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Qureshi</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>He</surname>
<given-names>FS</given-names>
</name>
<name>
<surname>Estep</surname>
<given-names>PW</given-names>
<suffix>III</suffix>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities</article-title>
<source>Nat. Biotechnol.</source>
<year>2006</year>
<volume>24</volume>
<fpage>1429</fpage>
<lpage>1435</lpage>
<pub-id pub-id-type="pmid">16998473</pub-id>
</element-citation>
</ref>
<ref id="gkr1135-B51">
<label>51</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whitington</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
</person-group>
<article-title>Inferring transcription factor complexes from ChIP-seq data</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>e98</fpage>
<pub-id pub-id-type="pmid">21602262</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F49  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F49  | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021