The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization</title>
<author><name sortKey="De Boer, Carl G" sort="De Boer, Carl G" uniqKey="De Boer C" first="Carl G." last="De Boer">Carl G. De Boer</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="GRID">grid.66859.34</institution-id>
<institution>Klarman Cell Observatory, Broad Institute of MIT and Harvard,</institution>
</institution-wrap>
Cambridge, MA 02142 USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Regev, Aviv" sort="Regev, Aviv" uniqKey="Regev A" first="Aviv" last="Regev">Aviv Regev</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="GRID">grid.66859.34</institution-id>
<institution>Klarman Cell Observatory, Broad Institute of MIT and Harvard,</institution>
</institution-wrap>
Cambridge, MA 02142 USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2341 2786</institution-id>
<institution-id institution-id-type="GRID">grid.116068.8</institution-id>
<institution>Department of Biology, Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology,</institution>
</institution-wrap>
Cambridge, MA 02140 USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2167 1581</institution-id>
<institution-id institution-id-type="GRID">grid.413575.1</institution-id>
<institution>Howard Hughes Medical Institute,</institution>
</institution-wrap>
Chevy Chase, MD 20815 USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">29970004</idno>
<idno type="pmc">6029352</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6029352</idno>
<idno type="RBID">PMC:6029352</idno>
<idno type="doi">10.1186/s12859-018-2255-6</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000273</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000273</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization</title>
<author><name sortKey="De Boer, Carl G" sort="De Boer, Carl G" uniqKey="De Boer C" first="Carl G." last="De Boer">Carl G. De Boer</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="GRID">grid.66859.34</institution-id>
<institution>Klarman Cell Observatory, Broad Institute of MIT and Harvard,</institution>
</institution-wrap>
Cambridge, MA 02142 USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Regev, Aviv" sort="Regev, Aviv" uniqKey="Regev A" first="Aviv" last="Regev">Aviv Regev</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="GRID">grid.66859.34</institution-id>
<institution>Klarman Cell Observatory, Broad Institute of MIT and Harvard,</institution>
</institution-wrap>
Cambridge, MA 02142 USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2341 2786</institution-id>
<institution-id institution-id-type="GRID">grid.116068.8</institution-id>
<institution>Department of Biology, Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology,</institution>
</institution-wrap>
Cambridge, MA 02140 USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2167 1581</institution-id>
<institution-id institution-id-type="GRID">grid.413575.1</institution-id>
<institution>Howard Hughes Medical Institute,</institution>
</institution-wrap>
Chevy Chase, MD 20815 USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p id="Par1">Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by <italic>K</italic>-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark.</p>
</sec>
<sec><title>Results</title>
<p id="Par2">BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the <italic>k</italic>-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were a central determinant of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding – and any associated epigenomic mark – is inherently more variable than non-cooperative binding.</p>
</sec>
<sec><title>Conclusions</title>
<p id="Par3">BROCKMAN and related approaches will help gain a mechanistic understanding of the <italic>trans</italic> determinants of chromatin variability between cells, treatments, and individuals.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12859-018-2255-6) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">29970004</article-id>
<article-id pub-id-type="pmc">6029352</article-id>
<article-id pub-id-type="publisher-id">2255</article-id>
<article-id pub-id-type="doi">10.1186/s12859-018-2255-6</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><contrib-id contrib-id-type="orcid">http://orcid.org/0000-0001-8935-5921</contrib-id>
<name><surname>de Boer</surname>
<given-names>Carl G.</given-names>
</name>
<address><email>cgdeboer@broadinstitute.org</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Regev</surname>
<given-names>Aviv</given-names>
</name>
<address><email>carlgdeboer@gmail.com</email>
<email>aregev@broadinstitute.org</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<aff id="Aff1"><label>1</label>
<institution-wrap><institution-id institution-id-type="GRID">grid.66859.34</institution-id>
<institution>Klarman Cell Observatory, Broad Institute of MIT and Harvard,</institution>
</institution-wrap>
Cambridge, MA 02142 USA</aff>
<aff id="Aff2"><label>2</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2341 2786</institution-id>
<institution-id institution-id-type="GRID">grid.116068.8</institution-id>
<institution>Department of Biology, Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology,</institution>
</institution-wrap>
Cambridge, MA 02140 USA</aff>
<aff id="Aff3"><label>3</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2167 1581</institution-id>
<institution-id institution-id-type="GRID">grid.413575.1</institution-id>
<institution>Howard Hughes Medical Institute,</institution>
</institution-wrap>
Chevy Chase, MD 20815 USA</aff>
</contrib-group>
<pub-date pub-type="epub"><day>3</day>
<month>7</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>3</day>
<month>7</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection"><year>2018</year>
</pub-date>
<volume>19</volume>
<elocation-id>253</elocation-id>
<history><date date-type="received"><day>6</day>
<month>9</month>
<year>2017</year>
</date>
<date date-type="accepted"><day>20</day>
<month>6</month>
<year>2018</year>
</date>
</history>
<permissions><copyright-statement>© The Author(s). 2018</copyright-statement>
<license license-type="OpenAccess"><license-p><bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1"><sec><title>Background</title>
<p id="Par1">Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by <italic>K</italic>
-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark.</p>
</sec>
<sec><title>Results</title>
<p id="Par2">BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the <italic>k</italic>
-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were central determinants of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding – and any associated epigenomic mark – is inherently more variable than non-cooperative binding.</p>
</sec>
<sec><title>Conclusions</title>
<p id="Par3">BROCKMAN and related approaches will help gain a mechanistic understanding of the <italic>trans</italic>
 determinants of chromatin variability between cells, treatments, and individuals.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12859-018-2255-6) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en"><title>Keywords</title>
<kwd>Single-cell</kwd>
<kwd>Epigenome</kwd>
<kwd>Chromatin</kwd>
<kwd>scATAC-seq</kwd>
<kwd>K-mer</kwd>
<kwd>N-gram</kwd>
<kwd>Factorization</kwd>
<kwd>Decomposition</kwd>
<kwd>Clustering</kwd>
<kwd>Transcription factor</kwd>
</kwd-group>
<funding-group><award-group><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000024</institution-id>
<institution>Canadian Institutes of Health Research</institution>
</institution-wrap>
</funding-source>
<award-id>Fellowship</award-id>
<principal-award-recipient><name><surname>de Boer</surname>
<given-names>Carl G.</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<funding-group><award-group><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000011</institution-id>
<institution>Howard Hughes Medical Institute</institution>
</institution-wrap>
</funding-source>
<award-id>Investigator</award-id>
<principal-award-recipient><name><surname>Regev</surname>
<given-names>Aviv</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<funding-group><award-group><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000051</institution-id>
<institution>National Human Genome Research Institute</institution>
</institution-wrap>
</funding-source>
<award-id>CEGS</award-id>
<principal-award-recipient><name><surname>Regev</surname>
<given-names>Aviv</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<custom-meta-group><custom-meta><meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2018</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body><sec id="Sec1"><title>Background</title>
<p id="Par15">Understanding how the dynamic interaction of transcription factors (TFs) and chromatin governs cell types, differentiation, and responses is a fundamental challenge. TFs recognize and bind to specific DNA sequences and can potentially affect chromatin structure and gene expression through various means, including recruiting histone modifiers, chromatin remodelers, and the mediator complex. In particular, “pioneer” TFs may be able to open chromatin and, in so doing, allow other factors to bind to the now-accessible DNA [<xref ref-type="bibr" rid="CR1">1</xref>
]. Measurements of chromatin state, including features such as DNA accessibility, histone modifications, and TF occupancy, have shed important light on the mechanisms governing gene expression.</p>
<p id="Par16">Epigenomic data has recently increased dramatically in scale and complexity, with studies profiling either large numbers of individuals (e.g. [<xref ref-type="bibr" rid="CR2">2</xref>
–<xref ref-type="bibr" rid="CR7">7</xref>
]), or using single-cell epigenomics to profile chromatin traits in individual cells. Single cell epigenomics can help discover and understand the variation in chromatin organization and gene regulation within a single cell type or in a complex cell population [<xref ref-type="bibr" rid="CR8">8</xref>
–<xref ref-type="bibr" rid="CR12">12</xref>
]. In particular, single-cell ATAC-seq (scATAC-seq) allows measurement of DNA accessibility in single cells, including at high throughput [<xref ref-type="bibr" rid="CR9">9</xref>
, <xref ref-type="bibr" rid="CR10">10</xref>
].</p>
<p id="Par17">However, single cell epigenomics data is inherently sparse, since every locus is present at only two copies per diploid cell [<xref ref-type="bibr" rid="CR9">9</xref>
], such that ascertaining the state of an individual cell is challenging. One solution is to pool signals – either across cells (e.g., of the same known type or a discovered cluster) [<xref ref-type="bibr" rid="CR8">8</xref>
] or across loci sharing a known trait (e.g., binding by a TF) [<xref ref-type="bibr" rid="CR8">8</xref>
–<xref ref-type="bibr" rid="CR10">10</xref>
]. Unfortunately, rare cell states may be overlooked when common or bulk-based peaks are used as the basis for clustering or grouping [<xref ref-type="bibr" rid="CR8">8</xref>
–<xref ref-type="bibr" rid="CR10">10</xref>
], whereas clustering cells directly from sparse single cell epigenomic data is difficult [<xref ref-type="bibr" rid="CR8">8</xref>
, <xref ref-type="bibr" rid="CR10">10</xref>
]. Grouping loci by TF motifs [<xref ref-type="bibr" rid="CR9">9</xref>
] reduces this sparsity by averaging sparse signals across multiple loci that share a common feature (e.g., motif) and, furthermore, may represent the nature of TFs interacting with chromatin. However, it requires that motifs for all relevant TFs be known a priori, and that these motifs faithfully represent the specificities of the TFs.</p>
<p id="Par18">Conversely, the representation of regulatory DNA as a set of DNA words (<italic>k</italic>
-mers) has been used extensively in the past (e.g., [<xref ref-type="bibr" rid="CR13">13</xref>
–<xref ref-type="bibr" rid="CR15">15</xref>
]), and can even capture uncharacterized TF specificities. In particular, studies using chromatin profiles from bulk populations show a differential frequency of the <italic>k</italic>
-mers associated with these marks in different cell types [<xref ref-type="bibr" rid="CR16">16</xref>
, <xref ref-type="bibr" rid="CR17">17</xref>
]. This, in turn, captures the differential activity of TFs and the chromatin marks they relate to, such that a cell type with a higher level of an active TF has more of the <italic>k</italic>
-mers it recognizes associated with the chromatin mark (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 - top). This principle has been used to identify differential TF binding between samples [<xref ref-type="bibr" rid="CR18">18</xref>
]. However, existing approaches are unsuitable for exploratory analysis, where the identities of the samples are unknown, as may be the case for new cell subtypes or states in a population of single cells.<fig id="Fig1"><label>Fig. 1</label>
<caption><p>BROCKMAN. <bold>a</bold>
 The relationship between the differential activity of TFs that open chromatin and the numbers of their cognate motifs associated with open chromatin. Shown is a cartoon example of the impact of TFs (circles) on chromatin accessibility when the TF’s concentration is low (left) or high (right), for different scenarios of TFs that can (top and bottom rows) or cannot (middle row) open chromatin. If the TF can open chromatin either alone (top) or cooperatively (bottom), a change in the concentration or activity of TFs will affect the number of accessible binding sites in the cell (colored bars). If a TF has no effect on accessibility (middle), there will be no relationship between accessible motifs (bars) and the TF’s concentration. <bold>b</bold>
 BROCKMAN method. From left: genomic sequences associated with open chromatin or another feature of interest are used as input (left), and the frequency of each <italic>k</italic>
-mer in open chromatin/feature (row) is counted in each sample (column) (middle), the resulting <italic>k</italic>
-mer frequency matrix is then decomposed by PCA (right) into the <italic>k</italic>
-mers contributing to each PC (left matrix) and the projection of the samples into the new (PC) space (right matrix)</p>
</caption>
<graphic xlink:href="12859_2018_2255_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p id="Par19">Here, we present BROCKMAN, a method for representing epigenomic data by the <italic>k</italic>
-mer words associated with the epigenomic mark, using matrix factorization and dimensionality reduction to: (1) analyze variation in <italic>k</italic>
-mer occupancy across single cells as a basis for distinguishing different cell types, states, and treatments; (2) identify differentially active TFs; and (3) decipher TF-TF interactions. Applying BROCKMAN to scATAC-seq profiles, we show that cell-cell variation in <italic>k</italic>
-mers associated with open chromatin provides a robust and information-rich representation that can readily distinguish different cell types, drug treatments, biological artifacts, and cycling cells without any knowledge of TFs and without requiring peak calling on bulk or pooled single cell profiles. Leveraging known TF specificities, we demonstrate that the individual components of our reduced-dimensionality <italic>k</italic>
-mer space correspond to individual TFs or groups of TFs that tend to be more lowly expressed, consistent with transcriptional bursting causing noisy TF expression. The TFs that co-vary within a <italic>k</italic>
-mer component are more likely to physically interact, consistent with biochemical cooperativity between TFs, which we show is expected to be especially variable. BROCKMAN thus provides a highly effective tool for exploratory data analysis for high-dimensional or single cell epigenomics.</p>
</sec>
<sec id="Sec2"><title>Results</title>
<sec id="Sec3"><title>BROCKMAN captures variations in <italic>k</italic>
-mer frequency in open chromatin</title>
<p id="Par20">Since some TFs can modify chromatin where they bind, the differential activity of TFs should be reflected in differential chromatin states at locations containing the TF’s binding motif. For example, if the levels of a given active TF in a cell are too low for it to bind its motif and modify chromatin, then the chromatin modification will not be associated with this TF’s motifs. As the level of an active TF rises, it will bind its motif in the DNA and modify chromatin, leaving signature motifs next to the chromatin modification it elicited. Thus, by capturing a motif (represented by <italic>k</italic>
-mers) associated with the chromatin mark, we can infer the activity of the motif’s cognate TF (i.e., the TF that recognizes the <italic>k</italic>
-mers, and places the mark). In the context of chromatin accessibility (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
), as the level of an active TF that opens chromatin rises, it should bind more, opening chromatin around its binding sites in the process (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 - top). Meanwhile, changes in the concentration of an active TF that cannot open chromatin have no impact on the accessibility around its binding sites (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 – middle). Finally, if two TFs bind together (either because they work cooperatively, or because one potentiates the binding of the other), we expect that the accessibility of their binding sites should co-vary (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 – bottom). Although we may not know a priori what TFs are variable in a system, nor what sequences each TF recognizes, following the frequency of gapped <italic>k</italic>
-mers (DNA words of length <italic>k</italic>
, containing gaps) in different chromatin regions should allow us to uncover such dependencies. In particular, because a TF may recognize multiple related <italic>k</italic>
-mers, these related <italic>k</italic>
-mers should co-vary with each other, reflecting the (hidden) activity of their shared cognate TF.</p>
<p id="Par21">To capture these dependencies in <italic>k</italic>
-mer space we devised BROCKMAN, a procedure that combines matrix factorization with dimensionality reduction of chromatin mark-associated <italic>k</italic>
-mer frequencies (Fig. <xref rid="Fig1" ref-type="fig">1b</xref>
; Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1). BROCKMAN (1) takes as input profiles of chromatin marks or accessibility across a set of cells or samples; and (2) counts, for each cell or sample, the frequencies of gapped <italic>k</italic>
-mers (length 1–8, all possible gaps) at loci associated with a chromatin mark of interest, yielding a matrix of <italic>k</italic>
-mer frequencies by samples. It then (3) decomposes this matrix of <italic>k</italic>
-mer frequencies to identify groups of <italic>k</italic>
-mers that co-vary across the samples and reduces the dimensionality of the data. Finally, (4) we can explore the relationships between cells/samples in this reduced-dimension space, and identify the <italic>k</italic>
-mers (and associated TFs) that underlie differences between cells or samples.</p>
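<p>The four steps above can be illustrated with a minimal sketch. It is not the BROCKMAN implementation: it counts only ungapped <italic>k</italic>-mers of a single length (rather than gapped <italic>k</italic>-mers of length 1–8), and the function names, the toy sequences, and the choice to normalize counts to relative frequencies before standardizing are assumptions made for illustration.</p>
```python
from itertools import product

import numpy as np


def kmer_frequencies(seqs, k):
    """Return relative frequencies of all 4**k ungapped k-mers across seqs."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for seq in seqs:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            word = seq[start:start + k]
            if word in index:  # skip words containing N or other symbols
                counts[index[word]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca_from_frequencies(freq_matrix):
    """PCA on a samples-by-k-mers matrix; returns (scores, loadings, var_explained)."""
    x = np.asarray(freq_matrix, dtype=float)
    x = x - x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # constant k-mers stay at zero instead of dividing by 0
    x = x / sd
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s    # projection of each sample onto each PC
    loadings = vt.T   # k-mer weights for each PC, one PC per column
    var = s ** 2
    total_var = var.sum()
    var_explained = var / total_var if total_var > 0 else var
    return scores, loadings, var_explained


# Toy input (hypothetical): sequences flanking insertion sites in three cells.
samples = {
    "cell_1": ["TGACTCATT", "ATGAGTCAG"],
    "cell_2": ["GGGCGGGGC", "CCCGCCCAG"],
    "cell_3": ["TGACTCAGG", "GGGCGGTGA"],
}
freqs = np.array([kmer_frequencies(s, k=3) for s in samples.values()])
scores, loadings, var_explained = pca_from_frequencies(freqs)
```
<p>In this sketch, each row of <italic>scores</italic> places one cell in PC space, and each column of <italic>loadings</italic> lists the <italic>k</italic>-mer weights for one PC, so the <italic>k</italic>-mers with the largest absolute weights are the ones driving that component.</p>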
</sec>
<sec id="Sec4"><title>BROCKMAN identifies cell types, treatments, and outliers</title>
<p id="Par22">We applied BROCKMAN to scATAC-seq data from 1440 single human cells, spanning drug treated and untreated cells from the chronic myelogenous leukaemia cell line K562, as well as lymphoblastoid cell lines (LCLs; GM12878 (GM)), human embryonic stem cells (H1ESC), fibroblasts (BJ), erythroblasts (TF-1), and promyeloblasts (HL-60), sometimes including multiple replicates [<xref ref-type="bibr" rid="CR9">9</xref>
] (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
). We scored <italic>k</italic>
-mers within 50 bp of each transposon integration site (open chromatin locus; <xref rid="Sec11" ref-type="sec">Methods</xref>
), decomposed the resulting <italic>k</italic>
-mer frequency matrix using principal component analysis (PCA), and applied <italic>t</italic>
-distributed stochastic neighbor embedding (t-SNE) to the resulting significant principal components (PCs; <xref rid="Sec11" ref-type="sec">Methods</xref>
) to facilitate visual inspection (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
).<fig id="Fig2"><label>Fig. 2</label>
<caption><p>BROCKMAN identifies cell types, drug treatments, cycling cells, and experimental artifacts in scATAC-seq data. <bold>a</bold>
 Identification of cell types. t-SNE two dimensional projection of the 131 significant PCs for all cells. Cells are colored by pre-annotated type (legend) and major cell type clusters are encircled. GM = GM12878 (LCLs), rep = replicate, Imat = Imatinib (BCR-ABL inhibition), CDKi = CDK4/6 inhibition, JNKi = JNK inhibition, TNFa = TNFa treatment. <bold>b</bold>
 Detection of outliers. Shown are the cell indices (position on C1 chip) for cells from K562-replicate 3, with outlier K562 cells (as in <bold>a</bold>
) marked in black. The outlier cells have consecutive indices suggesting a shared location on the chip. White: cells filtered out prior to analysis. <bold>c</bold>
 Cell cycle phases. t-SNE projection as in <bold>a</bold>
, but with color indicating cell cycle stage as determined by the ATAC reads falling within replication domains, showing that the “mixed” population from <bold>a</bold>
 is composed primarily of replicating cells</p>
</caption>
<graphic xlink:href="12859_2018_2255_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p id="Par23">Note that while there are many factorization approaches, PCA proved highly appropriate because it has been repeatedly successful at capturing biological signals in diverse datasets, allows projection of new samples onto learned components, yields interpretable <italic>k</italic>
-mer loadings (defined as the weights by which the scaled <italic>k</italic>
-mer frequencies are multiplied to yield the projections of cells on to PCs), and is appropriate for our relatively non-sparse data (most 8-mers (our maximum <italic>k</italic>
) are observed at least 9 times per cell in our analysis). Indeed, performing PCA on a subset of cells yields PCs similar to those from the entire set, and projecting held-out cells onto the learned PCs results in co-clustering of related cells (data not shown). Factorization by Independent Component Analysis and Sparse Minibatch PCA yielded similar results (data not shown).</p>
<p id="Par24">Cells from the different cell types readily partitioned into distinct clusters, as did cells of the same type (K562) between treatments (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
). We also observed separation between different untreated replicates, suggesting possible batch effects with biological implications. In particular, a subset of K562 cells from one replicate formed a separate cluster (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
 “K562-rep3 outliers”), distinct from the other K562 cells. These outlier cells had consecutive cell indices (Fig. <xref rid="Fig2" ref-type="fig">2b</xref>
), representing adjacent cells on the C1 chip used to collect the data, suggesting an experimental artifact.</p>
<p id="Par25">One grouping (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
, “Mixed”) was composed of multiple distinct cell types, including some of every cell type except fibroblast (BJ) cells, and we hypothesized that these may represent cycling cells sharing a common cell cycle signature. To test this hypothesis, we counted the number of ATAC-seq reads in the different replication timing domains previously defined by Repli-seq in K562 cells [<xref ref-type="bibr" rid="CR19">19</xref>
] and calculated, for each cell, the ratio of reads from (G2 + S) replication timing domains to those from G1 domains (Fig. <xref rid="Fig2" ref-type="fig">2c</xref>
). Cells with a high (G2 + S)/G1 ATAC-read ratio either fall into the “mixed” grouping, or form a separate sub-region of a single cell type grouping, alongside the non-replicating cells of the same type (e.g., HL60 cells – right side; Fig. <xref rid="Fig2" ref-type="fig">2a, c</xref>
). Thus, BROCKMAN was able to group cells by cell type, treatment, batch, and cell cycle without ever calling peaks or directly considering TFs.</p>
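<p>The per-cell read ratio described above can be sketched as follows. The domain format (sorted, non-overlapping, half-open intervals labelled G1, S, or G2), the function name, and the toy coordinates are assumptions for illustration; the actual Repli-seq domain definitions are not reproduced here.</p>
```python
from bisect import bisect_right


def phase_read_ratio(read_positions, domains):
    """Ratio of reads in (G2 + S) domains to reads in G1 domains for one cell.

    read_positions: iterable of (chrom, pos) pairs.
    domains: dict mapping chrom to a sorted, non-overlapping list of
             (start, end, phase) half-open intervals, phase in {"G1", "S", "G2"}.
    """
    starts = {c: [d[0] for d in doms] for c, doms in domains.items()}
    counts = {"G1": 0, "S": 0, "G2": 0}
    for chrom, pos in read_positions:
        if chrom not in domains:
            continue  # reads on chromosomes without annotated domains are ignored
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            start, end, phase = domains[chrom][i]
            if end > pos:  # pos falls inside [start, end)
                counts[phase] += 1
    replicating = counts["S"] + counts["G2"]
    if counts["G1"] == 0:
        return float("inf") if replicating > 0 else float("nan")
    return replicating / counts["G1"]


# Toy example with hypothetical coordinates.
toy_domains = {"chr1": [(0, 100, "G1"), (100, 200, "S"), (200, 300, "G2")]}
toy_reads = [("chr1", 10), ("chr1", 50), ("chr1", 150),
             ("chr1", 250), ("chr1", 299), ("chr2", 5)]
ratio = phase_read_ratio(toy_reads, toy_domains)
```
<p>Here two reads fall in G1 and three in S or G2, so the toy cell has a ratio of 1.5; cells with a high ratio would be the candidates for the replicating population.</p>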
</sec>
<sec id="Sec5"><title>Chromatin accessibility in repetitive DNA and outside peaks impacts cell grouping</title>
<p id="Par26">Current analyses are typically performed for only a sub-set of reads, often those that reside within peaks and can be uniquely mapped. However, this could lead to loss of key biological information. For example, although reads outside of ATAC-peaks may reflect assay noise, they could also include cell-specific chromatin signatures, especially from regions open only in rare cell types, which may not be evident from bulk ATAC-seq or even from aggregate scATAC-seq data, and would be excluded if only reads within peaks are considered. In another example, although repeat regions may be important loci of gene regulation [<xref ref-type="bibr" rid="CR20">20</xref>
], challenges in correct mapping and genetic variability between cells may make it difficult to include them in analyses.</p>
<p id="Par27">We thus next determined how such variables affect our ability to group cells, considering only the different K562 samples. We quantified how well cells were grouped within the PC space (of only significant PCs), using the sample label for treatment and replicate as the “ground truth”. First, as a local measure, we assessed the number of cells from the same sample among each cell’s <italic>k</italic>
-nearest neighbors (<italic>k</italic>
 = 20, by Euclidean distance in significant PC space) (Fig. <xref rid="Fig3" ref-type="fig">3a</xref>
-<xref rid="Fig3" ref-type="fig">d</xref>
); second, as a global measure, we assessed how well Euclidean distance in the PC space discriminates between cells from the same sample and cells from all other samples (Fig. <xref rid="Fig3" ref-type="fig">3e</xref>
-<xref rid="Fig3" ref-type="fig">h</xref>
).<fig id="Fig3"><label>Fig. 3</label>
<caption><p>scATAC-seq reads outside of peaks or within repeat regions improve cell grouping. <bold>a</bold>
-<bold>d</bold>
 Local grouping. The distribution for all K562 cells of the number of cells among each cell’s 20 nearest neighbors that share its sample label (<italic>x</italic>
 axis). <italic>P</italic>
-values: Wilcoxon rank sum test. <bold>e</bold>
-<bold>h</bold>
 Global grouping. ROC curves for how well cells within the same sample are distinguished from those in different samples by their distance in significant PC space. P-values calculated by bootstrapping (<xref rid="Sec11" ref-type="sec">Methods</xref>
). (<bold>a</bold>
, <bold>e</bold>
) reads in (red) vs. outside (blue) of peaks called on pooled scATAC data for K562 cells; (<bold>b</bold>
, <bold>f</bold>
) reads in (red) vs. outside (blue) of peaks called on high-coverage K562 DNaseI-seq, considering only untreated K562 cells; (<bold>c</bold>
, <bold>g</bold>
) all reads (red) vs. only reads outside repeat elements (blue); or (<bold>d</bold>
, <bold>h</bold>
) using gapped (red) or ungapped (blue) <italic>k</italic>
-mers</p>
</caption>
<graphic xlink:href="12859_2018_2255_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
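<p>Both grouping measures can be sketched in a few lines, assuming a cells-by-PCs score matrix and one sample label per cell. The function names are hypothetical, the global measure is summarized here as an area under the ROC curve computed directly from pairwise distances, and the bootstrapped <italic>P</italic>-values described in the Methods are not reproduced.</p>
```python
import numpy as np


def same_label_neighbors(pc_scores, labels, k=20):
    """For each cell, count how many of its k nearest neighbors share its label."""
    x = np.asarray(pc_scores, dtype=float)
    labels = np.asarray(labels)
    sq = (x ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared Euclidean
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbor
    nearest = np.argsort(dist2, axis=1)[:, :k]
    return (labels[nearest] == labels[:, None]).sum(axis=1)


def distance_auc(pc_scores, labels):
    """AUC for separating same-label from different-label cell pairs by distance."""
    x = np.asarray(pc_scores, dtype=float)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(x), k=1)
    d = np.linalg.norm(x[i] - x[j], axis=1)
    same = labels[i] == labels[j]
    d_same, d_diff = d[same], d[~same]
    # Probability that a random same-label pair is closer than a random
    # different-label pair, counting ties as one half.
    closer = (d_diff[None, :] > d_same[:, None]).mean()
    ties = (d_diff[None, :] == d_same[:, None]).mean()
    return closer + 0.5 * ties


# Toy example: two well-separated groups of three cells in a 2-PC space.
toy_scores = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels = ["rep1", "rep1", "rep1", "rep2", "rep2", "rep2"]
neighbor_counts = same_label_neighbors(toy_scores, toy_labels, k=2)
auc = distance_auc(toy_scores, toy_labels)
```
<p>For the toy data every cell has both of its two nearest neighbors in its own group and the AUC is 1.0; poorly grouped samples would give lower neighbor counts and an AUC closer to 0.5.</p>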
<p id="Par28">Surprisingly, including only reads outside of peak regions improved cell grouping. To show this, we partitioned reads into two groups, and performed BROCKMAN on each set separately: reads within 250 bp of any of the 46,145 called peaks, and reads outside this window. (Peaks were called by Homer [<xref ref-type="bibr" rid="CR21">21</xref>
] after pooling the single cell profiles of all K562 cells; <xref rid="Sec11" ref-type="sec">Methods</xref>
). Remarkably, using only the set of reads outside of peaks performed better than using only reads within peaks (Fig. <xref rid="Fig3" ref-type="fig">3a, e</xref>
), particularly when considering the local neighborhood (Fig. <xref rid="Fig3" ref-type="fig">3a</xref>
). We considered that this surprising observation could result from a decreased power to detect peaks using pooled scATAC profiles, and so we performed the same analysis again, but this time considering only untreated K562 scATAC samples and using peaks from high-coverage K562 DNaseI-seq data from ENCODE [<xref ref-type="bibr" rid="CR19">19</xref>
], which included 360,648 distinct hypersensitive sites. Here too, we found that reads outside of peaks (comprising, on average, 55% of reads) could better distinguish replicates than reads within peaks (Fig. <xref rid="Fig3" ref-type="fig">3b, f</xref>
). Although we are looking for biological variation between batches, this difference could be partly driven by technical batch issues (e.g. library preparation, transposition) that also distinguish the samples. However, this is unlikely to be a complete explanation since: (1) BROCKMAN operates on sequence features alone, and (2) there are more significant PCs for reads outside of peaks (47 vs. 31), so it is not driven entirely by simple sequence features (e.g. G/C-bias).</p>
<p id="Par29">Turning to repeat elements, including reads that lie within repetitive DNA improved the grouping of cells from the same sample both locally (Fig. <xref rid="Fig3" ref-type="fig">3c</xref>
) and globally (Fig. <xref rid="Fig3" ref-type="fig">3g</xref>
). Since this comparison is performed by BROCKMAN analysis of only K562 cells, any differences in grouping are unlikely to be driven by genetic polymorphisms.</p>
<p id="Par30">Using the same approach to assess the impact of gapped <italic>k</italic>
-mers (vs. ungapped ones), we found that gapped <italic>k</italic>
-mers only improved cell grouping globally (Fig. <xref rid="Fig3" ref-type="fig">3h</xref>
), but not locally (Fig. <xref rid="Fig3" ref-type="fig">3d</xref>
). Although gapped <italic>k</italic>
-mers should better capture TF motifs with internal uninformative bases, including gaps increases computation time. Notably, there were fewer significant PCs (57 vs. 88) when using gapped <italic>k</italic>
-mers, indicating that gaps may allow for more complex relationships to be captured in fewer PCs.</p>
</sec>
<sec id="Sec6"><title>Principal components of accessible <italic>k</italic>
-mer space represent differential TF activity</title>
<p id="Par31">In identifying significant PCs [<xref ref-type="bibr" rid="CR22">22</xref>
] in the space of accessible <italic>k</italic>
-mers amongst all cells, we found 131 significant PCs, suggesting variation in the activities of individual TFs or combinations of TFs between or within cell types. Specifically, we hypothesized that each PC may represent the differential activity of one or more correlated TFs or sets of TFs, captured by the relevant <italic>k</italic>
-mers (e.g., Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
), across cells.</p>
<p id="Par32">To identify PC-defining <italic>k</italic>
-mers, we examined the loadings of the <italic>k</italic>
-mers for each significant PC (Fig. <xref rid="Fig1" ref-type="fig">1b</xref>
), reflecting the relative contribution of each <italic>k</italic>
-mer to that PC (specifically: these are the <italic>k</italic>
-mer weights that are multiplied by standardized <italic>k</italic>
-mer frequencies to obtain the cell’s projection onto that PC). Next, we related the different PCs to differential TF activity by classifying each <italic>k</italic>
-mer as “cognate” or “non-cognate” for each TF using both the in vitro preference of each TF for individual 8-mers as measured by Protein Binding Microarrays (PBMs) and position weight matrix (PWM) motifs derived from these same experiments and others (e.g., SELEX, ChIP-seq) [<xref ref-type="bibr" rid="CR23">23</xref>
]. Finally, we calculated the enrichment or depletion of “cognate” <italic>k</italic>
-mers among <italic>k</italic>
-mer weights for each PC using the minimum hypergeometric statistic (<xref rid="Sec11" ref-type="sec">Methods</xref>
).</p>
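<p>As a rough illustration of the enrichment step, the sketch below computes a minimum hypergeometric statistic over a list of <italic>k</italic>-mers ranked by PC weight, where each entry flags whether the <italic>k</italic>-mer is cognate for a given TF. The function names and the toy ranking are assumptions, and the exact procedure and significance assessment used in the study are those given in the Methods, not this sketch.</p>
```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n items from N items that contain B successes."""
    total = comb(N, n)
    upper = min(n, B)
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return hits / total


def min_hypergeometric(is_cognate_ranked):
    """Minimum hypergeometric tail over all prefixes of a ranked 0/1 list.

    Returns (statistic, cutoff). The statistic is not itself a p-value,
    because the minimum is taken over many possible cutoffs.
    """
    flags = [int(bool(f)) for f in is_cognate_ranked]
    N, B = len(flags), sum(flags)
    best, best_n, b = 1.0, 0, 0
    for n, flag in enumerate(flags, start=1):
        b += flag  # cognate k-mers seen among the top n
        p = hypergeom_tail(N, B, n, b)
        if best > p:
            best, best_n = p, n
    return best, best_n


# Toy ranking (hypothetical): the three cognate k-mers have the highest weights.
toy_ranked = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
statistic, cutoff = min_hypergeometric(toy_ranked)
```
<p>For the toy ranking the minimum is reached at a cutoff of three <italic>k</italic>-mers, with a tail probability of 1/120. Reversing the ranked list before calling the function would test the other end of the weight distribution.</p>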
<p id="Par33">We applied this approach to determine differential TF activity across treated and untreated K562 cells. We performed BROCKMAN analysis of only the K562 treated and untreated cells in the two main K562 clusters (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
; “K562-treated” + “K562-untreated”), recomputing the PCs using only these cells. We found 53 significant PCs, some of which captured differences between treated and untreated cells (<xref rid="Sec11" ref-type="sec">Methods</xref>
). Both in the full initial analysis and here, the three different K562 treatments (JNK inhibition, BCR-ABL kinase inhibition [Imatinib; which is upstream of JNK [<xref ref-type="bibr" rid="CR24">24</xref>
, <xref ref-type="bibr" rid="CR25">25</xref>
]], and CDK4/6 inhibition) yield similar partitioning of cells in accessible <italic>k</italic>
-mer space (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
 and <xref rid="Fig4" ref-type="fig">4a</xref>
). Since PC3 and PC5 best distinguished treated from untreated cells (Fig. <xref rid="Fig4" ref-type="fig">4a</xref>
), we examined the loadings of the <italic>k</italic>
-mers for these PCs, reflecting the relative contribution of each <italic>k</italic>
-mer to each PC (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
). Whereas some <italic>k</italic>
-mers have high loadings in both PC3 and 5 (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
 – top right quadrant of scatter plot), others are distinctly highly or lowly loaded in one PC but not the other (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
 – e.g., <italic>k</italic>
-mers recognized by both JUND and JUNB have high loadings in PC3 and low loadings in PC5).<fig id="Fig4"><label>Fig. 4</label>
<caption><p>PCs represent TF variation. <bold>a</bold>
 Partitioning cells by treatment. Shown is a projection of treated (shades of blue) and untreated (shades of pink) K562 cells onto PC 3 and 5 from BROCKMAN analysis of only K562 cells. <bold>b</bold>
 Identification of TFs associated with specific PCs. Scatter plot shows the PC weights for each 8-mer (dot) for PC 3 (x axis) and PC5 (y axis). Colored dots: <italic>k</italic>
-mers recognized by JUNB (red), JUND (blue), and both (green), with consensus JUN 7-mer shown as a pink star, as defined using PBM 8-mer Z-scores [<xref ref-type="bibr" rid="CR23">23</xref>
]; the legend (bottom right) shows PWMs derived from the same PBM 8-mer Z-scores. Side graphs show the Log2 fold enrichment of JUNB- and JUND-bound <italic>k</italic>
-mers amongst lowly-weighted PC <italic>k</italic>
-mer weights for PC 3 (bottom) and PC 5 (right)</p>
</caption>
<graphic xlink:href="12859_2018_2255_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
<p id="Par34">Relating the PCs to known specificities of human TFs, we found a large number of enriched/depleted TFs for PC3 and PC5 (107 and 37 motifs enriched or depleted in PCs 3 and 5, respectively). Two interesting examples are the AP-1 family TFs JUNB and JUND, which were enriched in PC3 and 5, respectively (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
). Even though the two PWM motifs derived from the PBM data are remarkably similar for these two factors (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
, bottom right), the PBM Z-scores on which these enrichments are based clearly distinguish these two PCs. Interestingly, these two motifs are enriched in open chromatin in cells treated with JNK inhibitors that prevent the activation of JUN by JNK (Fig. <xref rid="Fig4" ref-type="fig">4a</xref>
, lower left). AP-1 factors are known to play important roles in the cell cycle [<xref ref-type="bibr" rid="CR26">26</xref>
], consistent with our observation that CDK4/6 inhibition (CDKi) and JNK inhibition result in a very similar chromatin phenotype. However, CDKi appears to be distinguished mostly by PC5 (Fig. <xref rid="Fig4" ref-type="fig">4a</xref>
, bottom), whereas Imatinib and JNK inhibition are differentiated primarily by PC3 (Fig. <xref rid="Fig4" ref-type="fig">4a</xref>
, left), where JUNB, thought to act as a negative regulator of the cell cycle [<xref ref-type="bibr" rid="CR26">26</xref>
, <xref ref-type="bibr" rid="CR27">27</xref>
], is enriched (Fig. <xref rid="Fig4" ref-type="fig">4b</xref>
, PC3-left). Since JUNB and JUND homodimers (which these PBM Z-scores represent) are not substrates for JNK [<xref ref-type="bibr" rid="CR28">28</xref>
], the decreased stability of JUN resulting from JNK inhibition may yield more JUNB and JUND homodimers, resulting in more of these homodimer binding sites in open chromatin and inhibition of the cell cycle through increased JUNB/JUND activity [<xref ref-type="bibr" rid="CR27">27</xref>
].</p>
</sec>
<sec id="Sec7"><title>PCs capture variation in TF activity across individual cells</title>
<p id="Par35">Next, we explored TFs for variation in their inferred activity <italic>within</italic>
 a cell type, by performing BROCKMAN analysis of only the untreated K562 cells (Fig. <xref rid="Fig2" ref-type="fig">2a</xref>
 – “K562-untreated”; <xref rid="Sec11" ref-type="sec">Methods</xref>
). Of the 27 significant PCs, 13 distinguished different replicates (Additional file <xref rid="MOESM2" ref-type="media">2</xref>
: Figure S2), indicating that at least some of the variability captured on these PCs represents differences between batches. We excluded these PCs from subsequent analyses, and tested for enriched TFs the remaining 14 PCs that showed primarily cell-cell variability (<xref rid="Sec11" ref-type="sec">Methods</xref>
). Overall, 40.5% (167/412) of expressed TFs with known motifs were associated with at least one PC, but this number may be inflated because many TFs have very similar binding specificities.</p>
<p id="Par36">We considered some of the possible causes for the cell-cell variation in the (inferred) activity of TFs. In particular, TFs with variable activity may be more variably expressed at the RNA level, leading to cell-cell variation at the protein level, or generally lowly expressed, such that the protein level is significantly impacted by bursts of transcription. (There are, of course, other options, independent of RNA or expression levels, such as variation in upstream signaling molecules that affect the TF’s activity.) To consider the first two options, we used scRNA-seq of untreated K562 cells [<xref ref-type="bibr" rid="CR29">29</xref>
] to compare the average expression levels and variability (mean corrected coefficient of variation [CV]) in expression across single cells for our <italic>k</italic>
-mer-based “variable” and “constant” TFs.</p>
<p id="Par37">We found that the TFs that were most enriched among the PCs, and hence inferred to have the most variable activity, were expressed on average at lower levels than the least enriched TFs (Wilcoxon rank sum test <italic>P</italic>
 = 0.08; Additional file <xref rid="MOESM3" ref-type="media">3</xref>
: Figure S3a), but the two groups had a similar mean-corrected CV (Wilcoxon rank sum test <italic>P</italic>
 = 0.54; Additional file <xref rid="MOESM3" ref-type="media">3</xref>
: Figure S3b; <xref rid="Sec11" ref-type="sec">Methods</xref>
). Most TFs tend to have a low mean-corrected CV, with notable exceptions including the AP-1 proteins JUN, FOSL1, BATF, and ATF3 (Additional file <xref rid="MOESM3" ref-type="media">3</xref>
: Figure S3c).</p>
</sec>
<sec id="Sec8"><title>PCs help identify TF-TF interactions</title>
<p id="Par38">Finally, we hypothesized that different TFs that are co-enriched (or co-depleted) on the same PC could reflect dependencies or interactions between the activity of those TFs, such as cooperative binding in a complex or through one TF rendering the sites of the other accessible (Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 – bottom). However, because many TFs have very similar specificities and are difficult to distinguish from their cognate motifs alone, we first eliminated any motifs that closely match another more highly enriched motif (<xref rid="Sec11" ref-type="sec">Methods</xref>
). This was particularly important for TFs in the AP-1 family, which share very similar motifs and were often enriched together (e.g. JUN, JUNB, JUND, FOS, FOSL1, FOSB, BATF, BACH1, ATF3, SMARCC1), and are associated with five of the 13 cell-variable PCs, often in combination with other TFs.</p>
<p id="Par39">Such analysis of individual PCs highlights putative interactions. For example, in PC13, AP-1 + SNAI3 + MAFF + SMAD3 are co-enriched (one putative interaction), whereas CTCF + NFYA are co-depleted (an opposite interaction), while PC7 represents AP-1 + IRF2/9/STAT1 (enriched) vs. HIC2 + other TFs (depleted) (Additional file <xref rid="MOESM4" ref-type="media">4</xref>
: Table S1). Some of the TFs co-enriched in the same PC are known to interact with each other physically. For instance, the AP-1 transcription factors (e.g. JUN and JUNB) are known to interact with both RUNX2 (CBFA1) [<xref ref-type="bibr" rid="CR30">30</xref>
] and SMAD3 [<xref ref-type="bibr" rid="CR31">31</xref>
] (PCs 3 and 13, respectively). Interactions are also known between IRF9 and STAT1 [<xref ref-type="bibr" rid="CR32">32</xref>
] (PC7), ATF3 and JUN [<xref ref-type="bibr" rid="CR33">33</xref>
] (PC6; AP-1 motif represented by BATF motif), and the JUN factors and SPI1 (PU.1) [<xref ref-type="bibr" rid="CR34">34</xref>
, <xref ref-type="bibr" rid="CR35">35</xref>
] (PC7; AP-1 factors represented by SMARCC1 motif). Overall, there are 2.5 times more high-confidence protein-protein interactions [<xref ref-type="bibr" rid="CR36">36</xref>
] amongst TFs that are enriched together in a PC than expected by chance (hypergeometric test <italic>P</italic>
 = 0.03, considering all possible pairs for TFs enriched/depleted in any PC).</p>
</sec>
</sec>
<sec id="Sec9"><title>Discussion</title>
<p id="Par40">BROCKMAN provides a new approach to leverage scATAC-seq data, to partition cells by distinct epigenomic landscapes, and to understand their regulatory underpinning. Since BROCKMAN does not require that peaks be called, it can potentially detect cell types that are too rare to result in a peak call. By comparing to known TF specificities, we can identify the transcriptional regulators that mediate underlying differences in chromatin. Here, we found that BROCKMAN distinguishes cell types, cycling cells, and experimental artifacts, and discovered a large number of significant PCs in all datasets analyzed, each appearing to represent one or more TFs.</p>
<p id="Par41">One possible explanation for the variation in inferred TF activity across single cells is variation in the expression of the TF between the cells, as has been previously shown by scRNA-seq, RNA-FISH, and single cell protein staining (e.g. [<xref ref-type="bibr" rid="CR37">37</xref>
–<xref ref-type="bibr" rid="CR39">39</xref>
]; reviewed in [<xref ref-type="bibr" rid="CR40">40</xref>
]). However, we found that TFs associated with cell-cell epigenomic variability across untreated K562 cells are relatively lowly expressed in all cells, but not particularly variable across cells, as reflected by scRNA-seq. One possible explanation is that variation would be more apparent post-transcriptionally, such as in protein translation, modification, or stability, either because of direct regulation of these steps or because of separation of time scales. Consistent with this possibility, low mRNA expression levels generally result in more variable (noisier) protein levels [<xref ref-type="bibr" rid="CR41">41</xref>
] since transcription or decay of a single mRNA results in greater fold differences in low-abundance genes. An alternative explanation is that a TF would show variable binding dependent on a variable co-factor, while itself not being variable (e.g. Fig. <xref rid="Fig1" ref-type="fig">1a</xref>
 - bottom).</p>
<p id="Par42">We found that reads lying outside of called peaks actually contain more information than those within peaks, in terms of defining cell clusters. This may be partly explained by the fact that the open chromatin at promoters is easily identified and comparatively stable across cells [<xref ref-type="bibr" rid="CR42">42</xref>
], leading to the motifs present in these regions having less discriminatory power. However, this is likely to be only a partial explanation since the called peaks also included many enhancers. We consider two possible further explanations: (1) dynamic enhancers are both more difficult to identify and more informative of cell state, and (2) pioneer TFs stochastically sample the genome, transiently opening potentially non-functional loci that contain their motif, similar to the previously proposed “hit and run” model, where TFs can cause transient disruption of nucleosome integrity [<xref ref-type="bibr" rid="CR43">43</xref>
].</p>
<p id="Par43">The primary axes of variation in the K562 scATAC-seq data, as reflected by the PCs, appear to represent the combined actions of multiple TFs, often known to interact physically. This may reflect cooperative binding by these TFs. Cooperative binding mediated by physical interaction between TFs (Additional file <xref rid="MOESM5" ref-type="media">5</xref>
: Figure S4) or by mutual competition with nucleosomes [<xref ref-type="bibr" rid="CR44">44</xref>
] results in a steeper binding curve, such that small changes in concentration around the critical point result in larger changes in occupancy than in a non-cooperative setting. Thus, cell-cell variability in TF concentration around this point will result in higher occupancy/accessibility variability than would be expected in the non-cooperative case.</p>
<p id="Par44">Cooperativity may also provide some insight into the prevalence of AP-1 factors in our analysis, whose binding sites were enriched in many PCs for both treatment-associated and cell-variable PCs. AP-1 TFs are bZIP TFs and can form a large number of heterodimers with other bZIP TFs [<xref ref-type="bibr" rid="CR35">35</xref>
], some of whose motifs were also found to be enriched on the same PCs as the AP-1 factors. The strong enrichment of AP-1 motifs in variable <italic>k</italic>
-mer axes associated with scATAC-seq indicates that AP-1 factors may themselves be associated with mediating chromatin accessibility. Indeed, it has been suggested previously that AP-1 factors have pioneer activity [<xref ref-type="bibr" rid="CR45">45</xref>
, <xref ref-type="bibr" rid="CR46">46</xref>
].</p>
<p id="Par45">A remaining challenge – present whenever motifs are used to infer TF binding – is the definitive identification of causal TFs when many TFs have similar motifs and the specificities of many TFs remain unknown [<xref ref-type="bibr" rid="CR23">23</xref>
]. One advantage of a <italic>k</italic>
-mer-based approach is that much of the analysis can be done without ever knowing the identities or specificities of the TFs. In this way, our knowledge deficits regarding TF binding specificities are shifted from the analysis to the interpretation stage, knowing that the specificities themselves can be captured in <italic>k</italic>
-mer space. Thus, <italic>k</italic>
-mer space could distinguish two cell types that differ by an as-yet undescribed TF, while strictly using known TF specificities could not. As we learn more about how TFs function, our interpretation of the <italic>k</italic>
-mer space will improve.</p>
<p id="Par46">Before we were able to publish BROCKMAN, a related approach, ChromVAR, was published [<xref ref-type="bibr" rid="CR47">47</xref>
]. ChromVAR depends on a set of previously defined peaks, and considers only reads occurring within these peaks [<xref ref-type="bibr" rid="CR47">47</xref>
], which, according to our analysis, may reduce its sensitivity to distinguish cell types, particularly if those are rare. It also uses ungapped 7-mers [<xref ref-type="bibr" rid="CR47">47</xref>
], which may make the detected PCs more difficult to interpret.</p>
</sec>
<sec id="Sec10"><title>Conclusions</title>
<p id="Par47">As the number of cells per experiment grows, BROCKMAN analysis may provide additional insights into chromatin regulation by allowing us to detect rare cell types, variable TFs, and TF interactions. We anticipate that BROCKMAN will also be useful in the study of other chromatin profiles collected across single cells (e.g., scChIP-seq [<xref ref-type="bibr" rid="CR8">8</xref>
]), and can also help understand variation in chromatin organization in the analysis of many bulk samples, for example, those collected across individuals in a population (e.g., [<xref ref-type="bibr" rid="CR2">2</xref>
–<xref ref-type="bibr" rid="CR7">7</xref>
]). Although other <italic>k</italic>
-mer-based methods have been applied to the study of variation in <italic>cis</italic>
 [<xref ref-type="bibr" rid="CR18">18</xref>
], we anticipate that the unsupervised approach of BROCKMAN will be useful in dissecting variation in <italic>trans</italic>
. With epigenomic data of ever increasing complexity, tools and approaches like these will continue to provide insight into the regulation of chromatin.</p>
</sec>
<sec id="Sec11"><title>Methods</title>
<sec id="Sec12"><title>Data processing</title>
<p id="Par48">A summary of the data processing steps and tools used is included in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1, and a bash pipeline for processing samples as well as an R package to facilitate analysis are available on GitHub (<ext-link ext-link-type="uri" xlink:href="https://carldeboer.github.io/brockman.html">https://carldeboer.github.io/brockman.html</ext-link>
).</p>
<p id="Par49">Data were obtained from the Gene Expression Omnibus (accession GSE65360). Samples were demultiplexed, and reads were trimmed for Nextera adaptors and mapped to the human genome (hg19) with Bowtie2 [<xref ref-type="bibr" rid="CR48">48</xref>
] using paired reads (−X 2000), as described previously [<xref ref-type="bibr" rid="CR9">9</xref>
]. Regions of interest were defined as windows of 50 bp to either side of the 5′ end of mapped reads, representing the integration sites of the Tn5 transposase, merging overlapping regions (which removes duplicate reads). DNA sequences were then extracted from these loci using twoBitToFa [<xref ref-type="bibr" rid="CR49">49</xref>
] and scanned for <italic>k</italic>
-mer content using AMUSED (<ext-link ext-link-type="uri" xlink:href="https://github.com/Carldeboer/AMUSED">https://github.com/Carldeboer/AMUSED</ext-link>
), considering both DNA strands, to yield a vector of <italic>k</italic>
-mer frequencies for each cell that was used in subsequent analyses, including all gapped <italic>k</italic>
-mers from length 1 to 8. We stopped at a length of <italic>k</italic>
 = 8 because for <italic>k</italic>
 > 8 <italic>k</italic>
-mer frequencies become very sparse when analyzing as few loci per cell as are present in scATAC-seq data, although larger <italic>k</italic> may be more suitable for the analysis of bulk samples. Cells with fewer than 3162 (10<sup>3.5</sup>
) distinct Tn5 integration loci were excluded from subsequent analyses to remove dead cells and cells with poor data quality.</p>
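<p>The <italic>k</italic>-mer counting step can be illustrated with a short Python sketch. This is not the AMUSED implementation: it handles only ungapped <italic>k</italic>-mers, and merging each <italic>k</italic>-mer with its reverse complement is our assumption about what "considering both DNA strands" means here. The function names are hypothetical.</p>
<preformat>
```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(sequences, k):
    """Count ungapped k-mers, merging each k-mer with its reverse complement.

    Assumption: strand-merging is done by keeping the lexicographically
    smaller of a k-mer and its reverse complement.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows containing N or other ambiguity codes
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_frequencies(sequences, k):
    """Convert k-mer counts for one cell into relative frequencies."""
    counts = count_kmers(sequences, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```
</preformat>
<p>In this sketch, one call per cell over the merged 100-bp windows would yield that cell's frequency vector for a single value of <italic>k</italic>.</p>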
<p id="Par50">The individual cells’ <italic>k</italic>
-mer frequency vectors were merged and scaled so that each <italic>k</italic>
-mer had mean 0 and a standard deviation (SD) of 1, and this matrix was decomposed into its principal components. For all analyses, PCA was done with the prcomp R function and the number of significant PCs was estimated using the permutationPA function from the jackstraw R package [<xref ref-type="bibr" rid="CR22">22</xref>
], while the tsne R package was used for t-SNE, using the default parameters and including only the significant PCs. Because the frequencies of <italic>k</italic>
-mers of varying G + C content are strongly correlated with G + C content itself, the first PC often has a significant G + C-content component and should be analyzed carefully (e.g., GG tends to occur more frequently with higher G + C content, and so the two will be correlated and both will be anticorrelated with A + T-rich <italic>k</italic>
-mers).</p>
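<p>The scaling and decomposition can be sketched with NumPy as below. The analysis itself used the prcomp R function; this sketch instead uses a singular value decomposition, which gives the same components up to sign. Dropping constant columns is our assumption for avoiding division by zero, and the function names are hypothetical.</p>
<preformat>
```python
import numpy as np


def scale_columns(freqs):
    """Scale a cells x k-mers matrix so each k-mer has mean 0 and SD 1.

    Columns with zero variance are dropped (an assumption of this sketch).
    Returns the scaled matrix and the boolean mask of retained columns.
    """
    freqs = np.asarray(freqs, dtype=float)
    sd = freqs.std(axis=0, ddof=1)
    keep = sd > 0
    kept = freqs[:, keep]
    return (kept - kept.mean(axis=0)) / sd[keep], keep


def pca(scaled):
    """PCA by SVD of an already centered and scaled matrix.

    Returns cell scores (cells x PCs), k-mer loadings (k-mers x PCs)
    and the fraction of variance explained by each PC.
    """
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = u * s
    loadings = vt.T
    var_explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, var_explained
```
</preformat>
<p>The number of significant PCs would still need to be estimated separately (e.g., by permutation), which this sketch does not attempt.</p>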
</sec>
<sec id="Sec13"><title>Scoring cells for cell cycle signatures</title>
<p id="Par51">Using the ENCODE Repli-seq data for K562 cells [<xref ref-type="bibr" rid="CR19">19</xref>
], the genome was divided into replication domains using a percent signal cutoff of 25%, where any region with a signal greater than this cutoff was considered a domain for the respective stage of the cell cycle. ATAC-seq reads were then counted within each domain to yield a matrix of ATAC-seq read counts for each domain in each cell. This matrix was scaled by the total number of reads per cell, yielding a matrix of proportions of reads per domain per cell, and the ratio of (G2 + S1 + S2 + S3 + S4)/G1 (termed (G2 + S)/G1 above) was used to distinguish cycling cells.</p>
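<p>As a minimal sketch of the cycling score, assume the read counts have already been summed over all domains of each replication stage, giving one column per stage. That pre-aggregation, the column order, and the function name are assumptions of the sketch.</p>
<preformat>
```python
import numpy as np

# Assumed column order of the per-stage read-count matrix.
STAGES = ["G1", "S1", "S2", "S3", "S4", "G2"]


def cycling_score(stage_counts, total_reads):
    """Return (G2 + S1 + S2 + S3 + S4) / G1 for each cell.

    stage_counts: cells x stages read counts (columns ordered as STAGES).
    total_reads:  total number of reads per cell, used for scaling.
    """
    counts = np.asarray(stage_counts, dtype=float)
    props = counts / np.asarray(total_reads, dtype=float)[:, None]
    g1 = props[:, STAGES.index("G1")]
    non_g1 = props.sum(axis=1) - g1
    return non_g1 / g1
```
</preformat>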
</sec>
<sec id="Sec14"><title>Comparing input data and analysis techniques</title>
<p id="Par52">To compare different analysis approaches (e.g., reads within or outside of peaks, reads in/outside of repetitive DNA, or gapped/ungapped <italic>k</italic>
-mers), we took the following general approach (with details for each comparison noted below). Using only K562 samples that passed quality control (see above), <italic>k</italic>
-mer frequencies were calculated given the appropriate set of scATAC-seq reads, scaled, and PCA was performed, calculating the number of significant PCs for each approach as described above. Considering only the set of significant PCs, cell-cell Euclidean distances in PC space were calculated for each pair of cells and each analysis approach. Here, we considered Euclidean distance to be most appropriate because nearby points represent cells that are also similar in <italic>k</italic>
-mer space. Using these distances, the proportion of the 20 nearest neighbors derived from the same biological samples was calculated (Fig. <xref rid="Fig3" ref-type="fig">3a</xref>
-<xref rid="Fig3" ref-type="fig">c</xref>
). Using these same cell-cell distances, the ability of distance to distinguish cells from the same sample (positives) from those from different samples (negatives) was calculated as the Area Under the ROC Curve (AUROC; Fig. <xref rid="Fig3" ref-type="fig">3d</xref>
-<xref rid="Fig3" ref-type="fig">f</xref>
). Bootstrap <italic>P</italic>
-values were calculated by sampling 80% of cells without replacement 2001 times, considering the fraction of random samples where the AUROC was larger in one approach than the other, and correcting for a two-tailed test.</p>
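<p>The nearest-neighbor comparison can be sketched as follows, assuming a matrix of cell scores restricted to the significant PCs and one sample label per cell. The function names are hypothetical, and the dense distance matrix is only practical for a few thousand cells.</p>
<preformat>
```python
import numpy as np


def pairwise_distances(scores):
    """Euclidean distances between all pairs of cells (rows) in PC space."""
    scores = np.asarray(scores, dtype=float)
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def same_sample_neighbor_fraction(dist, labels, n_neighbors=20):
    """For each cell, the fraction of its nearest neighbors from the same sample."""
    labels = np.asarray(labels)
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    return (labels[nearest] == labels[:, None]).mean(axis=1)
```
</preformat>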
<p id="Par53">To classify reads into those that lie within a peak (where a “peak” is defined as an enriched region formed by a cluster of reads) and those that lie outside of peaks, we first defined peaks as regions that we considered to be enriched. For calling peaks on the scATAC-seq data, reads for all K562 samples were aggregated, duplicates were removed using Picard Tools (MarkDuplicates) (<ext-link ext-link-type="uri" xlink:href="http://broadinstitute.github.io/picard">http://broadinstitute.github.io/picard/</ext-link>
), and only uniquely mapping read pairs were considered. Peaks were called on this aggregate data using Homer [<xref ref-type="bibr" rid="CR21">21</xref>
] (version 4.7; using “-style dnase”). For the comparison using the more densely sequenced ENCODE DNase data, peaks were defined as the previously described DNaseI-seq hot spots [<xref ref-type="bibr" rid="CR19">19</xref>
] whose coordinates were downloaded from UCSC (wgEncodeUwDnaseK562HotspotsRep1.broadPeak.gz and wgEncodeUwDnaseK562HotspotsRep2.broadPeak.gz from <ext-link ext-link-type="uri" xlink:href="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/">http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/</ext-link>
), and the union of peaks from each replicate was used. Both DNaseI and pooled scATAC peaks were expanded by 250 bp in either direction and any scATAC reads whose corresponding transposition site (the 5′ end of each read) landed within one of these regions were considered to be in a peak. All other scATAC reads were considered to be outside of peaks. When excluding repeat regions, DNA sequence for repeat-masked regions of the genome was excluded when counting <italic>k</italic>
-mers. For comparing gapped vs. ungapped <italic>k</italic>
-mers, we compare all <italic>k</italic>
-mer frequency data (containing both gapped and ungapped <italic>k</italic>
-mers; termed “gapped”) to the subset of frequency data for only ungapped <italic>k</italic>
-mers (“ungapped”).</p>
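<p>Assigning transposition sites to peaks can be sketched as below for a single chromosome. The half-open coordinate convention and the function names are assumptions; a real analysis would keep one interval list per chromosome.</p>
<preformat>
```python
from bisect import bisect_right


def expand_and_merge(peaks, pad=250):
    """Pad each (start, end) peak by `pad` bp on both sides and merge overlaps."""
    merged = []
    padded = sorted((max(0, start - pad), end + pad) for start, end in peaks)
    for start, end in padded:
        if merged and merged[-1][1] >= start:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def in_peak(position, merged):
    """True if a transposition site (the 5' end of a read) lies in a merged interval.

    Intervals are treated as half-open: start is inside, end is outside.
    """
    starts = [start for start, _ in merged]
    i = bisect_right(starts, position) - 1
    return i >= 0 and merged[i][1] > position
```
</preformat>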
</sec>
<sec id="Sec15"><title>Identifying PCs that distinguish treated from untreated K562 cells</title>
<p id="Par54">Every cell was “scored” by its position as it is projected onto the respective PC axis. The area under the ROC curve (AUROC) statistic and rank sum <italic>P</italic>
-value, representing how well the projected cell positions divide the cells into treated and untreated cells, were calculated, and the PCs with the AUROC furthest from 0.5 (i.e. those for which treated cells are either enriched or depleted by the PC) were considered those that segregated treated from untreated best.</p>
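<p>The per-PC AUROC can be computed from ranks, as in the sketch below, which uses the identity between the AUROC and the scaled Mann-Whitney U statistic. The function names are hypothetical, and the rank sum <italic>P</italic>-value is not computed here.</p>
<preformat>
```python
import numpy as np


def auroc(scores, is_positive):
    """Rank-based AUROC: U / (n_pos * n_neg), using average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):  # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = is_positive.sum()
    n_neg = len(scores) - n_pos
    u = ranks[is_positive].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def best_separating_pcs(pc_scores, treated, top=2):
    """Order PCs (columns) by how far their AUROC is from 0.5."""
    pc_scores = np.asarray(pc_scores, dtype=float)
    aucs = np.array([auroc(pc_scores[:, j], treated)
                     for j in range(pc_scores.shape[1])])
    order = np.argsort(-np.abs(aucs - 0.5))
    return order[:top], aucs
```
</preformat>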
</sec>
<sec id="Sec16"><title>Identifying TF-specific PCs</title>
<p id="Par55">Ungapped 8-mer protein binding microarray Z-scores and position weight matrices (PWMs) for all human TFs (inferred or directly determined) were downloaded from CIS-BP [<xref ref-type="bibr" rid="CR23">23</xref>
]. For PWMs, gapped <italic>k</italic>
-mer scores were derived by finding the maximum log-odds score for that <italic>k</italic>
-mer in the PWM, considering every possible offset. These scores were then converted into Z-scores by centering them about the median and scaling them to the median absolute deviation, taking a Z-score of > 2 as “cognate” and leaving others as “non-cognate” <italic>k</italic>
-mers. For PBM Z-scores, Z-scores between experiments for the same TF were combined using Stouffer’s method and those <italic>k</italic>
-mers with a Z-score above 3 were considered “cognate”, with others “non-cognate”. In total, we considered 638 PBM-derived 8-mer motifs, and 1882 PWM motifs representing a total of 870 TFs, which were further narrowed down to those TFs (and corresponding motifs) that were expressed in K562 cells [<xref ref-type="bibr" rid="CR29">29</xref>
], leaving 412 TFs.</p>
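<p>The two score transformations can be sketched as follows. The unweighted form of Stouffer's method and the use of the raw median absolute deviation (without the 1.4826 normal-consistency factor) are assumptions, since the text does not specify either; the function names are hypothetical.</p>
<preformat>
```python
import numpy as np


def stouffer(z_by_experiment):
    """Combine Z-scores (experiments x k-mers) with unweighted Stouffer's method."""
    z = np.asarray(z_by_experiment, dtype=float)
    return z.sum(axis=0) / np.sqrt(z.shape[0])


def robust_z(scores):
    """Center scores on the median and scale by the median absolute deviation.

    Assumes the MAD is non-zero; no consistency constant is applied.
    """
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    return (scores - med) / mad


def cognate(z, threshold):
    """Boolean mask of k-mers whose Z-score exceeds the threshold."""
    return np.asarray(z) > threshold
```
</preformat>
<p>With these helpers, PWM-derived scores would be labeled with cognate(robust_z(scores), 2) and combined PBM Z-scores with cognate(stouffer(z), 3).</p>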
<p id="Par56">With this set of cognate (“bound”) and non-cognate (“unbound”) <italic>k</italic>
-mers for each TF, the enrichment of each TF in each PC axis was calculated using the minimum hypergeometric test [<xref ref-type="bibr" rid="CR50">50</xref>
]. Briefly, the bound and unbound <italic>k</italic>
-mers were ranked by their PC weights and, moving in increasing rank order, hypergeometric <italic>P</italic>
-values were calculated representing the enrichment of cognate <italic>k</italic>
-mers amongst the top N most highly (lowly) weighted <italic>k</italic>
-mers. Exact <italic>P</italic>
-values (considering the dependence between tests) were not calculated and instead multiple hypothesis testing correction using Bonferroni’s method was done as if the tests were independent, yielding a more conservative <italic>P</italic>
-value (to minimize the number of non-specific TF enrichments). For PBM Z-scores, only the top 3000 <italic>k</italic>
-mers were considered, while for PWM scores it was the top 15,000 <italic>k</italic>
-mers (because these also included gapped <italic>k</italic>
-mers and this was approximately the same percentage of all <italic>k</italic>
-mers). Only TFs expressed in K562 cells were considered [<xref ref-type="bibr" rid="CR51">51</xref>
].</p>
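<p>The scan over rank cutoffs can be sketched in pure Python as below. Correcting the minimum <italic>P</italic>-value for the number of cutoffs scanned is our reading of the Bonferroni step and should be treated as an assumption. The exact binomial arithmetic is slow for tens of thousands of <italic>k</italic>-mers, and the function names are hypothetical.</p>
<preformat>
```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


def min_hypergeometric(is_cognate_ranked, max_n):
    """Scan the top-N prefixes (N = 1..max_n) of a ranked list of booleans.

    Returns (smallest P, N at which it occurs, Bonferroni-corrected P),
    where the correction multiplies by the number of cutoffs scanned.
    """
    population = len(is_cognate_ranked)
    successes = sum(is_cognate_ranked)
    max_n = min(max_n, population)
    best_p, best_n, hits = 1.0, 0, 0
    for n in range(1, max_n + 1):
        hits += is_cognate_ranked[n - 1]
        p = hypergeom_sf(hits, population, successes, n)
        if best_p > p:
            best_p, best_n = p, n
    return best_p, best_n, min(1.0, best_p * max_n)
```
</preformat>
<p>Depletion can be tested with the same function by reversing the ranked list, so that the most lowly weighted <italic>k</italic>-mers come first.</p>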
<p id="Par57">Because many TFs share similar <italic>k</italic>
-mer binding profiles and the number of <italic>k</italic>
-mers considered for PWM motifs was so high, PWM-based enrichments appeared to have a high false positive rate, and so we set a much more stringent significance threshold for PWM motifs (<italic>P</italic>
 < 10<sup>−112</sup>
) than for 8-mer Z-scores (<italic>P</italic>
 < 10<sup>−2</sup>
). (log<sub>10</sub>
(<italic>P</italic>-values) are “inflated” with PWMs as a result of common shared submotifs and a very large number of gapped <italic>k</italic>-mers; we chose these cutoffs based on the “elbow” of the log-<italic>P</italic>-value distributions, which are similar at these values.) To eliminate redundant motifs and select only the most enriched of each group of related motifs, the most enriched (or depleted) motif was retained and any redundant motifs (<italic>k</italic>
-mer Pearson <italic>R</italic>
 > 0.5) were eliminated until all TFs were either eliminated due to redundancy or selected to represent the PC, the outcome of which is included in Additional file <xref rid="MOESM4" ref-type="media">4</xref>
: Table S1.</p>
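<p>The redundancy filter amounts to a greedy selection, sketched below. We assume one row of <italic>k</italic>-mer scores per motif and a single enrichment strength per motif (for example, the negative log <italic>P</italic>-value); both inputs and the function name are hypothetical.</p>
<preformat>
```python
import numpy as np


def select_nonredundant(kmer_scores, enrichment, r_cutoff=0.5):
    """Greedily keep the most enriched motif and drop motifs correlated with it.

    kmer_scores: motifs x k-mers score matrix.
    enrichment:  one value per motif; larger means more strongly enriched
                 (or depleted).
    Returns the indices of retained motifs, most enriched first.
    """
    corr = np.corrcoef(np.asarray(kmer_scores, dtype=float))
    remaining = list(np.argsort(-np.asarray(enrichment, dtype=float)))
    selected = []
    while remaining:
        best = remaining.pop(0)
        selected.append(int(best))
        # Discard every motif whose k-mer profile is redundant with `best`.
        remaining = [m for m in remaining if not corr[best, m] > r_cutoff]
    return selected
```
</preformat>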
</sec>
<sec id="Sec17"><title>Comparison to K562 single-cell RNA-seq</title>
<p id="Par58">A matrix of single cell count data was downloaded from GEO (GSE90063) for wild type K562 cells [<xref ref-type="bibr" rid="CR29">29</xref>
]. A negative binomial distribution was fit to the gene-wise mean and variance, representing a theoretical minimum variance dependent on the mean, and this was used to calculate the theoretical minimum log coefficient of variation (CV). We then subtracted the theoretical minimum log CV from the observed log CV per gene to get the excess CV over that expected from its dependence on the mean (“mean-corrected CV”). We then compared the distributions of the mean-corrected CV and expression mean for TFs that had a significant enrichment among the cell-variable PCs and those that did not, using the Wilcoxon rank sum test. Cell-variable PCs excluded any PCs that significantly distinguished any replicate from the other two (Bonferroni-corrected Wilcoxon rank sum test <italic>P</italic>
 < 0.1), and also excluded PC1 because of the association with G + C content.</p>
</sec>
<sec id="Sec18"><title>TF cooperativity occupancy</title>
<p id="Par59">As described previously [<xref ref-type="bibr" rid="CR52">52</xref>
], a TF’s (<italic>x</italic>
) fractional occupancy of a single binding site (<italic>O</italic>
<sub><italic>x</italic>
</sub>
) depends on its concentration ([<italic>x</italic>
]) and the dissociation constant (<italic>Kd</italic>
<sub><italic>x</italic>
</sub>
) of its binding site in the following relationship, which represents 1 minus the probability the binding site will <italic>not</italic>
 be bound:<disp-formula id="Equa"><alternatives><tex-math id="M1">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$ {O}_x=1-\frac{1}{1+\frac{\left[x\right]}{Kd_x}} $$\end{document}</tex-math>
<mml:math id="M2" display="block"><mml:msub><mml:mi>O</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfrac><mml:mn>1</mml:mn>
<mml:mrow><mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac><mml:mfenced close="]" open="["><mml:mi>x</mml:mi>
</mml:mfenced>
<mml:msub><mml:mi mathvariant="italic">Kd</mml:mi><mml:mi>x</mml:mi></mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2018_2255_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p id="Par60">If TF <italic>x</italic>
 can also bind with a partner <italic>y</italic>
, occupancy of <italic>x</italic>
 depends on <italic>x</italic>
 binding in isolation, as before, but also binding with <italic>y</italic>
 as a <italic>xy</italic>
 heterodimer, depending on the concentration [<italic>xy</italic>
] and the <italic>Kd</italic>
<sub><italic>xy</italic>
</sub>
 of the heterodimer. At equilibrium, [<italic>xy</italic>
] = [<italic>x</italic>
][<italic>y</italic>
]<italic>Ka</italic>
<sub><italic>xy</italic>
</sub>
, where <italic>Ka</italic>
<sub><italic>xy</italic>
</sub>
 is the association constant of <italic>x</italic>
 and <italic>y</italic>
. Thus, for <italic>x</italic>
 binding to a single binding site with or without cooperative binding of <italic>y</italic>
, we have:<disp-formula id="Equb"><alternatives><tex-math id="M3">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$ {O}_x^{coop}=1-\left(\frac{1}{1+\frac{\left[x\right]\left(1-{Ka}_{xy}\left[y\right]\right)}{Kd_x}}\right)\left(\frac{1}{1+\frac{\left[x\right]{Ka}_{xy}\left[y\right]}{Kd_{xy}}}\right) $$\end{document}</tex-math>
<mml:math id="M4" display="block"><mml:msubsup><mml:mi>O</mml:mi>
<mml:mi>x</mml:mi>
<mml:mtext mathvariant="italic">coop</mml:mtext>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfenced close=")" open="("><mml:mfrac><mml:mn>1</mml:mn>
<mml:mrow><mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac><mml:mrow><mml:mfenced close="]" open="["><mml:mi>x</mml:mi>
</mml:mfenced>
<mml:mfenced close=")" open="("><mml:mrow><mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub><mml:mi mathvariant="italic">Ka</mml:mi>
<mml:mi mathvariant="italic">xy</mml:mi>
</mml:msub>
<mml:mfenced close="]" open="["><mml:mi>y</mml:mi>
</mml:mfenced>
<mml:mspace width="0.5em"></mml:mspace>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:msub><mml:mi mathvariant="italic">Kd</mml:mi><mml:mi>x</mml:mi></mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
<mml:mfenced close=")" open="("><mml:mfrac><mml:mn>1</mml:mn>
<mml:mrow><mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mfrac><mml:mrow><mml:mfenced close="]" open="["><mml:mi>x</mml:mi>
</mml:mfenced>
<mml:msub><mml:mi mathvariant="italic">Ka</mml:mi>
<mml:mi mathvariant="italic">xy</mml:mi>
</mml:msub>
<mml:mfenced close="]" open="["><mml:mi>y</mml:mi>
</mml:mfenced>
</mml:mrow>
<mml:msub><mml:mi mathvariant="italic">Kd</mml:mi>
<mml:mi mathvariant="italic">xy</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
</mml:math>
<graphic xlink:href="12859_2018_2255_Article_Equb.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p id="Par61">For simplicity, we can assume that [<italic>y</italic>
] is constant since the same logic holds if <italic>x</italic>
 and <italic>y</italic>
 are interchanged and for arbitrary [<italic>y</italic>
]. Thus, <italic>Ka</italic>
<sub><italic>xy</italic>
</sub>
[<italic>y</italic>
] is a constant corresponding to the fraction of <italic>x</italic>
 that is in <italic>xy</italic>
 form. Assuming <italic>Kd</italic>
<sub><italic>xy</italic>
</sub>
 < <italic>Kd</italic>
<sub><italic>x</italic>
</sub>
 (since <italic>xy</italic>
 has both <italic>x</italic>
 and <italic>y</italic>
 binding DNA, and so is expected to bind more tightly), as [<italic>x</italic>
] changes, the cooperative occupancy curve is always at least as steep as the non-cooperative curve at concentrations yielding intermediate occupancy, regardless of the choice of parameters, so that binding saturates over a shorter range of [<italic>x</italic>
] with cooperative binding. Intuitively, this is because increasing [<italic>x</italic>
] increases cooperative and non-cooperative binding equally when <italic>Kd</italic>
<sub><italic>xy</italic>
</sub>
 = <italic>Kd</italic>
<sub><italic>x</italic>
</sub>
, but when <italic>Kd</italic>
<sub><italic>xy</italic>
</sub>
 < <italic>Kd</italic>
<sub><italic>x</italic>
</sub>
 cooperative binding increases more rapidly until saturation. Additional file <xref rid="MOESM5" ref-type="media">5</xref>
: Figure S4 was made assuming 1% of <italic>x</italic>
 is in <italic>xy</italic>
 form, and <italic>Kd</italic>
<sub><italic>xy</italic>
</sub>
 is 100× lower than <italic>Kd</italic>
<sub><italic>x</italic>
</sub>
.</p>
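<p>The steepness claim above can be checked numerically. The following Python sketch is an illustration only and is not part of the BROCKMAN pipeline. It evaluates the cooperative occupancy equation with the Figure S4 settings (1% of x in xy form, Kd_xy 100× lower than Kd_x). Several choices are assumptions made for this sketch rather than details taken from the text: the function and variable names are hypothetical, Kd_x is set to 1 in arbitrary units, the non-cooperative reference curve is taken as the special case in which none of x is in xy form, and steepness is measured as the slope of occupancy against ln[x] at 50% occupancy.</p>

```python
import math


def occupancy_coop(x, frac_xy, kd_x, kd_xy):
    """Occupancy from the cooperative-binding equation above.

    x       : concentration [x]
    frac_xy : Ka_xy * [y], the constant fraction of x in xy form
    kd_x    : dissociation constant of x alone (Kd_x)
    kd_xy   : dissociation constant of the xy complex (Kd_xy)
    """
    free_of_x = 1.0 / (1.0 + x * (1.0 - frac_xy) / kd_x)
    free_of_xy = 1.0 / (1.0 + x * frac_xy / kd_xy)
    return 1.0 - free_of_x * free_of_xy


def half_occupancy_conc(occ, lo=1e-12, hi=1e12, steps=200):
    """Bisect on a log scale for the concentration where occ(x) = 0.5."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if occ(mid) > 0.5:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


def log_slope(occ, x, eps=1e-4):
    """Central-difference estimate of d(occupancy)/d(ln x) at x."""
    return (occ(x * math.exp(eps)) - occ(x * math.exp(-eps))) / (2.0 * eps)


KD_X = 1.0  # assumption: arbitrary concentration units


def coop(x):
    # Figure S4 settings: 1% of x in xy form, Kd_xy = Kd_x / 100
    return occupancy_coop(x, 0.01, KD_X, KD_X / 100.0)


def non_coop(x):
    # assumption: frac_xy = 0 serves as the non-cooperative reference
    return occupancy_coop(x, 0.0, KD_X, KD_X)


x_half_coop = half_occupancy_conc(coop)
x_half_non = half_occupancy_conc(non_coop)
slope_coop = log_slope(coop, x_half_coop)
slope_non = log_slope(non_coop, x_half_non)

print(f"half-occupancy [x]: coop={x_half_coop:.4f}, non-coop={x_half_non:.4f}")
print(f"slope per ln[x] at 50%: coop={slope_coop:.4f}, non-coop={slope_non:.4f}")
```

<p>Comparing slopes at each curve's own half-occupancy concentration mirrors the alignment at 50% occupancy used in Figure S4. Other parameter choices can be explored by changing the arguments passed to the hypothetical occupancy_coop function.</p>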
</sec>
</sec>
<sec sec-type="supplementary-material"><title>Additional files</title>
<sec id="Sec19"><p><supplementary-material content-type="local-data" id="MOESM1"><media xlink:href="12859_2018_2255_MOESM1_ESM.pdf"><label>Additional file 1:</label>
<caption><p><bold>Figure S1.</bold>
 BROCKMAN computational pipeline. A bash pipeline and other computational resources are available on GitHub (<ext-link ext-link-type="uri" xlink:href="https://carldeboer.github.io/brockman.html">https://carldeboer.github.io/brockman.html</ext-link>
). Tools/functions used for each step are indicated in brackets. (PDF 545 kb)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM2"><media xlink:href="12859_2018_2255_MOESM2_ESM.pdf"><label>Additional file 2:</label>
<caption><p><bold>Figure S2.</bold>
 PCs that distinguish replicates. Shown are the Bonferroni-corrected <italic>P</italic>
-values (<italic>y</italic>
 axis) and AUROC values (<italic>x</italic>
 axis) for how well each PC separates each untreated K562 replicate from the other two replicates. Colors indicate the replicate being compared to the other two. Red horizontal line: P-value cutoff (0.1) below which PCs were considered to separate batches. (PDF 185 kb)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM3"><media xlink:href="12859_2018_2255_MOESM3_ESM.pdf"><label>Additional file 3:</label>
<caption><p><bold>Figure S3.</bold>
 The TFs enriched in PCs have lower expression. <bold>A, B</bold>
) CDF of the mean (population) expression (<bold>A</bold>
, <italic>x</italic>
 axis) or mean-corrected CV (<bold>B</bold>
, <italic>x</italic>
 axis; <xref rid="Sec11" ref-type="sec">Methods</xref>
) for the most (blue) and least (pink) significant TFs enriched in the PCs from a BROCKMAN analysis of untreated K562 cells. <bold>C</bold>
) The relationship between the mean expression (<italic>x</italic>
 axis) and CV (<italic>y</italic>
 axis) for all genes in WT K562 data (dots). Names of TFs with the highest mean-corrected CV are labeled and AP-1 factors are bolded. Pink, blue: TFs with least and most significant PC enrichment. (PDF 200 kb)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM4"><media xlink:href="12859_2018_2255_MOESM4_ESM.docx"><label>Additional file 4:</label>
<caption><p><bold>Table S1.</bold>
 Summary of TFs associated with the different untreated K562 cell-variable PCs. TFs are listed in decreasing order of enrichment significance, with TFs filtered for redundancy between motifs as described in the <xref rid="Sec11" ref-type="sec">Methods</xref>
. Interacting TFs are not indicated and examples given in the text are for illustrative purposes. (DOCX 16 kb)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM5"><media xlink:href="12859_2018_2255_MOESM5_ESM.pdf"><label>Additional file 5:</label>
<caption><p><bold>Figure S4.</bold>
 Cooperativity between TFs results in steeper binding curves. The predicted fractional TF occupancy (<italic>y</italic>
 axis) for a given concentration of the TF (<italic>x</italic>
 axis), when the concentration of the cooperatively-interacting TF is constant. The two binding curves are aligned at 50% occupancy to emphasize the differences in the slopes. Modeling was done as described in <xref rid="Sec11" ref-type="sec">Methods</xref>
. (PDF 1969 kb)</p>
</caption>
</media>
</supplementary-material>
</p>
</sec>
</sec>
</body>
<back><glossary><title>Abbreviations</title>
<def-list><def-item><term>AUROC</term>
<def><p id="Par4">Area Under the ROC Curve</p>
</def>
</def-item>
<def-item><term>BROCKMAN</term>
<def><p id="Par5">Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides</p>
</def>
</def-item>
<def-item><term>CV</term>
<def><p id="Par6">mean-corrected coefficient of variation</p>
</def>
</def-item>
<def-item><term>PBMs</term>
<def><p id="Par7">Protein Binding Microarrays</p>
</def>
</def-item>
<def-item><term>PCA</term>
<def><p id="Par8">principal component analysis</p>
</def>
</def-item>
<def-item><term>PCs</term>
<def><p id="Par9">principal components</p>
</def>
</def-item>
<def-item><term>PWMs</term>
<def><p id="Par10">position weight matrices</p>
</def>
</def-item>
<def-item><term>scATAC-seq</term>
<def><p id="Par11">single-cell ATAC-seq</p>
</def>
</def-item>
<def-item><term>SD</term>
<def><p id="Par12">standard deviation</p>
</def>
</def-item>
<def-item><term>TF</term>
<def><p id="Par13">transcription factor</p>
</def>
</def-item>
<def-item><term>t-SNE</term>
<def><p id="Par14">t-distributed stochastic neighbor embedding</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group><fn><p><bold>Electronic supplementary material</bold>
</p>
<p>The online version of this article (10.1186/s12859-018-2255-6) contains supplementary material, which is available to authorized users.</p>
</fn>
</fn-group>
<ack><title>Acknowledgements</title>
<p>We thank Jason D. Buenrostro, Christine S. Cheng, Marcin Tabaka, Nir Friedman, and Karthik Shekhar for helpful discussions and careful review of the manuscript, Atray Dixit and Karthik Shekhar for help with the K562 scRNA-seq data, Leslie Gaffney for help with figures, and William J. Greenleaf for helpful discussions.</p>
<sec id="FPar1"><title>Funding</title>
<p id="Par62">CGD is supported by a Canadian Institutes of Health Research Fellowship. AR is an HHMI Investigator. Work was supported by a CEGS grant from NHGRI and by HHMI.</p>
</sec>
<sec id="FPar2"><title>Availability of data and materials</title>
<p id="Par63">Computational pipelines (bash) and the BROCKMAN R package are available on the BROCKMAN GitHub project (<ext-link ext-link-type="uri" xlink:href="https://carldeboer.github.io/brockman.html">https://carldeboer.github.io/brockman.html</ext-link>
) under GPL v3. Datasets analyzed are available from GEO under accession numbers GSE90063 [<xref ref-type="bibr" rid="CR29">29</xref>
] and GSE65360 [<xref ref-type="bibr" rid="CR9">9</xref>
], and from the CIS-BP database (v1.02; <ext-link ext-link-type="uri" xlink:href="http://cisbp.ccbr.utoronto.ca/">http://cisbp.ccbr.utoronto.ca/</ext-link>
) [<xref ref-type="bibr" rid="CR23">23</xref>
].</p>
</sec>
</ack>
<notes notes-type="author-contribution"><title>Authors’ contributions</title>
<p>CGD and AR wrote the manuscript, and CGD analyzed the data. All authors read and approved the final manuscript.</p>
</notes>
<notes notes-type="COI-statement"><sec id="FPar3"><title>Ethics approval and consent to participate</title>
<p id="Par64">Not applicable.</p>
</sec>
<sec id="FPar4"><title>Consent for publication</title>
<p id="Par65">Not applicable.</p>
</sec>
<sec id="FPar5"><title>Competing interests</title>
<p id="Par66">AR is a member of the Scientific Advisory Board of ThermoFisher Scientific, Driver Group and Syros Pharmaceuticals.</p>
</sec>
</notes>
<ref-list id="Bib1"><title>References</title>
<ref id="CR1"><label>1.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Magnani</surname>
<given-names>L</given-names>
</name>
<name><surname>Eeckhoute</surname>
<given-names>J</given-names>
</name>
<name><surname>Lupien</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Pioneer factors: directing transcriptional regulators within the chromatin environment</article-title>
<source>Trends Genet</source>
<year>2011</year>
<volume>27</volume>
<issue>11</issue>
<fpage>465</fpage>
<lpage>474</lpage>
<pub-id pub-id-type="doi">10.1016/j.tig.2011.07.002</pub-id>
<pub-id pub-id-type="pmid">21885149</pub-id>
</element-citation>
</ref>
<ref id="CR2"><label>2.</label>
<mixed-citation publication-type="other">Sui WG, He HY, Yan Q, Chen JJ, Zhang RH, Dai Y: ChIP-seq analysis of histone H3K9 trimethylation in peripheral blood mononuclear cells of membranous nephropathy patients. Braz J Med Biol Res 2014, 47(1):42–49.</mixed-citation>
</ref>
<ref id="CR3"><label>3.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sui</surname>
<given-names>W</given-names>
</name>
<name><surname>He</surname>
<given-names>H</given-names>
</name>
<name><surname>Yan</surname>
<given-names>Q</given-names>
</name>
<name><surname>Chen</surname>
<given-names>J</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>R</given-names>
</name>
<name><surname>Dai</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Genome-wide analysis of histone H3 lysine9 trimethylation by ChIP-seq in peripheral blood mononuclear cells of uremia patients</article-title>
<source>Hemodial Int</source>
<year>2013</year>
<volume>17</volume>
<issue>4</issue>
<fpage>493</fpage>
<lpage>501</lpage>
<pub-id pub-id-type="pmid">23621585</pub-id>
</element-citation>
</ref>
<ref id="CR4"><label>4.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rendeiro</surname>
<given-names>AF</given-names>
</name>
<name><surname>Schmidl</surname>
<given-names>C</given-names>
</name>
<name><surname>Strefford</surname>
<given-names>JC</given-names>
</name>
<name><surname>Walewska</surname>
<given-names>R</given-names>
</name>
<name><surname>Davis</surname>
<given-names>Z</given-names>
</name>
<name><surname>Farlik</surname>
<given-names>M</given-names>
</name>
<name><surname>Oscier</surname>
<given-names>D</given-names>
</name>
<name><surname>Bock</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks</article-title>
<source>Nat Commun</source>
<year>2016</year>
<volume>7</volume>
<fpage>11938</fpage>
<pub-id pub-id-type="doi">10.1038/ncomms11938</pub-id>
<pub-id pub-id-type="pmid">27346425</pub-id>
</element-citation>
</ref>
<ref id="CR5"><label>5.</label>
<mixed-citation publication-type="other">Cheng CS, Gate RE, Aiden AP, Siba A, Tabaka M, Lituiev D, Machol I, Subramaniam M, Shammim M, Hougen KL, et al. Genetic determinants of chromatin accessibility and gene regulation in T cell activation across human individuals. bioRxiv. 2016;</mixed-citation>
</ref>
<ref id="CR6"><label>6.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname>
<given-names>W</given-names>
</name>
<name><surname>Poschmann</surname>
<given-names>J</given-names>
</name>
<name><surname>Cruz-Herrera Del Rosario</surname>
<given-names>R</given-names>
</name>
<name><surname>Parikshak</surname>
<given-names>NN</given-names>
</name>
<name><surname>Hajan</surname>
<given-names>HS</given-names>
</name>
<name><surname>Kumar</surname>
<given-names>V</given-names>
</name>
<name><surname>Ramasamy</surname>
<given-names>R</given-names>
</name>
<name><surname>Belgard</surname>
<given-names>TG</given-names>
</name>
<name><surname>Elanggovan</surname>
<given-names>B</given-names>
</name>
<name><surname>Wong</surname>
<given-names>CC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Histone acetylome-wide association study of autism spectrum disorder</article-title>
<source>Cell</source>
<year>2016</year>
<volume>167</volume>
<issue>5</issue>
<fpage>1385</fpage>
<lpage>1397</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2016.10.031</pub-id>
<pub-id pub-id-type="pmid">27863250</pub-id>
</element-citation>
</ref>
<ref id="CR7"><label>7.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>L</given-names>
</name>
<name><surname>Ge</surname>
<given-names>B</given-names>
</name>
<name><surname>Casale</surname>
<given-names>FP</given-names>
</name>
<name><surname>Vasquez</surname>
<given-names>L</given-names>
</name>
<name><surname>Kwan</surname>
<given-names>T</given-names>
</name>
<name><surname>Garrido-Martin</surname>
<given-names>D</given-names>
</name>
<name><surname>Watt</surname>
<given-names>S</given-names>
</name>
<name><surname>Yan</surname>
<given-names>Y</given-names>
</name>
<name><surname>Kundu</surname>
<given-names>K</given-names>
</name>
<name><surname>Ecker</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genetic drivers of epigenetic and transcriptional variation in human immune cells</article-title>
<source>Cell</source>
<year>2016</year>
<volume>167</volume>
<issue>5</issue>
<fpage>1398</fpage>
<lpage>1414</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2016.10.026</pub-id>
<pub-id pub-id-type="pmid">27863251</pub-id>
</element-citation>
</ref>
<ref id="CR8"><label>8.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rotem</surname>
<given-names>A</given-names>
</name>
<name><surname>Ram</surname>
<given-names>O</given-names>
</name>
<name><surname>Shoresh</surname>
<given-names>N</given-names>
</name>
<name><surname>Sperling</surname>
<given-names>RA</given-names>
</name>
<name><surname>Goren</surname>
<given-names>A</given-names>
</name>
<name><surname>Weitz</surname>
<given-names>DA</given-names>
</name>
<name><surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
</person-group>
<article-title>Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state</article-title>
<source>Nat Biotechnol</source>
<year>2015</year>
<volume>33</volume>
<issue>11</issue>
<fpage>1165</fpage>
<lpage>1172</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.3383</pub-id>
<pub-id pub-id-type="pmid">26458175</pub-id>
</element-citation>
</ref>
<ref id="CR9"><label>9.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Buenrostro</surname>
<given-names>JD</given-names>
</name>
<name><surname>Wu</surname>
<given-names>B</given-names>
</name>
<name><surname>Litzenburger</surname>
<given-names>UM</given-names>
</name>
<name><surname>Ruff</surname>
<given-names>D</given-names>
</name>
<name><surname>Gonzales</surname>
<given-names>ML</given-names>
</name>
<name><surname>Snyder</surname>
<given-names>MP</given-names>
</name>
<name><surname>Chang</surname>
<given-names>HY</given-names>
</name>
<name><surname>Greenleaf</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<article-title>Single-cell chromatin accessibility reveals principles of regulatory variation</article-title>
<source>Nature</source>
<year>2015</year>
<volume>523</volume>
<issue>7561</issue>
<fpage>486</fpage>
<lpage>490</lpage>
<pub-id pub-id-type="doi">10.1038/nature14590</pub-id>
<pub-id pub-id-type="pmid">26083756</pub-id>
</element-citation>
</ref>
<ref id="CR10"><label>10.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cusanovich</surname>
<given-names>DA</given-names>
</name>
<name><surname>Daza</surname>
<given-names>R</given-names>
</name>
<name><surname>Adey</surname>
<given-names>A</given-names>
</name>
<name><surname>Pliner</surname>
<given-names>HA</given-names>
</name>
<name><surname>Christiansen</surname>
<given-names>L</given-names>
</name>
<name><surname>Gunderson</surname>
<given-names>KL</given-names>
</name>
<name><surname>Steemers</surname>
<given-names>FJ</given-names>
</name>
<name><surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name><surname>Shendure</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing</article-title>
<source>Science</source>
<year>2015</year>
<volume>348</volume>
<issue>6237</issue>
<fpage>910</fpage>
<lpage>914</lpage>
<pub-id pub-id-type="doi">10.1126/science.aab1601</pub-id>
<pub-id pub-id-type="pmid">25953818</pub-id>
</element-citation>
</ref>
<ref id="CR11"><label>11.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname>
<given-names>W</given-names>
</name>
<name><surname>Tang</surname>
<given-names>Q</given-names>
</name>
<name><surname>Wan</surname>
<given-names>M</given-names>
</name>
<name><surname>Cui</surname>
<given-names>K</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Ren</surname>
<given-names>G</given-names>
</name>
<name><surname>Ni</surname>
<given-names>B</given-names>
</name>
<name><surname>Sklar</surname>
<given-names>J</given-names>
</name>
<name><surname>Przytycka</surname>
<given-names>TM</given-names>
</name>
<name><surname>Childs</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples</article-title>
<source>Nature</source>
<year>2015</year>
<volume>528</volume>
<issue>7580</issue>
<fpage>142</fpage>
<lpage>146</lpage>
<pub-id pub-id-type="pmid">26605532</pub-id>
</element-citation>
</ref>
<ref id="CR12"><label>12.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Lee</surname>
<given-names>HJ</given-names>
</name>
<name><surname>Smallwood</surname>
<given-names>SA</given-names>
</name>
<name><surname>Kelsey</surname>
<given-names>G</given-names>
</name>
<name><surname>Reik</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Single-cell epigenomics: powerful new methods for understanding gene regulation and cell identity</article-title>
<source>Genome Biol</source>
<year>2016</year>
<volume>17</volume>
<fpage>72</fpage>
<pub-id pub-id-type="doi">10.1186/s13059-016-0944-x</pub-id>
<pub-id pub-id-type="pmid">27091476</pub-id>
</element-citation>
</ref>
<ref id="CR13"><label>13.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
</person-group>
<article-title>Identification of human gene core promoters in silico</article-title>
<source>Genome Res</source>
<year>1998</year>
<volume>8</volume>
<issue>3</issue>
<fpage>319</fpage>
<lpage>326</lpage>
<pub-id pub-id-type="doi">10.1101/gr.8.3.319</pub-id>
<pub-id pub-id-type="pmid">9521935</pub-id>
</element-citation>
</ref>
<ref id="CR14"><label>14.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name><surname>Knudsen</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation</article-title>
<source>Bioinformatics</source>
<year>2000</year>
<volume>16</volume>
<issue>4</issue>
<fpage>326</fpage>
<lpage>333</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/16.4.326</pub-id>
<pub-id pub-id-type="pmid">10869030</pub-id>
</element-citation>
</ref>
<ref id="CR15"><label>15.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Blanchette</surname>
<given-names>M</given-names>
</name>
<name><surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Discovery of regulatory elements by a computational method for phylogenetic footprinting</article-title>
<source>Genome Res</source>
<year>2002</year>
<volume>12</volume>
<issue>5</issue>
<fpage>739</fpage>
<lpage>748</lpage>
<pub-id pub-id-type="doi">10.1101/gr.6902</pub-id>
<pub-id pub-id-type="pmid">11997340</pub-id>
</element-citation>
</ref>
<ref id="CR16"><label>16.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Setty</surname>
<given-names>M</given-names>
</name>
<name><surname>Leslie</surname>
<given-names>CS</given-names>
</name>
</person-group>
<article-title>SeqGL identifies context-dependent binding signals in genome-wide regulatory element maps</article-title>
<source>PLoS Comput Biol</source>
<year>2015</year>
<volume>11</volume>
<issue>5</issue>
<fpage>e1004271</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1004271</pub-id>
<pub-id pub-id-type="pmid">26016777</pub-id>
</element-citation>
</ref>
<ref id="CR17"><label>17.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ghandi</surname>
<given-names>M</given-names>
</name>
<name><surname>Lee</surname>
<given-names>D</given-names>
</name>
<name><surname>Mohammad-Noori</surname>
<given-names>M</given-names>
</name>
<name><surname>Beer</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Enhanced regulatory sequence prediction using gapped k-mer features</article-title>
<source>PLoS Comput Biol</source>
<year>2014</year>
<volume>10</volume>
<issue>7</issue>
<fpage>e1003711</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1003711</pub-id>
<pub-id pub-id-type="pmid">25033408</pub-id>
</element-citation>
</ref>
<ref id="CR18"><label>18.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname>
<given-names>D</given-names>
</name>
<name><surname>Gorkin</surname>
<given-names>DU</given-names>
</name>
<name><surname>Baker</surname>
<given-names>M</given-names>
</name>
<name><surname>Strober</surname>
<given-names>BJ</given-names>
</name>
<name><surname>Asoni</surname>
<given-names>AL</given-names>
</name>
<name><surname>McCallion</surname>
<given-names>AS</given-names>
</name>
<name><surname>Beer</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>A method to predict the impact of regulatory variants from DNA sequence</article-title>
<source>Nat Genet</source>
<year>2015</year>
<volume>47</volume>
<issue>8</issue>
<fpage>955</fpage>
<lpage>961</lpage>
<pub-id pub-id-type="doi">10.1038/ng.3331</pub-id>
<pub-id pub-id-type="pmid">26075791</pub-id>
</element-citation>
</ref>
<ref id="CR19"><label>19.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><collab>ENCODE Project Consortium</collab>
</person-group>
<article-title>An integrated encyclopedia of DNA elements in the human genome</article-title>
<source>Nature</source>
<year>2012</year>
<volume>489</volume>
<issue>7414</issue>
<fpage>57</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="doi">10.1038/nature11247</pub-id>
<pub-id pub-id-type="pmid">22955616</pub-id>
</element-citation>
</ref>
<ref id="CR20"><label>20.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goke</surname>
<given-names>J</given-names>
</name>
<name><surname>Ng</surname>
<given-names>HH</given-names>
</name>
</person-group>
<article-title>CTRL+INSERT: retrotransposons and their contribution to regulation and innovation of the transcriptome</article-title>
<source>EMBO Rep</source>
<year>2016</year>
<volume>17</volume>
<issue>8</issue>
<fpage>1131</fpage>
<lpage>1144</lpage>
<pub-id pub-id-type="doi">10.15252/embr.201642743</pub-id>
<pub-id pub-id-type="pmid">27402545</pub-id>
</element-citation>
</ref>
<ref id="CR21"><label>21.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heinz</surname>
<given-names>S</given-names>
</name>
<name><surname>Benner</surname>
<given-names>C</given-names>
</name>
<name><surname>Spann</surname>
<given-names>N</given-names>
</name>
<name><surname>Bertolino</surname>
<given-names>E</given-names>
</name>
<name><surname>Lin</surname>
<given-names>YC</given-names>
</name>
<name><surname>Laslo</surname>
<given-names>P</given-names>
</name>
<name><surname>Cheng</surname>
<given-names>JX</given-names>
</name>
<name><surname>Murre</surname>
<given-names>C</given-names>
</name>
<name><surname>Singh</surname>
<given-names>H</given-names>
</name>
<name><surname>Glass</surname>
<given-names>CK</given-names>
</name>
</person-group>
<article-title>Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities</article-title>
<source>Mol Cell</source>
<year>2010</year>
<volume>38</volume>
<issue>4</issue>
<fpage>576</fpage>
<lpage>589</lpage>
<pub-id pub-id-type="doi">10.1016/j.molcel.2010.05.004</pub-id>
<pub-id pub-id-type="pmid">20513432</pub-id>
</element-citation>
</ref>
<ref id="CR22"><label>22.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chung</surname>
<given-names>NC</given-names>
</name>
<name><surname>Storey</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>Statistical significance of variables driving systematic variation in high-dimensional data</article-title>
<source>Bioinformatics</source>
<year>2015</year>
<volume>31</volume>
<issue>4</issue>
<fpage>545</fpage>
<lpage>554</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu674</pub-id>
<pub-id pub-id-type="pmid">25336500</pub-id>
</element-citation>
</ref>
<ref id="CR23"><label>23.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Weirauch</surname>
<given-names>MT</given-names>
</name>
<name><surname>Yang</surname>
<given-names>A</given-names>
</name>
<name><surname>Albu</surname>
<given-names>M</given-names>
</name>
<name><surname>Cote</surname>
<given-names>AG</given-names>
</name>
<name><surname>Montenegro-Montero</surname>
<given-names>A</given-names>
</name>
<name><surname>Drewe</surname>
<given-names>P</given-names>
</name>
<name><surname>Najafabadi</surname>
<given-names>HS</given-names>
</name>
<name><surname>Lambert</surname>
<given-names>SA</given-names>
</name>
<name><surname>Mann</surname>
<given-names>I</given-names>
</name>
<name><surname>Cook</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Determination and inference of eukaryotic transcription factor sequence specificity</article-title>
<source>Cell</source>
<year>2014</year>
<volume>158</volume>
<issue>6</issue>
<fpage>1431</fpage>
<lpage>1443</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2014.08.009</pub-id>
<pub-id pub-id-type="pmid">25215497</pub-id>
</element-citation>
</ref>
<ref id="CR24"><label>24.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Deininger</surname>
<given-names>MW</given-names>
</name>
<name><surname>Goldman</surname>
<given-names>JM</given-names>
</name>
<name><surname>Melo</surname>
<given-names>JV</given-names>
</name>
</person-group>
<article-title>The molecular biology of chronic myeloid leukemia</article-title>
<source>Blood</source>
<year>2000</year>
<volume>96</volume>
<issue>10</issue>
<fpage>3343</fpage>
<lpage>3356</lpage>
<pub-id pub-id-type="pmid">11071626</pub-id>
</element-citation>
</ref>
<ref id="CR25"><label>25.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Raitano</surname>
<given-names>AB</given-names>
</name>
<name><surname>Halpern</surname>
<given-names>JR</given-names>
</name>
<name><surname>Hambuch</surname>
<given-names>TM</given-names>
</name>
<name><surname>Sawyers</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>The Bcr-Abl leukemia oncogene activates Jun kinase and requires Jun for transformation</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>1995</year>
<volume>92</volume>
<issue>25</issue>
<fpage>11746</fpage>
<lpage>11750</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.92.25.11746</pub-id>
<pub-id pub-id-type="pmid">8524841</pub-id>
</element-citation>
</ref>
<ref id="CR26"><label>26.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shaulian</surname>
<given-names>E</given-names>
</name>
<name><surname>Karin</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>AP-1 as a regulator of cell life and death</article-title>
<source>Nat Cell Biol</source>
<year>2002</year>
<volume>4</volume>
<issue>5</issue>
<fpage>E131</fpage>
<lpage>E136</lpage>
<pub-id pub-id-type="doi">10.1038/ncb0502-e131</pub-id>
<pub-id pub-id-type="pmid">11988758</pub-id>
</element-citation>
</ref>
<ref id="CR27"><label>27.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hess</surname>
<given-names>J</given-names>
</name>
<name><surname>Angel</surname>
<given-names>P</given-names>
</name>
<name><surname>Schorpp-Kistner</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>AP-1 subunits: quarrel and harmony among siblings</article-title>
<source>J Cell Sci</source>
<year>2004</year>
<volume>117</volume>
<issue>Pt 25</issue>
<fpage>5965</fpage>
<lpage>5973</lpage>
<pub-id pub-id-type="doi">10.1242/jcs.01589</pub-id>
<pub-id pub-id-type="pmid">15564374</pub-id>
</element-citation>
</ref>
<ref id="CR28"><label>28.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Karin</surname>
<given-names>M</given-names>
</name>
<name><surname>Liu</surname>
<given-names>Z</given-names>
</name>
<name><surname>Zandi</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>AP-1 function and regulation</article-title>
<source>Curr Opin Cell Biol</source>
<year>1997</year>
<volume>9</volume>
<issue>2</issue>
<fpage>240</fpage>
<lpage>246</lpage>
<pub-id pub-id-type="doi">10.1016/S0955-0674(97)80068-3</pub-id>
<pub-id pub-id-type="pmid">9069263</pub-id>
</element-citation>
</ref>
<ref id="CR29"><label>29.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dixit</surname>
<given-names>A</given-names>
</name>
<name><surname>Parnas</surname>
<given-names>O</given-names>
</name>
<name><surname>Li</surname>
<given-names>B</given-names>
</name>
<name><surname>Chen</surname>
<given-names>J</given-names>
</name>
<name><surname>Fulco</surname>
<given-names>CP</given-names>
</name>
<name><surname>Jerby-Arnon</surname>
<given-names>L</given-names>
</name>
<name><surname>Marjanovic</surname>
<given-names>ND</given-names>
</name>
<name><surname>Dionne</surname>
<given-names>D</given-names>
</name>
<name><surname>Burks</surname>
<given-names>T</given-names>
</name>
<name><surname>Raychowdhury</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens</article-title>
<source>Cell</source>
<year>2016</year>
<volume>167</volume>
<issue>7</issue>
<fpage>1853</fpage>
<lpage>1866</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2016.11.038</pub-id>
<pub-id pub-id-type="pmid">27984732</pub-id>
</element-citation>
</ref>
<ref id="CR30"><label>30.</label>
<mixed-citation publication-type="other">D'Alonzo RC, Selvamurugan N, Karsenty G, Partridge NC: Physical interaction of the activator protein-1 factors c-Fos and c-Jun with Cbfa1 for collagenase-3 promoter activation. J Biol Chem 2002, 277(1):816–822.</mixed-citation>
</ref>
<ref id="CR31"><label>31.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liberati</surname>
<given-names>NT</given-names>
</name>
<name><surname>Datto</surname>
<given-names>MB</given-names>
</name>
<name><surname>Frederick</surname>
<given-names>JP</given-names>
</name>
<name><surname>Shen</surname>
<given-names>X</given-names>
</name>
<name><surname>Wong</surname>
<given-names>C</given-names>
</name>
<name><surname>Rougier-Chapman</surname>
<given-names>EM</given-names>
</name>
<name><surname>Wang</surname>
<given-names>XF</given-names>
</name>
</person-group>
<article-title>Smads bind directly to the Jun family of AP-1 transcription factors</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>1999</year>
<volume>96</volume>
<issue>9</issue>
<fpage>4844</fpage>
<lpage>4849</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.96.9.4844</pub-id>
<pub-id pub-id-type="pmid">10220381</pub-id>
</element-citation>
</ref>
<ref id="CR32"><label>32.</label>
<mixed-citation publication-type="other">Horvath CM, Stark GR, Kerr IM, Darnell JE, Jr.: Interactions between STAT and non-STAT proteins in the interferon-stimulated gene factor 3 transcription complex. Mol Cell Biol 1996, 16(12):6957–6964.</mixed-citation>
</ref>
<ref id="CR33"><label>33.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hai</surname>
<given-names>T</given-names>
</name>
<name><surname>Curran</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Cross-family dimerization of transcription factors Fos/Jun and ATF/CREB alters DNA binding specificity</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>1991</year>
<volume>88</volume>
<issue>9</issue>
<fpage>3720</fpage>
<lpage>3724</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.88.9.3720</pub-id>
<pub-id pub-id-type="pmid">1827203</pub-id>
</element-citation>
</ref>
<ref id="CR34"><label>34.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bassuk</surname>
<given-names>AG</given-names>
</name>
<name><surname>Leiden</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>A direct physical association between ETS and AP-1 transcription factors in normal human T cells</article-title>
<source>Immunity</source>
<year>1995</year>
<volume>3</volume>
<issue>2</issue>
<fpage>223</fpage>
<lpage>237</lpage>
<pub-id pub-id-type="doi">10.1016/1074-7613(95)90092-6</pub-id>
<pub-id pub-id-type="pmid">7648395</pub-id>
</element-citation>
</ref>
<ref id="CR35"><label>35.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chinenov</surname>
<given-names>Y</given-names>
</name>
<name><surname>Kerppola</surname>
<given-names>TK</given-names>
</name>
</person-group>
<article-title>Close encounters of many kinds: Fos-Jun interactions that mediate transcription regulatory specificity</article-title>
<source>Oncogene</source>
<year>2001</year>
<volume>20</volume>
<issue>19</issue>
<fpage>2438</fpage>
<lpage>2452</lpage>
<pub-id pub-id-type="doi">10.1038/sj.onc.1204385</pub-id>
<pub-id pub-id-type="pmid">11402339</pub-id>
</element-citation>
</ref>
<ref id="CR36"><label>36.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rolland</surname>
<given-names>T</given-names>
</name>
<name><surname>Tasan</surname>
<given-names>M</given-names>
</name>
<name><surname>Charloteaux</surname>
<given-names>B</given-names>
</name>
<name><surname>Pevzner</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Zhong</surname>
<given-names>Q</given-names>
</name>
<name><surname>Sahni</surname>
<given-names>N</given-names>
</name>
<name><surname>Yi</surname>
<given-names>S</given-names>
</name>
<name><surname>Lemmens</surname>
<given-names>I</given-names>
</name>
<name><surname>Fontanillo</surname>
<given-names>C</given-names>
</name>
<name><surname>Mosca</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A proteome-scale map of the human interactome network</article-title>
<source>Cell</source>
<year>2014</year>
<volume>159</volume>
<issue>5</issue>
<fpage>1212</fpage>
<lpage>1226</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2014.10.050</pub-id>
<pub-id pub-id-type="pmid">25416956</pub-id>
</element-citation>
</ref>
<ref id="CR37"><label>37.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shalek</surname>
<given-names>AK</given-names>
</name>
<name><surname>Satija</surname>
<given-names>R</given-names>
</name>
<name><surname>Adiconis</surname>
<given-names>X</given-names>
</name>
<name><surname>Gertner</surname>
<given-names>RS</given-names>
</name>
<name><surname>Gaublomme</surname>
<given-names>JT</given-names>
</name>
<name><surname>Raychowdhury</surname>
<given-names>R</given-names>
</name>
<name><surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name><surname>Yosef</surname>
<given-names>N</given-names>
</name>
<name><surname>Malboeuf</surname>
<given-names>C</given-names>
</name>
<name><surname>Lu</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells</article-title>
<source>Nature</source>
<year>2013</year>
<volume>498</volume>
<issue>7453</issue>
<fpage>236</fpage>
<lpage>240</lpage>
<pub-id pub-id-type="doi">10.1038/nature12172</pub-id>
<pub-id pub-id-type="pmid">23685454</pub-id>
</element-citation>
</ref>
<ref id="CR38"><label>38.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name><surname>Cacchiarelli</surname>
<given-names>D</given-names>
</name>
<name><surname>Grimsby</surname>
<given-names>J</given-names>
</name>
<name><surname>Pokharel</surname>
<given-names>P</given-names>
</name>
<name><surname>Li</surname>
<given-names>S</given-names>
</name>
<name><surname>Morse</surname>
<given-names>M</given-names>
</name>
<name><surname>Lennon</surname>
<given-names>NJ</given-names>
</name>
<name><surname>Livak</surname>
<given-names>KJ</given-names>
</name>
<name><surname>Mikkelsen</surname>
<given-names>TS</given-names>
</name>
<name><surname>Rinn</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells</article-title>
<source>Nat Biotechnol</source>
<year>2014</year>
<volume>32</volume>
<issue>4</issue>
<fpage>381</fpage>
<lpage>386</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2859</pub-id>
<pub-id pub-id-type="pmid">24658644</pub-id>
</element-citation>
</ref>
<ref id="CR39"><label>39.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Warren</surname>
<given-names>L</given-names>
</name>
<name><surname>Bryder</surname>
<given-names>D</given-names>
</name>
<name><surname>Weissman</surname>
<given-names>IL</given-names>
</name>
<name><surname>Quake</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>2006</year>
<volume>103</volume>
<issue>47</issue>
<fpage>17807</fpage>
<lpage>17812</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0608512103</pub-id>
<pub-id pub-id-type="pmid">17098862</pub-id>
</element-citation>
</ref>
<ref id="CR40"><label>40.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tanay</surname>
<given-names>A</given-names>
</name>
<name><surname>Regev</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Scaling single-cell genomics from phenomenology to mechanism</article-title>
<source>Nature</source>
<year>2017</year>
<volume>541</volume>
<issue>7637</issue>
<fpage>331</fpage>
<lpage>338</lpage>
<pub-id pub-id-type="doi">10.1038/nature21350</pub-id>
<pub-id pub-id-type="pmid">28102262</pub-id>
</element-citation>
</ref>
<ref id="CR41"><label>41.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Taniguchi</surname>
<given-names>Y</given-names>
</name>
<name><surname>Choi</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Li</surname>
<given-names>GW</given-names>
</name>
<name><surname>Chen</surname>
<given-names>H</given-names>
</name>
<name><surname>Babu</surname>
<given-names>M</given-names>
</name>
<name><surname>Hearn</surname>
<given-names>J</given-names>
</name>
<name><surname>Emili</surname>
<given-names>A</given-names>
</name>
<name><surname>Xie</surname>
<given-names>XS</given-names>
</name>
</person-group>
<article-title>Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells</article-title>
<source>Science</source>
<year>2010</year>
<volume>329</volume>
<issue>5991</issue>
<fpage>533</fpage>
<lpage>538</lpage>
<pub-id pub-id-type="doi">10.1126/science.1188308</pub-id>
<pub-id pub-id-type="pmid">20671182</pub-id>
</element-citation>
</ref>
<ref id="CR42"><label>42.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Corces</surname>
<given-names>MR</given-names>
</name>
<name><surname>Buenrostro</surname>
<given-names>JD</given-names>
</name>
<name><surname>Wu</surname>
<given-names>B</given-names>
</name>
<name><surname>Greenside</surname>
<given-names>PG</given-names>
</name>
<name><surname>Chan</surname>
<given-names>SM</given-names>
</name>
<name><surname>Koenig</surname>
<given-names>JL</given-names>
</name>
<name><surname>Snyder</surname>
<given-names>MP</given-names>
</name>
<name><surname>Pritchard</surname>
<given-names>JK</given-names>
</name>
<name><surname>Kundaje</surname>
<given-names>A</given-names>
</name>
<name><surname>Greenleaf</surname>
<given-names>WJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution</article-title>
<source>Nat Genet</source>
<year>2016</year>
<volume>48</volume>
<issue>10</issue>
<fpage>1193</fpage>
<lpage>1203</lpage>
<pub-id pub-id-type="doi">10.1038/ng.3646</pub-id>
<pub-id pub-id-type="pmid">27526324</pub-id>
</element-citation>
</ref>
<ref id="CR43"><label>43.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Voss</surname>
<given-names>TC</given-names>
</name>
<name><surname>Schiltz</surname>
<given-names>RL</given-names>
</name>
<name><surname>Sung</surname>
<given-names>MH</given-names>
</name>
<name><surname>Yen</surname>
<given-names>PM</given-names>
</name>
<name><surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
<name><surname>Biddie</surname>
<given-names>SC</given-names>
</name>
<name><surname>Johnson</surname>
<given-names>TA</given-names>
</name>
<name><surname>Miranda</surname>
<given-names>TB</given-names>
</name>
<name><surname>John</surname>
<given-names>S</given-names>
</name>
<name><surname>Hager</surname>
<given-names>GL</given-names>
</name>
</person-group>
<article-title>Dynamic exchange at regulatory elements during chromatin remodeling underlies assisted loading mechanism</article-title>
<source>Cell</source>
<year>2011</year>
<volume>146</volume>
<issue>4</issue>
<fpage>544</fpage>
<lpage>554</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2011.07.006</pub-id>
<pub-id pub-id-type="pmid">21835447</pub-id>
</element-citation>
</ref>
<ref id="CR44"><label>44.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mirny</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>Nucleosome-mediated cooperativity between transcription factors</article-title>
<source>Proc Natl Acad Sci U S A</source>
<year>2010</year>
<volume>107</volume>
<issue>52</issue>
<fpage>22534</fpage>
<lpage>22539</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0913805107</pub-id>
<pub-id pub-id-type="pmid">21149679</pub-id>
</element-citation>
</ref>
<ref id="CR45"><label>45.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sheffield</surname>
<given-names>NC</given-names>
</name>
<name><surname>Thurman</surname>
<given-names>RE</given-names>
</name>
<name><surname>Song</surname>
<given-names>L</given-names>
</name>
<name><surname>Safi</surname>
<given-names>A</given-names>
</name>
<name><surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
<name><surname>Lenhard</surname>
<given-names>B</given-names>
</name>
<name><surname>Crawford</surname>
<given-names>GE</given-names>
</name>
<name><surname>Furey</surname>
<given-names>TS</given-names>
</name>
</person-group>
<article-title>Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions</article-title>
<source>Genome Res</source>
<year>2013</year>
<volume>23</volume>
<issue>5</issue>
<fpage>777</fpage>
<lpage>788</lpage>
<pub-id pub-id-type="doi">10.1101/gr.152140.112</pub-id>
<pub-id pub-id-type="pmid">23482648</pub-id>
</element-citation>
</ref>
<ref id="CR46"><label>46.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Biddie</surname>
<given-names>SC</given-names>
</name>
<name><surname>John</surname>
<given-names>S</given-names>
</name>
<name><surname>Sabo</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Thurman</surname>
<given-names>RE</given-names>
</name>
<name><surname>Johnson</surname>
<given-names>TA</given-names>
</name>
<name><surname>Schiltz</surname>
<given-names>RL</given-names>
</name>
<name><surname>Miranda</surname>
<given-names>TB</given-names>
</name>
<name><surname>Sung</surname>
<given-names>MH</given-names>
</name>
<name><surname>Trump</surname>
<given-names>S</given-names>
</name>
<name><surname>Lightman</surname>
<given-names>SL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Transcription factor AP1 potentiates chromatin accessibility and glucocorticoid receptor binding</article-title>
<source>Mol Cell</source>
<year>2011</year>
<volume>43</volume>
<issue>1</issue>
<fpage>145</fpage>
<lpage>155</lpage>
<pub-id pub-id-type="doi">10.1016/j.molcel.2011.06.016</pub-id>
<pub-id pub-id-type="pmid">21726817</pub-id>
</element-citation>
</ref>
<ref id="CR47"><label>47.</label>
<mixed-citation publication-type="other">Schep AN, Wu B, Buenrostro JD, Greenleaf WJ: chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods 2017, advance online publication.</mixed-citation>
</ref>
<ref id="CR48"><label>48.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<issue>4</issue>
<fpage>357</fpage>
<lpage>359</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
<pub-id pub-id-type="pmid">22388286</pub-id>
</element-citation>
</ref>
<ref id="CR49"><label>49.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Karolchik</surname>
<given-names>D</given-names>
</name>
<name><surname>Hinrichs</surname>
<given-names>AS</given-names>
</name>
<name><surname>Furey</surname>
<given-names>TS</given-names>
</name>
<name><surname>Roskin</surname>
<given-names>KM</given-names>
</name>
<name><surname>Sugnet</surname>
<given-names>CW</given-names>
</name>
<name><surname>Haussler</surname>
<given-names>D</given-names>
</name>
<name><surname>Kent</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<article-title>The UCSC table browser data retrieval tool</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<issue>Database issue</issue>
<fpage>D493</fpage>
<lpage>D496</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh103</pub-id>
<pub-id pub-id-type="pmid">14681465</pub-id>
</element-citation>
</ref>
<ref id="CR50"><label>50.</label>
<mixed-citation publication-type="other">Zilberstein CB-Z, Eskin E, Yakhini Z. Using expression data to discover RNA and DNA regulatory sequence motifs. Proceedings of the First Annual RECOMB Satellite Workshop on Regulatory Genomics. 2004:65–78.</mixed-citation>
</ref>
<ref id="CR51"><label>51.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pollen</surname>
<given-names>AA</given-names>
</name>
<name><surname>Nowakowski</surname>
<given-names>TJ</given-names>
</name>
<name><surname>Shuga</surname>
<given-names>J</given-names>
</name>
<name><surname>Wang</surname>
<given-names>X</given-names>
</name>
<name><surname>Leyrat</surname>
<given-names>AA</given-names>
</name>
<name><surname>Lui</surname>
<given-names>JH</given-names>
</name>
<name><surname>Li</surname>
<given-names>N</given-names>
</name>
<name><surname>Szpankowski</surname>
<given-names>L</given-names>
</name>
<name><surname>Fowler</surname>
<given-names>B</given-names>
</name>
<name><surname>Chen</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex</article-title>
<source>Nat Biotechnol</source>
<year>2014</year>
<volume>32</volume>
<issue>10</issue>
<fpage>1053</fpage>
<lpage>1058</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2967</pub-id>
<pub-id pub-id-type="pmid">25086649</pub-id>
</element-citation>
</ref>
<ref id="CR52"><label>52.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Granek</surname>
<given-names>JA</given-names>
</name>
<name><surname>Clarke</surname>
<given-names>ND</given-names>
</name>
</person-group>
<article-title>Explicit equilibrium modeling of transcription-factor binding and gene regulation</article-title>
<source>Genome Biol</source>
<year>2005</year>
<volume>6</volume>
<issue>10</issue>
<fpage>R87</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2005-6-10-r87</pub-id>
<pub-id pub-id-type="pmid">16207358</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000273  | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000273  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021
