Pittsburgh Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Integrating Diverse Datasets Improves Developmental Enhancer Prediction

Internal identifier: 000259 (Pmc/Corpus); previous: 000258; next: 000260

Authors: Genevieve D. Erwin; Nir Oksenberg; Rebecca M. Truty; Dennis Kostka; Karl K. Murphy; Nadav Ahituv; Katherine S. Pollard; John A. Capra

Source: PLoS Computational Biology (2014)

RBID : PMC:4072507

Abstract

Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.
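
The kernel-integration idea in the abstract can be illustrated with a small toy sketch. This is not EnhancerFinder's implementation: the feature values, the view names (`motifs`, `conservation`, `functional`), and all function names are hypothetical. Two simplifications are swapped in for a full multiple kernel learning solver: the kernel weights are set by a kernel-target alignment heuristic, and the classifier is a kernel perceptron. Only the first step (enhancer versus background) is sketched; tissue-specificity prediction is not covered.

```python
# Toy sketch (not EnhancerFinder itself): weight one kernel per data type by
# kernel-target alignment, sum the kernels, and train a kernel perceptron.

def linear_kernel(rows):
    """Gram matrix of dot products between feature rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F), y in {-1, +1}."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = sum(v * v for row in K for v in row) ** 0.5
    return num / (norm_k * n) if norm_k else 0.0  # ||yy^T||_F == n for +/-1 labels


def combine(kernels, weights):
    """Weighted sum of Gram matrices; remains a valid kernel for weights >= 0."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def train_kernel_perceptron(K, y, epochs=100):
    """Return per-example mistake counts (the dual coefficients)."""
    n = len(y)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified (or undecided): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # a full clean pass means training has converged
            break
    return alpha


def predict(K, y, alpha):
    """Predicted label (+1 or -1) for every training example."""
    n = len(y)
    scores = [sum(alpha[j] * y[j] * K[j][i] for j in range(n)) for i in range(n)]
    return [1 if s > 0 else -1 for s in scores]


# Hypothetical toy features for six regions; the first three are "enhancers".
labels = [1, 1, 1, -1, -1, -1]
views = {
    "motifs": [[2, 0], [3, 1], [2, 1], [0, 2], [1, 3], [0, 3]],
    "conservation": [[0.4], [0.3], [0.2], [-0.3], [-0.2], [-0.4]],
    "functional": [[1.0], [-0.5], [0.5], [-1.0], [0.5], [-0.5]],
}

kernels = [linear_kernel(rows) for rows in views.values()]
raw = [max(alignment(K, labels), 0.0) for K in kernels]  # clip negatives to 0
weights = [a / sum(raw) for a in raw]                    # normalize to sum to 1

K_combined = combine(kernels, weights)
alpha = train_kernel_perceptron(K_combined, labels)
predictions = predict(K_combined, labels, alpha)

for name, w in zip(views, weights):
    print(f"{name}: weight {w:.3f}")
print("training predictions:", predictions)
```

In this sketch a data type whose kernel agrees poorly with the labels receives a small weight, which is the intuition behind combining heterogeneous datasets rather than relying on any single one.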


Url:
DOI: 10.1371/journal.pcbi.1003677
PubMed: 24967590
PubMed Central: 4072507

Links to Exploration step

PMC:4072507

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Integrating Diverse Datasets Improves Developmental Enhancer Prediction</title>
<author>
<name sortKey="Erwin, Genevieve D" sort="Erwin, Genevieve D" uniqKey="Erwin G" first="Genevieve D." last="Erwin">Genevieve D. Erwin</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oksenberg, Nir" sort="Oksenberg, Nir" uniqKey="Oksenberg N" first="Nir" last="Oksenberg">Nir Oksenberg</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Truty, Rebecca M" sort="Truty, Rebecca M" uniqKey="Truty R" first="Rebecca M." last="Truty">Rebecca M. Truty</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kostka, Dennis" sort="Kostka, Dennis" uniqKey="Kostka D" first="Dennis" last="Kostka">Dennis Kostka</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Developmental Biology and Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Karl K" sort="Murphy, Karl K" uniqKey="Murphy K" first="Karl K." last="Murphy">Karl K. Murphy</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ahituv, Nadav" sort="Ahituv, Nadav" uniqKey="Ahituv N" first="Nadav" last="Ahituv">Nadav Ahituv</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pollard, Katherine S" sort="Pollard, Katherine S" uniqKey="Pollard K" first="Katherine S." last="Pollard">Katherine S. Pollard</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Capra, John A" sort="Capra, John A" uniqKey="Capra J" first="John A." last="Capra">John A. Capra</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Center for Human Genetics Research and Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24967590</idno>
<idno type="pmc">4072507</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4072507</idno>
<idno type="RBID">PMC:4072507</idno>
<idno type="doi">10.1371/journal.pcbi.1003677</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000259</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000259</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Integrating Diverse Datasets Improves Developmental Enhancer Prediction</title>
<author>
<name sortKey="Erwin, Genevieve D" sort="Erwin, Genevieve D" uniqKey="Erwin G" first="Genevieve D." last="Erwin">Genevieve D. Erwin</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oksenberg, Nir" sort="Oksenberg, Nir" uniqKey="Oksenberg N" first="Nir" last="Oksenberg">Nir Oksenberg</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Truty, Rebecca M" sort="Truty, Rebecca M" uniqKey="Truty R" first="Rebecca M." last="Truty">Rebecca M. Truty</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kostka, Dennis" sort="Kostka, Dennis" uniqKey="Kostka D" first="Dennis" last="Kostka">Dennis Kostka</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Developmental Biology and Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Karl K" sort="Murphy, Karl K" uniqKey="Murphy K" first="Karl K." last="Murphy">Karl K. Murphy</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ahituv, Nadav" sort="Ahituv, Nadav" uniqKey="Ahituv N" first="Nadav" last="Ahituv">Nadav Ahituv</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pollard, Katherine S" sort="Pollard, Katherine S" uniqKey="Pollard K" first="Katherine S." last="Pollard">Katherine S. Pollard</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Capra, John A" sort="Capra, John A" uniqKey="Capra J" first="John A." last="Capra">John A. Capra</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Center for Human Genetics Research and Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Computational Biology</title>
<idno type="ISSN">1553-734X</idno>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through
<italic>in vivo</italic>
validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Ong, Ct" uniqKey="Ong C">CT Ong</name>
</author>
<author>
<name sortKey="Corces, Vg" uniqKey="Corces V">VG Corces</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bulger, M" uniqKey="Bulger M">M Bulger</name>
</author>
<author>
<name sortKey="Groudine, M" uniqKey="Groudine M">M Groudine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author>
<name sortKey="Pennacchio, La" uniqKey="Pennacchio L">LA Pennacchio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sakabe, Nj" uniqKey="Sakabe N">NJ Sakabe</name>
</author>
<author>
<name sortKey="Savic, D" uniqKey="Savic D">D Savic</name>
</author>
<author>
<name sortKey="Nobrega, Ma" uniqKey="Nobrega M">MA Nobrega</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Noonan, Jp" uniqKey="Noonan J">JP Noonan</name>
</author>
<author>
<name sortKey="Mccallion, As" uniqKey="Mccallion A">AS McCallion</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lomvardas, S" uniqKey="Lomvardas S">S Lomvardas</name>
</author>
<author>
<name sortKey="Barnea, G" uniqKey="Barnea G">G Barnea</name>
</author>
<author>
<name sortKey="Pisapia, Dj" uniqKey="Pisapia D">DJ Pisapia</name>
</author>
<author>
<name sortKey="Mendelsohn, M" uniqKey="Mendelsohn M">M Mendelsohn</name>
</author>
<author>
<name sortKey="Kirkland, J" uniqKey="Kirkland J">J Kirkland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
<author>
<name sortKey="Shoukry, M" uniqKey="Shoukry M">M Shoukry</name>
</author>
<author>
<name sortKey="Afzal, V" uniqKey="Afzal V">V Afzal</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Taher, L" uniqKey="Taher L">L Taher</name>
</author>
<author>
<name sortKey="Girgis, H" uniqKey="Girgis H">H Girgis</name>
</author>
<author>
<name sortKey="May, D" uniqKey="May D">D May</name>
</author>
<author>
<name sortKey="Golonzhka, O" uniqKey="Golonzhka O">O Golonzhka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koch, Cm" uniqKey="Koch C">CM Koch</name>
</author>
<author>
<name sortKey="Andrews, Rm" uniqKey="Andrews R">RM Andrews</name>
</author>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author>
<name sortKey="Dillon, Sc" uniqKey="Dillon S">SC Dillon</name>
</author>
<author>
<name sortKey="Karaoz, U" uniqKey="Karaoz U">U Karaoz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heintzman, Nd" uniqKey="Heintzman N">ND Heintzman</name>
</author>
<author>
<name sortKey="Hon, Gc" uniqKey="Hon G">GC Hon</name>
</author>
<author>
<name sortKey="Hawkins, Rd" uniqKey="Hawkins R">RD Hawkins</name>
</author>
<author>
<name sortKey="Kheradpour, P" uniqKey="Kheradpour P">P Kheradpour</name>
</author>
<author>
<name sortKey="Stark, A" uniqKey="Stark A">A Stark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sholtis, Sj" uniqKey="Sholtis S">SJ Sholtis</name>
</author>
<author>
<name sortKey="Noonan, Jp" uniqKey="Noonan J">JP Noonan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Levine, M" uniqKey="Levine M">M Levine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Banerji, J" uniqKey="Banerji J">J Banerji</name>
</author>
<author>
<name sortKey="Rusconi, S" uniqKey="Rusconi S">S Rusconi</name>
</author>
<author>
<name sortKey="Schaffner, W" uniqKey="Schaffner W">W Schaffner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gillies, Sd" uniqKey="Gillies S">SD Gillies</name>
</author>
<author>
<name sortKey="Morrison, Sl" uniqKey="Morrison S">SL Morrison</name>
</author>
<author>
<name sortKey="Oi, Vt" uniqKey="Oi V">VT Oi</name>
</author>
<author>
<name sortKey="Tonegawa, S" uniqKey="Tonegawa S">S Tonegawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nobrega, Ma" uniqKey="Nobrega M">MA Nobrega</name>
</author>
<author>
<name sortKey="Ovcharenko, I" uniqKey="Ovcharenko I">I Ovcharenko</name>
</author>
<author>
<name sortKey="Afzal, V" uniqKey="Afzal V">V Afzal</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pennacchio, La" uniqKey="Pennacchio L">LA Pennacchio</name>
</author>
<author>
<name sortKey="Ahituv, N" uniqKey="Ahituv N">N Ahituv</name>
</author>
<author>
<name sortKey="Moses, Am" uniqKey="Moses A">AM Moses</name>
</author>
<author>
<name sortKey="Prabhakar, S" uniqKey="Prabhakar S">S Prabhakar</name>
</author>
<author>
<name sortKey="Nobrega, Ma" uniqKey="Nobrega M">MA Nobrega</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Blow, Mj" uniqKey="Blow M">MJ Blow</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Zhang, T" uniqKey="Zhang T">T Zhang</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Prabhakar, S" uniqKey="Prabhakar S">S Prabhakar</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
<author>
<name sortKey="Shoukry, M" uniqKey="Shoukry M">M Shoukry</name>
</author>
<author>
<name sortKey="Lewis, Kd" uniqKey="Lewis K">KD Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woolfe, A" uniqKey="Woolfe A">A Woolfe</name>
</author>
<author>
<name sortKey="Goodson, M" uniqKey="Goodson M">M Goodson</name>
</author>
<author>
<name sortKey="Goode, Dk" uniqKey="Goode D">DK Goode</name>
</author>
<author>
<name sortKey="Snell, P" uniqKey="Snell P">P Snell</name>
</author>
<author>
<name sortKey="Mcewen, Gk" uniqKey="Mcewen G">GK McEwen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Prabhakar, S" uniqKey="Prabhakar S">S Prabhakar</name>
</author>
<author>
<name sortKey="Poulin, F" uniqKey="Poulin F">F Poulin</name>
</author>
<author>
<name sortKey="Shoukry, M" uniqKey="Shoukry M">M Shoukry</name>
</author>
<author>
<name sortKey="Afzal, V" uniqKey="Afzal V">V Afzal</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcgaughey, Dm" uniqKey="Mcgaughey D">DM McGaughey</name>
</author>
<author>
<name sortKey="Vinton, Rm" uniqKey="Vinton R">RM Vinton</name>
</author>
<author>
<name sortKey="Huynh, J" uniqKey="Huynh J">J Huynh</name>
</author>
<author>
<name sortKey="Al Saif, A" uniqKey="Al Saif A">A Al-Saif</name>
</author>
<author>
<name sortKey="Beer, Ma" uniqKey="Beer M">MA Beer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boyle, Ap" uniqKey="Boyle A">AP Boyle</name>
</author>
<author>
<name sortKey="Davis, S" uniqKey="Davis S">S Davis</name>
</author>
<author>
<name sortKey="Shulha, Hp" uniqKey="Shulha H">HP Shulha</name>
</author>
<author>
<name sortKey="Meltzer, P" uniqKey="Meltzer P">P Meltzer</name>
</author>
<author>
<name sortKey="Margulies, Eh" uniqKey="Margulies E">EH Margulies</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giresi, Pg" uniqKey="Giresi P">PG Giresi</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="Mcdaniell, Rm" uniqKey="Mcdaniell R">RM McDaniell</name>
</author>
<author>
<name sortKey="Iyer, Vr" uniqKey="Iyer V">VR Iyer</name>
</author>
<author>
<name sortKey="Lieb, Jd" uniqKey="Lieb J">JD Lieb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dunham, I" uniqKey="Dunham I">I Dunham</name>
</author>
<author>
<name sortKey="Kundaje, A" uniqKey="Kundaje A">A Kundaje</name>
</author>
<author>
<name sortKey="Aldred, Sf" uniqKey="Aldred S">SF Aldred</name>
</author>
<author>
<name sortKey="Collins, Pj" uniqKey="Collins P">PJ Collins</name>
</author>
<author>
<name sortKey="Davis, Ca" uniqKey="Davis C">CA Davis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andersson, R" uniqKey="Andersson R">R Andersson</name>
</author>
<author>
<name sortKey="Gebhard, C" uniqKey="Gebhard C">C Gebhard</name>
</author>
<author>
<name sortKey="Miguel Escalada, I" uniqKey="Miguel Escalada I">I Miguel-Escalada</name>
</author>
<author>
<name sortKey="Hoof, I" uniqKey="Hoof I">I Hoof</name>
</author>
<author>
<name sortKey="Bornholdt, J" uniqKey="Bornholdt J">J Bornholdt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wamstad, Ja" uniqKey="Wamstad J">JA Wamstad</name>
</author>
<author>
<name sortKey="Alexander, Jm" uniqKey="Alexander J">JM Alexander</name>
</author>
<author>
<name sortKey="Truty, Rm" uniqKey="Truty R">RM Truty</name>
</author>
<author>
<name sortKey="Shrikumar, A" uniqKey="Shrikumar A">A Shrikumar</name>
</author>
<author>
<name sortKey="Li, F" uniqKey="Li F">F Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paige, Sl" uniqKey="Paige S">SL Paige</name>
</author>
<author>
<name sortKey="Thomas, S" uniqKey="Thomas S">S Thomas</name>
</author>
<author>
<name sortKey="Stoick Cooper, Cl" uniqKey="Stoick Cooper C">CL Stoick-Cooper</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Maves, L" uniqKey="Maves L">L Maves</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jin, C" uniqKey="Jin C">C Jin</name>
</author>
<author>
<name sortKey="Zang, C" uniqKey="Zang C">C Zang</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Peng, W" uniqKey="Peng W">W Peng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, Hh" uniqKey="He H">HH He</name>
</author>
<author>
<name sortKey="Meyer, Ca" uniqKey="Meyer C">CA Meyer</name>
</author>
<author>
<name sortKey="Shin, H" uniqKey="Shin H">H Shin</name>
</author>
<author>
<name sortKey="Bailey, St" uniqKey="Bailey S">ST Bailey</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thurman, Re" uniqKey="Thurman R">RE Thurman</name>
</author>
<author>
<name sortKey="Rynes, E" uniqKey="Rynes E">E Rynes</name>
</author>
<author>
<name sortKey="Humbert, R" uniqKey="Humbert R">R Humbert</name>
</author>
<author>
<name sortKey="Vierstra, J" uniqKey="Vierstra J">J Vierstra</name>
</author>
<author>
<name sortKey="Maurano, Mt" uniqKey="Maurano M">MT Maurano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heintzman, Nd" uniqKey="Heintzman N">ND Heintzman</name>
</author>
<author>
<name sortKey="Stuart, Rk" uniqKey="Stuart R">RK Stuart</name>
</author>
<author>
<name sortKey="Hon, G" uniqKey="Hon G">G Hon</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Ching, Cw" uniqKey="Ching C">CW Ching</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cotney, J" uniqKey="Cotney J">J Cotney</name>
</author>
<author>
<name sortKey="Leng, J" uniqKey="Leng J">J Leng</name>
</author>
<author>
<name sortKey="Oh, S" uniqKey="Oh S">S Oh</name>
</author>
<author>
<name sortKey="Demare, Le" uniqKey="Demare L">LE DeMare</name>
</author>
<author>
<name sortKey="Reilly, Sk" uniqKey="Reilly S">SK Reilly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Creyghton, Mp" uniqKey="Creyghton M">MP Creyghton</name>
</author>
<author>
<name sortKey="Cheng, Aw" uniqKey="Cheng A">AW Cheng</name>
</author>
<author>
<name sortKey="Welstead, Gg" uniqKey="Welstead G">GG Welstead</name>
</author>
<author>
<name sortKey="Kooistra, T" uniqKey="Kooistra T">T Kooistra</name>
</author>
<author>
<name sortKey="Carey, Bw" uniqKey="Carey B">BW Carey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rada Iglesias, A" uniqKey="Rada Iglesias A">A Rada-Iglesias</name>
</author>
<author>
<name sortKey="Bajpai, R" uniqKey="Bajpai R">R Bajpai</name>
</author>
<author>
<name sortKey="Swigut, T" uniqKey="Swigut T">T Swigut</name>
</author>
<author>
<name sortKey="Brugmann, Sa" uniqKey="Brugmann S">SA Brugmann</name>
</author>
<author>
<name sortKey="Flynn, Ra" uniqKey="Flynn R">RA Flynn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mikkelsen, Ts" uniqKey="Mikkelsen T">TS Mikkelsen</name>
</author>
<author>
<name sortKey="Ku, M" uniqKey="Ku M">M Ku</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="Issac, B" uniqKey="Issac B">B Issac</name>
</author>
<author>
<name sortKey="Lieberman, E" uniqKey="Lieberman E">E Lieberman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Vw" uniqKey="Zhou V">VW Zhou</name>
</author>
<author>
<name sortKey="Goren, A" uniqKey="Goren A">A Goren</name>
</author>
<author>
<name sortKey="Bernstein, Be" uniqKey="Bernstein B">BE Bernstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blow, Mj" uniqKey="Blow M">MJ Blow</name>
</author>
<author>
<name sortKey="Mcculley, Dj" uniqKey="Mcculley D">DJ McCulley</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Zhang, T" uniqKey="Zhang T">T Zhang</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghisletti, S" uniqKey="Ghisletti S">S Ghisletti</name>
</author>
<author>
<name sortKey="Barozzi, I" uniqKey="Barozzi I">I Barozzi</name>
</author>
<author>
<name sortKey="Mietton, F" uniqKey="Mietton F">F Mietton</name>
</author>
<author>
<name sortKey="Polletti, S" uniqKey="Polletti S">S Polletti</name>
</author>
<author>
<name sortKey="De Santa, F" uniqKey="De Santa F">F De Santa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="May, D" uniqKey="May D">D May</name>
</author>
<author>
<name sortKey="Blow, Mj" uniqKey="Blow M">MJ Blow</name>
</author>
<author>
<name sortKey="Kaplan, T" uniqKey="Kaplan T">T Kaplan</name>
</author>
<author>
<name sortKey="Mcculley, Dj" uniqKey="Mcculley D">DJ McCulley</name>
</author>
<author>
<name sortKey="Jensen, Bc" uniqKey="Jensen B">BC Jensen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zinzen, Rp" uniqKey="Zinzen R">RP Zinzen</name>
</author>
<author>
<name sortKey="Girardot, C" uniqKey="Girardot C">C Girardot</name>
</author>
<author>
<name sortKey="Gagneur, J" uniqKey="Gagneur J">J Gagneur</name>
</author>
<author>
<name sortKey="Braun, M" uniqKey="Braun M">M Braun</name>
</author>
<author>
<name sortKey="Furlong, Ee" uniqKey="Furlong E">EE Furlong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, A" uniqKey="He A">A He</name>
</author>
<author>
<name sortKey="Kong, Sw" uniqKey="Kong S">SW Kong</name>
</author>
<author>
<name sortKey="Ma, Q" uniqKey="Ma Q">Q Ma</name>
</author>
<author>
<name sortKey="Pu, Wt" uniqKey="Pu W">WT Pu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yip, Ky" uniqKey="Yip K">KY Yip</name>
</author>
<author>
<name sortKey="Cheng, C" uniqKey="Cheng C">C Cheng</name>
</author>
<author>
<name sortKey="Bhardwaj, N" uniqKey="Bhardwaj N">N Bhardwaj</name>
</author>
<author>
<name sortKey="Brown, Jb" uniqKey="Brown J">JB Brown</name>
</author>
<author>
<name sortKey="Leng, J" uniqKey="Leng J">J Leng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cheng, C" uniqKey="Cheng C">C Cheng</name>
</author>
<author>
<name sortKey="Alexander, R" uniqKey="Alexander R">R Alexander</name>
</author>
<author>
<name sortKey="Min, R" uniqKey="Min R">R Min</name>
</author>
<author>
<name sortKey="Leng, J" uniqKey="Leng J">J Leng</name>
</author>
<author>
<name sortKey="Yip, Ky" uniqKey="Yip K">KY Yip</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orom, Ua" uniqKey="Orom U">UA Orom</name>
</author>
<author>
<name sortKey="Derrien, T" uniqKey="Derrien T">T Derrien</name>
</author>
<author>
<name sortKey="Beringer, M" uniqKey="Beringer M">M Beringer</name>
</author>
<author>
<name sortKey="Gumireddy, K" uniqKey="Gumireddy K">K Gumireddy</name>
</author>
<author>
<name sortKey="Gardini, A" uniqKey="Gardini A">A Gardini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Cuddapah, S" uniqKey="Cuddapah S">S Cuddapah</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Roh, Ty" uniqKey="Roh T">TY Roh</name>
</author>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Zang, C" uniqKey="Zang C">C Zang</name>
</author>
<author>
<name sortKey="Rosenfeld, Ja" uniqKey="Rosenfeld J">JA Rosenfeld</name>
</author>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zentner, Ge" uniqKey="Zentner G">GE Zentner</name>
</author>
<author>
<name sortKey="Tesar, Pj" uniqKey="Tesar P">PJ Tesar</name>
</author>
<author>
<name sortKey="Scacheri, Pc" uniqKey="Scacheri P">PC Scacheri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bonn, S" uniqKey="Bonn S">S Bonn</name>
</author>
<author>
<name sortKey="Zinzen, Rp" uniqKey="Zinzen R">RP Zinzen</name>
</author>
<author>
<name sortKey="Girardot, C" uniqKey="Girardot C">C Girardot</name>
</author>
<author>
<name sortKey="Gustafson, Eh" uniqKey="Gustafson E">EH Gustafson</name>
</author>
<author>
<name sortKey="Perez Gonzalez, A" uniqKey="Perez Gonzalez A">A Perez-Gonzalez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Narlikar, L" uniqKey="Narlikar L">L Narlikar</name>
</author>
<author>
<name sortKey="Sakabe, Nj" uniqKey="Sakabe N">NJ Sakabe</name>
</author>
<author>
<name sortKey="Blanski, Aa" uniqKey="Blanski A">AA Blanski</name>
</author>
<author>
<name sortKey="Arimura, Fe" uniqKey="Arimura F">FE Arimura</name>
</author>
<author>
<name sortKey="Westlund, Jm" uniqKey="Westlund J">JM Westlund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burzynski, Gm" uniqKey="Burzynski G">GM Burzynski</name>
</author>
<author>
<name sortKey="Reed, X" uniqKey="Reed X">X Reed</name>
</author>
<author>
<name sortKey="Taher, L" uniqKey="Taher L">L Taher</name>
</author>
<author>
<name sortKey="Stine, Ze" uniqKey="Stine Z">ZE Stine</name>
</author>
<author>
<name sortKey="Matsui, T" uniqKey="Matsui T">T Matsui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Busser, Bw" uniqKey="Busser B">BW Busser</name>
</author>
<author>
<name sortKey="Taher, L" uniqKey="Taher L">L Taher</name>
</author>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y Kim</name>
</author>
<author>
<name sortKey="Tansey, T" uniqKey="Tansey T">T Tansey</name>
</author>
<author>
<name sortKey="Bloom, Mj" uniqKey="Bloom M">MJ Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, D" uniqKey="Lee D">D Lee</name>
</author>
<author>
<name sortKey="Karchin, R" uniqKey="Karchin R">R Karchin</name>
</author>
<author>
<name sortKey="Beer, Ma" uniqKey="Beer M">MA Beer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gorkin, Du" uniqKey="Gorkin D">DU Gorkin</name>
</author>
<author>
<name sortKey="Lee, D" uniqKey="Lee D">D Lee</name>
</author>
<author>
<name sortKey="Reed, X" uniqKey="Reed X">X Reed</name>
</author>
<author>
<name sortKey="Fletez Brant, C" uniqKey="Fletez Brant C">C Fletez-Brant</name>
</author>
<author>
<name sortKey="Bessling, Sl" uniqKey="Bessling S">SL Bessling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rajagopal, N" uniqKey="Rajagopal N">N Rajagopal</name>
</author>
<author>
<name sortKey="Xie, W" uniqKey="Xie W">W Xie</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Wagner, U" uniqKey="Wagner U">U Wagner</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lahdesmaki, H" uniqKey="Lahdesmaki H">H Lahdesmaki</name>
</author>
<author>
<name sortKey="Rust, Ag" uniqKey="Rust A">AG Rust</name>
</author>
<author>
<name sortKey="Shmulevich, I" uniqKey="Shmulevich I">I Shmulevich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kantorovitz, Mr" uniqKey="Kantorovitz M">MR Kantorovitz</name>
</author>
<author>
<name sortKey="Kazemian, M" uniqKey="Kazemian M">M Kazemian</name>
</author>
<author>
<name sortKey="Kinston, S" uniqKey="Kinston S">S Kinston</name>
</author>
<author>
<name sortKey="Miranda Saavedra, D" uniqKey="Miranda Saavedra D">D Miranda-Saavedra</name>
</author>
<author>
<name sortKey="Zhu, Q" uniqKey="Zhu Q">Q Zhu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Won, Kj" uniqKey="Won K">KJ Won</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pique Regi, R" uniqKey="Pique Regi R">R Pique-Regi</name>
</author>
<author>
<name sortKey="Degner, Jf" uniqKey="Degner J">JF Degner</name>
</author>
<author>
<name sortKey="Pai, Aa" uniqKey="Pai A">AA Pai</name>
</author>
<author>
<name sortKey="Gaffney, Dj" uniqKey="Gaffney D">DJ Gaffney</name>
</author>
<author>
<name sortKey="Gilad, Y" uniqKey="Gilad Y">Y Gilad</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arvey, A" uniqKey="Arvey A">A Arvey</name>
</author>
<author>
<name sortKey="Agius, P" uniqKey="Agius P">P Agius</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
<author>
<name sortKey="Leslie, C" uniqKey="Leslie C">C Leslie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cuellar Partida, G" uniqKey="Cuellar Partida G">G Cuellar-Partida</name>
</author>
<author>
<name sortKey="Buske, Fa" uniqKey="Buske F">FA Buske</name>
</author>
<author>
<name sortKey="Mcleay, Rc" uniqKey="Mcleay R">RC McLeay</name>
</author>
<author>
<name sortKey="Whitington, T" uniqKey="Whitington T">T Whitington</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, D" uniqKey="Wang D">D Wang</name>
</author>
<author>
<name sortKey="Do, Ht" uniqKey="Do H">HT Do</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ernst, J" uniqKey="Ernst J">J Ernst</name>
</author>
<author>
<name sortKey="Kheradpour, P" uniqKey="Kheradpour P">P Kheradpour</name>
</author>
<author>
<name sortKey="Mikkelsen, Ts" uniqKey="Mikkelsen T">TS Mikkelsen</name>
</author>
<author>
<name sortKey="Shoresh, N" uniqKey="Shoresh N">N Shoresh</name>
</author>
<author>
<name sortKey="Ward, Ld" uniqKey="Ward L">LD Ward</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffman, Mm" uniqKey="Hoffman M">MM Hoffman</name>
</author>
<author>
<name sortKey="Buske, Oj" uniqKey="Buske O">OJ Buske</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
<author>
<name sortKey="Bilmes, Ja" uniqKey="Bilmes J">JA Bilmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sonnenburg, S" uniqKey="Sonnenburg S">S Sonnenburg</name>
</author>
<author>
<name sortKey="Zien, A" uniqKey="Zien A">A Zien</name>
</author>
<author>
<name sortKey="Ratsch, G" uniqKey="Ratsch G">G Ratsch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kloft, M" uniqKey="Kloft M">M Kloft</name>
</author>
<author>
<name sortKey="Brefeld, U" uniqKey="Brefeld U">U Brefeld</name>
</author>
<author>
<name sortKey="Sonnenburg, S" uniqKey="Sonnenburg S">S Sonnenburg</name>
</author>
<author>
<name sortKey="Zien, A" uniqKey="Zien A">A Zien</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Minovitsky, S" uniqKey="Minovitsky S">S Minovitsky</name>
</author>
<author>
<name sortKey="Dubchak, I" uniqKey="Dubchak I">I Dubchak</name>
</author>
<author>
<name sortKey="Pennacchio, La" uniqKey="Pennacchio L">LA Pennacchio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Rahilly, R" uniqKey="O Rahilly R">R O'Rahilly</name>
</author>
<author>
<name sortKey="Muller, F" uniqKey="Muller F">F Muller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leslie, C" uniqKey="Leslie C">C Leslie</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Siepel, A" uniqKey="Siepel A">A Siepel</name>
</author>
<author>
<name sortKey="Bejerano, G" uniqKey="Bejerano G">G Bejerano</name>
</author>
<author>
<name sortKey="Pedersen, Js" uniqKey="Pedersen J">JS Pedersen</name>
</author>
<author>
<name sortKey="Hinrichs, As" uniqKey="Hinrichs A">AS Hinrichs</name>
</author>
<author>
<name sortKey="Hou, M" uniqKey="Hou M">M Hou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taher, L" uniqKey="Taher L">L Taher</name>
</author>
<author>
<name sortKey="Narlikar, L" uniqKey="Narlikar L">L Narlikar</name>
</author>
<author>
<name sortKey="Ovcharenko, I" uniqKey="Ovcharenko I">I Ovcharenko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
<author>
<name sortKey="Erwin, Gd" uniqKey="Erwin G">GD Erwin</name>
</author>
<author>
<name sortKey="Mckinsey, G" uniqKey="Mckinsey G">G McKinsey</name>
</author>
<author>
<name sortKey="Rubenstein, Jlr" uniqKey="Rubenstein J">JLR Rubenstein</name>
</author>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nord, As" uniqKey="Nord A">AS Nord</name>
</author>
<author>
<name sortKey="Blow, Mj" uniqKey="Blow M">MJ Blow</name>
</author>
<author>
<name sortKey="Attanasio, C" uniqKey="Attanasio C">C Attanasio</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
<author>
<name sortKey="Holt, A" uniqKey="Holt A">A Holt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hindorff, La" uniqKey="Hindorff L">LA Hindorff</name>
</author>
<author>
<name sortKey="Sethupathy, P" uniqKey="Sethupathy P">P Sethupathy</name>
</author>
<author>
<name sortKey="Junkins, Ha" uniqKey="Junkins H">HA Junkins</name>
</author>
<author>
<name sortKey="Ramos, Em" uniqKey="Ramos E">EM Ramos</name>
</author>
<author>
<name sortKey="Mehta, Jp" uniqKey="Mehta J">JP Mehta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kume, T" uniqKey="Kume T">T Kume</name>
</author>
<author>
<name sortKey="Deng, K" uniqKey="Deng K">K Deng</name>
</author>
<author>
<name sortKey="Hogan, Bl" uniqKey="Hogan B">BL Hogan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kume, T" uniqKey="Kume T">T Kume</name>
</author>
<author>
<name sortKey="Jiang, H" uniqKey="Jiang H">H Jiang</name>
</author>
<author>
<name sortKey="Topczewska, Jm" uniqKey="Topczewska J">JM Topczewska</name>
</author>
<author>
<name sortKey="Hogan, Bl" uniqKey="Hogan B">BL Hogan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Rs" uniqKey="Smith R">RS Smith</name>
</author>
<author>
<name sortKey="Zabaleta, A" uniqKey="Zabaleta A">A Zabaleta</name>
</author>
<author>
<name sortKey="Kume, T" uniqKey="Kume T">T Kume</name>
</author>
<author>
<name sortKey="Savinova, Ov" uniqKey="Savinova O">OV Savinova</name>
</author>
<author>
<name sortKey="Kidson, Sh" uniqKey="Kidson S">SH Kidson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aldinger, Ka" uniqKey="Aldinger K">KA Aldinger</name>
</author>
<author>
<name sortKey="Lehmann, Oj" uniqKey="Lehmann O">OJ Lehmann</name>
</author>
<author>
<name sortKey="Hudgins, L" uniqKey="Hudgins L">L Hudgins</name>
</author>
<author>
<name sortKey="Chizhikov, Vv" uniqKey="Chizhikov V">VV Chizhikov</name>
</author>
<author>
<name sortKey="Bassuk, Ag" uniqKey="Bassuk A">AG Bassuk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seuntjens, E" uniqKey="Seuntjens E">E Seuntjens</name>
</author>
<author>
<name sortKey="Nityanandam, A" uniqKey="Nityanandam A">A Nityanandam</name>
</author>
<author>
<name sortKey="Miquelajauregui, A" uniqKey="Miquelajauregui A">A Miquelajauregui</name>
</author>
<author>
<name sortKey="Debruyn, J" uniqKey="Debruyn J">J Debruyn</name>
</author>
<author>
<name sortKey="Stryjewska, A" uniqKey="Stryjewska A">A Stryjewska</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miquelajauregui, A" uniqKey="Miquelajauregui A">A Miquelajauregui</name>
</author>
<author>
<name sortKey="Van De Putte, T" uniqKey="Van De Putte T">T Van de Putte</name>
</author>
<author>
<name sortKey="Polyakov, A" uniqKey="Polyakov A">A Polyakov</name>
</author>
<author>
<name sortKey="Nityanandam, A" uniqKey="Nityanandam A">A Nityanandam</name>
</author>
<author>
<name sortKey="Boppana, S" uniqKey="Boppana S">S Boppana</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weng, Q" uniqKey="Weng Q">Q Weng</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Xu, X" uniqKey="Xu X">X Xu</name>
</author>
<author>
<name sortKey="Yang, B" uniqKey="Yang B">B Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Renthal, Ne" uniqKey="Renthal N">NE Renthal</name>
</author>
<author>
<name sortKey="Chen, Cc" uniqKey="Chen C">CC Chen</name>
</author>
<author>
<name sortKey="Williams, Kc" uniqKey="Williams K">KC Williams</name>
</author>
<author>
<name sortKey="Gerard, Rd" uniqKey="Gerard R">RD Gerard</name>
</author>
<author>
<name sortKey="Prange Kiel, J" uniqKey="Prange Kiel J">J Prange-Kiel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, M" uniqKey="Wilson M">M Wilson</name>
</author>
<author>
<name sortKey="Mowat, D" uniqKey="Mowat D">D Mowat</name>
</author>
<author>
<name sortKey="Dastot Le Moal, F" uniqKey="Dastot Le Moal F">F Dastot-Le Moal</name>
</author>
<author>
<name sortKey="Cacheux, V" uniqKey="Cacheux V">V Cacheux</name>
</author>
<author>
<name sortKey="Kaariainen, H" uniqKey="Kaariainen H">H Kaariainen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="El Kasti, Mm" uniqKey="El Kasti M">MM El-Kasti</name>
</author>
<author>
<name sortKey="Wells, T" uniqKey="Wells T">T Wells</name>
</author>
<author>
<name sortKey="Carter, Da" uniqKey="Carter D">DA Carter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
<author>
<name sortKey="Salama, Sr" uniqKey="Salama S">SR Salama</name>
</author>
<author>
<name sortKey="King, B" uniqKey="King B">B King</name>
</author>
<author>
<name sortKey="Kern, Ad" uniqKey="Kern A">AD Kern</name>
</author>
<author>
<name sortKey="Dreszer, T" uniqKey="Dreszer T">T Dreszer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lindblad Toh, K" uniqKey="Lindblad Toh K">K Lindblad-Toh</name>
</author>
<author>
<name sortKey="Garber, M" uniqKey="Garber M">M Garber</name>
</author>
<author>
<name sortKey="Zuk, O" uniqKey="Zuk O">O Zuk</name>
</author>
<author>
<name sortKey="Lin, Mf" uniqKey="Lin M">MF Lin</name>
</author>
<author>
<name sortKey="Parker, Bj" uniqKey="Parker B">BJ Parker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
<author>
<name sortKey="Erwin, Gd" uniqKey="Erwin G">GD Erwin</name>
</author>
<author>
<name sortKey="Mckinsey, G" uniqKey="Mckinsey G">G McKinsey</name>
</author>
<author>
<name sortKey="Rubenstein, Jlr" uniqKey="Rubenstein J">JLR Rubenstein</name>
</author>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woznica, A" uniqKey="Woznica A">A Woznica</name>
</author>
<author>
<name sortKey="Haeussler, M" uniqKey="Haeussler M">M Haeussler</name>
</author>
<author>
<name sortKey="Starobinska, E" uniqKey="Starobinska E">E Starobinska</name>
</author>
<author>
<name sortKey="Jemmett, J" uniqKey="Jemmett J">J Jemmett</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koshiba Takeuchi, K" uniqKey="Koshiba Takeuchi K">K Koshiba-Takeuchi</name>
</author>
<author>
<name sortKey="Mori, Ad" uniqKey="Mori A">AD Mori</name>
</author>
<author>
<name sortKey="Kaynak, Bl" uniqKey="Kaynak B">BL Kaynak</name>
</author>
<author>
<name sortKey="Cebra Thomas, J" uniqKey="Cebra Thomas J">J Cebra-Thomas</name>
</author>
<author>
<name sortKey="Sukonnik, T" uniqKey="Sukonnik T">T Sukonnik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Casci, T" uniqKey="Casci T">T Casci</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="White, Ma" uniqKey="White M">MA White</name>
</author>
<author>
<name sortKey="Myers, Ca" uniqKey="Myers C">CA Myers</name>
</author>
<author>
<name sortKey="Corbo, Jc" uniqKey="Corbo J">JC Corbo</name>
</author>
<author>
<name sortKey="Cohen, Ba" uniqKey="Cohen B">BA Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
<author>
<name sortKey="Stamatoyannopoulos, Ja" uniqKey="Stamatoyannopoulos J">JA Stamatoyannopoulos</name>
</author>
<author>
<name sortKey="Dutta, A" uniqKey="Dutta A">A Dutta</name>
</author>
<author>
<name sortKey="Guigo, R" uniqKey="Guigo R">R Guigo</name>
</author>
<author>
<name sortKey="Gingeras, Tr" uniqKey="Gingeras T">TR Gingeras</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author>
<name sortKey="Hall, Im" uniqKey="Hall I">IM Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ben Hur, A" uniqKey="Ben Hur A">A Ben-Hur</name>
</author>
<author>
<name sortKey="Weston, J" uniqKey="Weston J">J Weston</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sonnenburg, S" uniqKey="Sonnenburg S">S Sonnenburg</name>
</author>
<author>
<name sortKey="Ratsch, G" uniqKey="Ratsch G">G Ratsch</name>
</author>
<author>
<name sortKey="Henschel, S" uniqKey="Henschel S">S Henschel</name>
</author>
<author>
<name sortKey="Widmer, C" uniqKey="Widmer C">C Widmer</name>
</author>
<author>
<name sortKey="Behr, J" uniqKey="Behr J">J Behr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dietterich, Tg" uniqKey="Dietterich T">TG Dietterich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Su, Ai" uniqKey="Su A">AI Su</name>
</author>
<author>
<name sortKey="Wiltshire, T" uniqKey="Wiltshire T">T Wiltshire</name>
</author>
<author>
<name sortKey="Batalov, S" uniqKey="Batalov S">S Batalov</name>
</author>
<author>
<name sortKey="Lapp, H" uniqKey="Lapp H">H Lapp</name>
</author>
<author>
<name sortKey="Ching, Ka" uniqKey="Ching K">KA Ching</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mclean, Cy" uniqKey="Mclean C">CY McLean</name>
</author>
<author>
<name sortKey="Bristor, D" uniqKey="Bristor D">D Bristor</name>
</author>
<author>
<name sortKey="Hiller, M" uniqKey="Hiller M">M Hiller</name>
</author>
<author>
<name sortKey="Clarke, Sl" uniqKey="Clarke S">SL Clarke</name>
</author>
<author>
<name sortKey="Schaar, Bt" uniqKey="Schaar B">BT Schaar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grant, Ce" uniqKey="Grant C">CE Grant</name>
</author>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Q" uniqKey="Li Q">Q Li</name>
</author>
<author>
<name sortKey="Ritter, D" uniqKey="Ritter D">D Ritter</name>
</author>
<author>
<name sortKey="Yang, N" uniqKey="Yang N">N Yang</name>
</author>
<author>
<name sortKey="Dong, Z" uniqKey="Dong Z">Z Dong</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oksenberg, N" uniqKey="Oksenberg N">N Oksenberg</name>
</author>
<author>
<name sortKey="Stevison, L" uniqKey="Stevison L">L Stevison</name>
</author>
<author>
<name sortKey="Wall, Jd" uniqKey="Wall J">JD Wall</name>
</author>
<author>
<name sortKey="Ahituv, N" uniqKey="Ahituv N">N Ahituv</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS Comput. Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLoS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24967590</article-id>
<article-id pub-id-type="pmc">4072507</article-id>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-13-01447</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1003677</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Comparative Genomics</subject>
<subject>Genome Analysis</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Developmental Biology</subject>
<subj-group>
<subject>Organism Development</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Functional Genomics</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Gene Expression</subject>
<subject>Molecular Genetics</subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Integrating Diverse Datasets Improves Developmental Enhancer Prediction</article-title>
<alt-title alt-title-type="running-head">Integrative Developmental Enhancer Prediction</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Erwin</surname>
<given-names>Genevieve D.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Oksenberg</surname>
<given-names>Nir</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Truty</surname>
<given-names>Rebecca M.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kostka</surname>
<given-names>Dennis</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Murphy</surname>
<given-names>Karl K.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ahituv</surname>
<given-names>Nadav</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pollard</surname>
<given-names>Katherine S.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Capra</surname>
<given-names>John A.</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Department of Developmental Biology and Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America</addr-line>
</aff>
<aff id="aff6">
<label>6</label>
<addr-line>Center for Human Genetics Research and Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Ohler</surname>
<given-names>Uwe</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Duke University, United States of America</addr-line>
</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>kpollard@gladstone.ucsf.edu</email>
(KSP);
<email>tony.capra@vanderbilt.edu</email>
(JAC)</corresp>
<fn fn-type="conflict">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con">
<p>Conceived and designed the experiments: GDE RMT DK KSP JAC. Performed the experiments: GDE DK JAC. Analyzed the data: GDE DK KSP JAC. Contributed reagents/materials/analysis tools: NO KKM NA. Wrote the paper: GDE KSP JAC.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<month>6</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>26</day>
<month>6</month>
<year>2014</year>
</pub-date>
<volume>10</volume>
<issue>6</issue>
<elocation-id>e1003677</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>6</day>
<month>5</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>© 2014 Erwin et al</copyright-statement>
<copyright-year>2014</copyright-year>
<copyright-holder>Erwin et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.</license-p>
</license>
</permissions>
<abstract>
<p>Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross-validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and near lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through
<italic>in vivo</italic>
validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author Summary</title>
<p>The human genome contains an immense amount of non-protein-coding DNA with unknown function. Some of this DNA regulates when, where, and at what levels genes are active during development. Enhancers, one type of regulatory element, are short stretches of DNA that can act as “switches” to turn a gene on or off at specific times in specific cells or tissues. Understanding where in the genome enhancers are located can provide insight into the genetic basis of development and disease. Enhancers are hard to identify, but clues about their locations are found in different types of data including DNA sequence, evolutionary history, and where proteins bind to DNA. Here, we introduce a new tool, called EnhancerFinder, which combines these data to predict the location and activity of enhancers during embryonic development. We trained EnhancerFinder on a large set of functionally validated human enhancers, and it proved to be very accurate. We used EnhancerFinder to predict tens of thousands of enhancers in the human genome and validated several of the predictions near three important developmental genes in mouse or zebrafish. EnhancerFinder's predictions will be useful in understanding functional regions hidden in the vast amounts of human non-coding DNA.</p>
</abstract>
<funding-group>
<funding-statement>This project was supported by NIH (
<ext-link ext-link-type="uri" xlink:href="http://www.nih.gov/">http://www.nih.gov/</ext-link>
) grants from NIGMS (GM082901, GM61390), NHGRI (HG005058, HG006768), NICHD (HD059862), NIDDK (DK090382), NINDS (NS079231), and NHLBI (HL098179), a PhRMA Foundation fellowship (
<ext-link ext-link-type="uri" xlink:href="http://www.phrmafoundation.org/">http://www.phrmafoundation.org/</ext-link>
), a University of California Achievement Awards for College Scientists (ARCS) Scholarship (
<ext-link ext-link-type="uri" xlink:href="https://www.arcsfoundation.org/">https://www.arcsfoundation.org/</ext-link>
), a gift from the San Simeon Fund (URL unavailable), and institutional funds from the J. David Gladstone Institutes (
<ext-link ext-link-type="uri" xlink:href="http://gladstoneinstitutes.org/">http://gladstoneinstitutes.org/</ext-link>
) as well as institutional funds from Vanderbilt University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<page-count count="20"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Eukaryotic gene expression is regulated by a highly orchestrated network of events, including the binding of regulatory proteins to DNA, chemical modifications to DNA and nucleosomes, recruitment of the transcriptional machinery, splicing, and post-transcriptional modifications. Enhancers are genomic regions that influence the timing, amplitude, and tissue specificity of gene expression through the binding of transcription factors and co-factors that increase transcription (as reviewed in
<xref rid="pcbi.1003677-Ong1" ref-type="bibr">[1]</xref>
,
<xref rid="pcbi.1003677-Bulger1" ref-type="bibr">[2]</xref>
). In humans, genetic variation in enhancer regions is implicated in a wide variety of developmental disorders, diseases, and adverse responses to treatments
<xref rid="pcbi.1003677-Visel1" ref-type="bibr">[3]</xref>
,
<xref rid="pcbi.1003677-Sakabe1" ref-type="bibr">[4]</xref>
,
<xref rid="pcbi.1003677-Ahituv1" ref-type="bibr">[5]</xref>
.</p>
<p>Enhancers have been discovered in introns, exons, intergenic regions megabases away from their target genes
<xref rid="pcbi.1003677-Noonan1" ref-type="bibr">[6]</xref>
, and even on different chromosomes
<xref rid="pcbi.1003677-Lomvardas1" ref-type="bibr">[7]</xref>
. An enhancer frequently drives only one of many domains of a gene's expression
<xref rid="pcbi.1003677-Visel2" ref-type="bibr">[8]</xref>
,
<xref rid="pcbi.1003677-Visel3" ref-type="bibr">[9]</xref>
and different cell types accordingly exhibit considerable differences in their active enhancers
<xref rid="pcbi.1003677-Koch1" ref-type="bibr">[10]</xref>
,
<xref rid="pcbi.1003677-Heintzman1" ref-type="bibr">[11]</xref>
. This modularity enables the creation of complex regulatory programs that can evolve relatively easily between closely related species
<xref rid="pcbi.1003677-Sholtis1" ref-type="bibr">[12]</xref>
,
<xref rid="pcbi.1003677-Levine1" ref-type="bibr">[13]</xref>
.</p>
<p>Individual enhancers were initially identified using transgenic assays in cultured cell lines
<xref rid="pcbi.1003677-Banerji1" ref-type="bibr">[14]</xref>
,
<xref rid="pcbi.1003677-Gillies1" ref-type="bibr">[15]</xref>
and later
<italic>in vivo</italic>
in model organisms, such as mouse,
<italic>Drosophila</italic>
, and zebrafish. In the
<italic>in vivo</italic>
experiments, a construct containing the sequence to be tested for enhancer activity, a minimal promoter, and a reporter gene (e.g., lacZ) is injected into fertilized eggs, and transgenic individuals are assayed for reporter gene expression.</p>
<p>Early efforts to find enhancers at the genome scale used comparative genomics. Several studies assayed non-coding regions conserved across diverse species for enhancer activity
<xref rid="pcbi.1003677-Nobrega1" ref-type="bibr">[16]</xref>
,
<xref rid="pcbi.1003677-Pennacchio1" ref-type="bibr">[17]</xref>
,
<xref rid="pcbi.1003677-Visel4" ref-type="bibr">[18]</xref>
, since functional non-coding regions likely evolve under negative selection. This approach identified many enhancers at a range of levels of evolutionary conservation
<xref rid="pcbi.1003677-Visel5" ref-type="bibr">[19]</xref>
,
<xref rid="pcbi.1003677-Woolfe1" ref-type="bibr">[20]</xref>
,
<xref rid="pcbi.1003677-Prabhakar1" ref-type="bibr">[21]</xref>
. However, relying on evolutionary conservation alone has several shortcomings: many characterized enhancers are not conserved between species
<xref rid="pcbi.1003677-McGaughey1" ref-type="bibr">[22]</xref>
, non-coding conservation is not specific to enhancer elements, and evolutionary patterns provide little information about the tissue and timing of enhancer activity.</p>
<p>Enhancer prediction has been revolutionized by recent technological advances, including chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq)
<xref rid="pcbi.1003677-Johnson1" ref-type="bibr">[23]</xref>
, RNA sequencing (RNA-seq), and sequencing of DNaseI-digested chromatin (DNase-seq)
<xref rid="pcbi.1003677-Boyle1" ref-type="bibr">[24]</xref>
or formaldehyde-assisted isolation of regulatory elements (FAIRE-seq)
<xref rid="pcbi.1003677-Giresi1" ref-type="bibr">[25]</xref>
. These “functional genomics” assays enable genome-wide measurement of histone modifications, binding sites of regulatory proteins, transcription levels, and the structural conformation of DNA. The ENCODE project
<xref rid="pcbi.1003677-Dunham1" ref-type="bibr">[26]</xref>
, FANTOM project
<xref rid="pcbi.1003677-Andersson1" ref-type="bibr">[27]</xref>
, and similar studies focused on specific cell types
<xref rid="pcbi.1003677-Wamstad1" ref-type="bibr">[28]</xref>
,
<xref rid="pcbi.1003677-Paige1" ref-type="bibr">[29]</xref>
have dramatically increased the amount of publicly available functional genomics data.</p>
<p>Functional genomics studies revealed several genomic signatures of active enhancers. For example, known enhancers are associated with the unstable histone variants H3.3 and H2A.Z
<xref rid="pcbi.1003677-Jin1" ref-type="bibr">[30]</xref>
,
<xref rid="pcbi.1003677-He1" ref-type="bibr">[31]</xref>
and low nucleosome occupancy
<xref rid="pcbi.1003677-Thurman1" ref-type="bibr">[32]</xref>
, although these chromatin states are not unique to enhancers. Monomethylation of lysine 4 on histone H3 (H3K4me1), a lack of trimethylation at the same site (H3K4me3), and acetylation of lysine 27 on histone H3 (H3K27ac) may distinguish active enhancers from promoters
<xref rid="pcbi.1003677-Koch1" ref-type="bibr">[10]</xref>
,
<xref rid="pcbi.1003677-Heintzman2" ref-type="bibr">[33]</xref>
,
<xref rid="pcbi.1003677-Cotney1" ref-type="bibr">[34]</xref>
, enhancers that are “poised” for activity later in development
<xref rid="pcbi.1003677-Creyghton1" ref-type="bibr">[35]</xref>
,
<xref rid="pcbi.1003677-RadaIglesias1" ref-type="bibr">[36]</xref>
, and regulatory elements that repress gene expression
<xref rid="pcbi.1003677-Mikkelsen1" ref-type="bibr">[37]</xref>
,
<xref rid="pcbi.1003677-Zhou1" ref-type="bibr">[38]</xref>
. Additional features that pinpoint specific classes of active enhancers include binding of the transcriptional cofactor p300/CBP
<xref rid="pcbi.1003677-Visel4" ref-type="bibr">[18]</xref>
,
<xref rid="pcbi.1003677-Blow1" ref-type="bibr">[39]</xref>
,
<xref rid="pcbi.1003677-Ghisletti1" ref-type="bibr">[40]</xref>
,
<xref rid="pcbi.1003677-May1" ref-type="bibr">[41]</xref>
, clusters of transcription factor (TF) binding sites
<xref rid="pcbi.1003677-Zinzen1" ref-type="bibr">[42]</xref>
,
<xref rid="pcbi.1003677-He2" ref-type="bibr">[43]</xref>
,
<xref rid="pcbi.1003677-Yip1" ref-type="bibr">[44]</xref>
,
<xref rid="pcbi.1003677-Cheng1" ref-type="bibr">[45]</xref>
, and transcription of enhancer RNAs (eRNAs)
<xref rid="pcbi.1003677-Orom1" ref-type="bibr">[46]</xref>
. Collectively, functional genomics data have pinpointed the locations of many novel enhancers and yielded insights into sequence and structural determinants of enhancer activity. However, these patterns have not proven to be universal
<xref rid="pcbi.1003677-Barski1" ref-type="bibr">[47]</xref>
,
<xref rid="pcbi.1003677-Wang1" ref-type="bibr">[48]</xref>
, and there is unlikely to be a single chromatin signature that identifies all classes of enhancers
<xref rid="pcbi.1003677-Heintzman1" ref-type="bibr">[11]</xref>
,
<xref rid="pcbi.1003677-Zentner1" ref-type="bibr">[49]</xref>
,
<xref rid="pcbi.1003677-Bonn1" ref-type="bibr">[50]</xref>
.</p>
<p>Given the complexity of these functional genomics data sets, computational methods have been developed to improve and generalize the enhancer predictions made from simple combinations of these data. Support vector machines (SVMs) and linear regression models trained to interpret DNA sequence motifs underlying known enhancers have successfully identified novel enhancers active in heart
<xref rid="pcbi.1003677-Narlikar1" ref-type="bibr">[51]</xref>
, hindbrain
<xref rid="pcbi.1003677-Burzynski1" ref-type="bibr">[52]</xref>
, and muscle
<xref rid="pcbi.1003677-Busser1" ref-type="bibr">[53]</xref>
development. Another approach used SVMs to learn patterns of short DNA sequence motifs that distinguish markers of potential enhancers, such as p300 and H3K4me1, in different cellular contexts
<xref rid="pcbi.1003677-Lee1" ref-type="bibr">[54]</xref>
,
<xref rid="pcbi.1003677-Gorkin1" ref-type="bibr">[55]</xref>
. Random forests have been used to predict p300 binding sites from histone modifications in human embryonic stem cells and lung fibroblasts
<xref rid="pcbi.1003677-Rajagopal1" ref-type="bibr">[56]</xref>
. Machine-learning algorithms have also been applied to the related problem of selecting functional TF binding sites out of the thousands of hits to a TF's binding motif throughout the genome
<xref rid="pcbi.1003677-Lahdesmaki1" ref-type="bibr">[57]</xref>
,
<xref rid="pcbi.1003677-Kantorovitz1" ref-type="bibr">[58]</xref>
,
<xref rid="pcbi.1003677-Won1" ref-type="bibr">[59]</xref>
,
<xref rid="pcbi.1003677-PiqueRegi1" ref-type="bibr">[60]</xref>
,
<xref rid="pcbi.1003677-Arvey1" ref-type="bibr">[61]</xref>
,
<xref rid="pcbi.1003677-CuellarPartida1" ref-type="bibr">[62]</xref>
,
<xref rid="pcbi.1003677-Wang2" ref-type="bibr">[63]</xref>
. Finally, two groups have taken a less supervised approach and used hidden Markov models (ChromHMM)
<xref rid="pcbi.1003677-Ernst1" ref-type="bibr">[64]</xref>
and dynamic Bayesian networks (Segway)
<xref rid="pcbi.1003677-Hoffman1" ref-type="bibr">[65]</xref>
to segment the human genome into regions with unique signatures in ENCODE data and then assigned potential functions, such as enhancer activity, to these states.</p>
<p>While rich datasets coupled with sophisticated algorithms have successfully identified many novel enhancers, comprehensive enhancer prediction is challenging for two main reasons. First, no single type of data is currently sufficient to identify all enhancers active in a given context. Many of the approaches described above use a single mark or motif as a proxy for an enhancer, but this gives an incomplete representation of all biologically active enhancers. Second, while a great deal of functional genomics data are available for different cell lines and tissues, it is not understood how well experiments in a given cellular context indicate enhancer activity in other contexts.</p>
<p>With these issues in mind, we introduce EnhancerFinder, a new two-step machine-learning method for predicting enhancers and their tissue specificity. In machine learning, a classification algorithm is trained to distinguish between labeled training examples (e.g., enhancers and non-enhancers) based on features of these labeled examples (e.g., evolutionary conservation, chromatin signature, DNA sequence). The trained classifier can then be used to predict the labels for uncharacterized genomic regions (e.g., which ones are enhancers). Our approach employs two rounds of a supervised machine-learning technique called multiple kernel learning (MKL)
<xref rid="pcbi.1003677-Sonnenburg1" ref-type="bibr">[66]</xref>
,
<xref rid="pcbi.1003677-Kloft1" ref-type="bibr">[67]</xref>
. MKL is based on the theory of SVMs
<xref rid="pcbi.1003677-Boser1" ref-type="bibr">[68]</xref>
, but provides greater flexibility to combine diverse data (e.g., evolutionary conservation, sequence motifs, and functional genomics data from different cellular contexts) and to interpret their relative contributions to the resulting predictions. Our implementation of EnhancerFinder applies MKL in two steps with the goal of generating a genome-wide set of developmental enhancers to better characterize gene regulation during development. The algorithm, which is trained using
<italic>in vivo</italic>
validated enhancers from the VISTA enhancer database
<xref rid="pcbi.1003677-Visel6" ref-type="bibr">[69]</xref>
and publicly available genomic data, first aims to distinguish human developmental enhancers from the genomic background and then in a second step predicts enhancer tissue specificity. In contrast to most other enhancer prediction strategies, which are trained on epigenetic marks or sequence motifs that serve as a proxy for a subset of all active enhancers, our use of a heterogeneous and
<italic>in vivo</italic>
validated set of enhancers enables us to investigate the complex suite of features that underlie active regulatory regions. With appropriate training data, EnhancerFinder could be applied to study gene regulation at other developmental stages.</p>
<p>Our analyses demonstrate that EnhancerFinder's integration of diverse types of data from different cellular contexts significantly improves prediction of validated enhancers over approaches based on a single context or type of data. We find that enhancers active in some developmental contexts are easier to identify than others. Applying EnhancerFinder to the entire human genome allowed us to predict more than 80,000 developmental enhancers, with tissue-specific predictions for brain, limb, and heart. These predictions significantly overlap known non-coding regulatory regions and are enriched near relevant genome-wide association study (GWAS) lead single nucleotide polymorphisms (SNPs) and genes expressed in the predicted tissue. To illustrate the utility and accuracy of our genome-wide enhancer predictions, we used them to investigate the enhancer landscape near three developmentally expressed genes. First, we screened predicted enhancers near
<italic>FOXC1</italic>
and
<italic>FOXC2</italic>
in transgenic zebrafish, and found that 70% (7 of 10) of tested EnhancerFinder predictions have confirmed (6) or suggestive (1) developmental enhancer activity. In addition, we validated a novel cranial nerve enhancer near the
<italic>ZEB2</italic>
locus using a transgenic mouse enhancer assay. Taken together, our results suggest that the EnhancerFinder approach of integrating diverse data sets significantly improves prediction of biologically active enhancers, providing high-confidence candidate enhancers for studies in developmental gene regulation.</p>
</sec>
<sec id="s2">
<title>Results</title>
<p>We present EnhancerFinder, a machine learning-based enhancer prediction pipeline that allows the seamless integration of feature data from a variety of experimental techniques and biological contexts that have previously been used individually to predict enhancers (
<xref ref-type="fig" rid="pcbi-1003677-g001">Figure 1</xref>
). We use MKL to integrate these data. MKL algorithms learn a weighted combination of different “kernel” functions that quantify the similarity of different feature data in order to make predictions. In EnhancerFinder, we use three kernels based on different types of biological feature data: DNA sequence motifs, evolutionary conservation patterns, and functional genomics datasets.</p>
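<p>As a minimal illustration of the weighted kernel combination described above (a sketch, not the implementation used in this study), the following Python code forms a weighted sum of per-data-type kernel matrices. The function name combine_kernels, the toy matrices, and the weights are hypothetical; in MKL the weights are learned jointly with the classifier rather than fixed by hand.</p>
<preformat preformat-type="code">
```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy 2x2 similarity matrices for two genomic regions (hypothetical values).
motif_kernel = [[1.0, 0.2], [0.2, 1.0]]
conservation_kernel = [[1.0, 0.8], [0.8, 1.0]]
functional_kernel = [[1.0, 0.5], [0.5, 1.0]]

# Hypothetical weights; an MKL solver would learn these from training data.
weights = [0.2, 0.5, 0.3]
combined = combine_kernels(
    [motif_kernel, conservation_kernel, functional_kernel], weights
)
```
</preformat>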
<fig id="pcbi-1003677-g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Overview of the EnhancerFinder enhancer prediction pipeline.</title>
<p>In our two-step approach, regions of the genome are characterized by diverse features, such as their evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence patterns. For each step, appropriate positive (green) and negative (purple) training examples are provided as input to a multiple kernel learning (MKL) algorithm that produces a trained classifier. We used 10-fold cross validation to evaluate the performance of all classifiers. In Step 1, we trained a classifier to distinguish between known developmental enhancers from VISTA and the genomic background. In Step 2, we trained several classifiers to distinguish enhancers active in tissues of interest from those without activity in the tissue according to VISTA. We applied the trained enhancer classifier from Step 1 to the entire human genome to produce more than 80,000 developmental enhancer predictions. We then applied the tissue-specific enhancer classifiers from Step 2 to further refine our predictions.</p>
</caption>
<graphic xlink:href="pcbi.1003677.g001"></graphic>
</fig>
<p>EnhancerFinder could be used to predict enhancers active at any stage and tissue. In this study, we evaluate EnhancerFinder's ability to predict developmental enhancers and their tissue specificity.</p>
<sec id="s2a">
<title>A two-step approach to tissue-specific enhancer prediction</title>
<p>Step 1 of our pipeline aims to distinguish all enhancers active in the context of interest (e.g., a specific developmental stage) from non-enhancer regions. Step 2 then builds classifiers to predict the tissues in which the enhancer candidates from Step 1 are active. This two-step approach allows us to accurately identify enhancers, while also distinguishing their tissues of activity.</p>
<p>We train and evaluate EnhancerFinder using the VISTA Enhancer Browser, which at the time of our analysis contained over 700 human sequences with experimentally validated enhancer activity in at least one tissue at embryonic day 11.5 (E11.5) in transgenic mouse embryos. VISTA also contained a similar number of regions without enhancer activity in this context. E11.5 in mouse development roughly corresponds to E41 (Carnegie stage 17
<xref rid="pcbi.1003677-ORahilly1" ref-type="bibr">[70]</xref>
) in human development. In Step 1 of EnhancerFinder, we used all 711 VISTA enhancers as positive training data, and for negative training data, we created a set of 711 random regions matched to the length and chromosome distribution of the positives to represent the genomic background. We did not use the VISTA negatives as negative training examples in Step 1, because they are not representative of all non-enhancer regions (see below). Our goal in Step 1 is to develop a method that can be used to scan the whole genome and distinguish developmental enhancer regions from non-enhancer regions.</p>
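<p>The construction of a matched genomic background can be sketched as follows. This is an illustrative assumption about how such sampling might be done, not the procedure used in this study: each positive region is paired with one random region of the same length on the same chromosome. The function name, toy coordinates, and toy chromosome sizes are hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def sample_matched_background(positives, chrom_sizes, seed=0):
    """Draw one random region per positive with the same chromosome and length.

    positives: list of (chrom, start, end) tuples, half-open coordinates.
    chrom_sizes: dict mapping chromosome name to its length in base pairs.
    """
    rng = random.Random(seed)
    background = []
    for chrom, start, end in positives:
        length = end - start
        # Choose a start so the region fits entirely on the chromosome.
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        background.append((chrom, new_start, new_start + length))
    return background


# Hypothetical toy inputs; real use would need true assembly sizes and
# would likely also exclude assembly gaps and the positives themselves.
toy_positives = [("chr1", 1000, 2500), ("chr2", 400, 1200)]
toy_sizes = {"chr1": 100000, "chr2": 50000}
negatives = sample_matched_background(toy_positives, toy_sizes)
```
</preformat>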
<p>The second step of EnhancerFinder aims to distinguish enhancers active in a given embryonic tissue from non-enhancers and enhancers active in other tissues. We consider all enhancers in VISTA with activity in a tissue of interest as positives and all other regions in VISTA (including regions not active at E11.5) as negatives (see
<xref ref-type="sec" rid="s4">Methods</xref>
). Including enhancers active in other tissues as negatives in this second step proves to be essential for obtaining high specificity in predicting tissue of activity (see below), and it is important to do this in two steps rather than attempting to distinguish enhancers of a given tissue from the genomic background in one step.</p>
<p>To evaluate EnhancerFinder, we compared it to several commonly used enhancer prediction approaches. Unless otherwise noted, we evaluated the performance of all prediction algorithms using 10-fold cross validation to compute the <underline>a</underline>rea <underline>u</underline>nder the <underline>c</underline>urve (AUC) for <underline>r</underline>eceiver <underline>o</underline>perating <underline>c</underline>haracteristic (ROC) curves. We also computed precision-recall curves (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s001">Figure S1</xref>
) and compared power at a low false positive rate.</p>
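<p>For readers unfamiliar with these metrics, the ROC AUC can be computed as the fraction of positive-negative pairs in which the positive example receives the higher score. The sketch below is a generic illustration with hypothetical function names and toy values, not the evaluation code used in this study; the fold assignment shown is one simple way to partition examples for 10-fold cross validation.</p>
<preformat preformat-type="code">
```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def k_fold_indices(n_examples, k=10):
    """Split example indices into k roughly equal test folds."""
    return [list(range(fold, n_examples, k)) for fold in range(k)]


# Toy example: 3 of the 4 positive-negative pairs are ranked correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```
</preformat>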
</sec>
<sec id="s2b">
<title>Building a general predictor from a biased training set</title>
<p>Because EnhancerFinder learns enhancer signatures from a training data set, we first explored biases in the VISTA enhancers that might affect how well EnhancerFinder could generalize to the whole genome. The genomic regions tested by VISTA were not selected randomly, and thus their positives do not represent a random sample of active enhancers. Nearly all regions tested by VISTA are evolutionarily conserved across mammals (706 of 711 positives and 727 of 736 negatives). Since our goal is to predict a broadly applicable, high-confidence set of developmental enhancers, we did not include this feature when making genome-wide predictions. However, with this bias in mind, we did evaluate several models that incorporate the degree of evolutionary conservation (see below).</p>
<p>In addition to conservation, several studies deposited in VISTA have considered enhancer-associated proteins and histone marks, such as p300, H3K27ac, and H3K4me1. We collected all data sets of these types from ENCODE and computed their overlap with VISTA enhancers. Fewer than half of the VISTA positives are marked by all three of p300, H3K27ac, and H3K4me1 (from any data set), with substantial percentages marked by only one or two and 13% (93/711) marked by none (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s002">Figure S2</xref>
). These findings indicate that VISTA positives are not highly biased towards a single type of ChIP-seq feature, motivating us to include these features in our genome-wide predictions, with the caveat that the trends we observe for VISTA positives might not generalize to all classes of enhancers. Our analysis also suggests that the standard practice of equating active enhancers with all regions marked by a single ChIP-seq feature, or even the union of overlapping peaks from several ChIP-seq experiments, will fail to identify all active enhancers in a given context.</p>
</sec>
<sec id="s2c">
<title>EnhancerFinder integrates diverse data types to accurately identify developmental enhancers</title>
<p>EnhancerFinder predicts enhancers by integrating classifiers based on distinct data types. In our first evaluation of EnhancerFinder, we consider three data types: functional genomics data, evolutionary conservation patterns, and DNA sequence motifs. Combining these different approaches enables EnhancerFinder to accurately distinguish enhancers from the genomic background (
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2A</xref>
; AUC = 0.96).</p>
<fig id="pcbi-1003677-g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Combining diverse data using EnhancerFinder improves the identification of developmental enhancers.</title>
<p>(A) Enhancer prediction strategies based on functional genomics data, evolutionary conservation, and DNA sequence motif patterns all perform well, but EnhancerFinder, which combines these data, provides significant improvement over each of them alone (p≤2.0E-7 for all). (B) Each of the approaches from (A) predicts a somewhat different set of the VISTA regions to be enhancers. This suggests that complementary information is contained in each data source. EnhancerFinder (not shown), which combines them, captures many of the enhancers that are unique to each source; it predicts 25 of the 44 enhancers unique to
<bold>Functional Genomics</bold>
, 30 of the 76 unique to
<bold>DNA Sequence Motifs</bold>
, and 34 of the 111 unique to
<bold>Evolutionary Conservation</bold>
. (C) EnhancerFinder outperforms CLARE, a successful enhancer prediction method based on known regulatory motifs. We also evaluated the enhancer states predicted by ChromHMM and Segway, two unsupervised clustering methods that have been used to segment the genome into different functional states based on patterns in functional genomics data, though these methods were not applied to developmental contexts. The different X's represent state predictions based on data from different ENCODE cell types: GM12878 (blue), H1-hESC (violet), HepG2 (brown), HMEC (tan), HSMM (gray), HUVEC (light green), K562 (green), NHEK (orange), NHLF (light blue), and all contexts combined (red).</p>
</caption>
<graphic xlink:href="pcbi.1003677.g002"></graphic>
</fig>
<p>The functional genomics component of EnhancerFinder (which we refer to as
<bold>All Functional Genomics</bold>
) is a linear SVM that incorporates 2,469 datasets generated by the ENCODE project and smaller-scale studies. These include DNaseI hypersensitivity data and ChIP-seq for p300, many histone modifications, and many TFs from many adult and embryonic tissues and cell lines (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s012">Table S1</xref>
). DNA sequence patterns are integrated via a 4-spectrum kernel (
<bold>DNA Motifs</bold>
), which summarizes the occurrences of all DNA sequences of length four (4-mers, i.e., <italic>k</italic>-mers with <italic>k</italic> = 4) in the input regions
<xref rid="pcbi.1003677-Leslie1" ref-type="bibr">[71]</xref>
. We found that little was gained by increasing
<italic>k</italic>
, considering multiple
<italic>k</italic>
simultaneously, or incorporating knowledge of transcription factor binding site (TFBS) motifs as in a previous approach
<xref rid="pcbi.1003677-Burzynski1" ref-type="bibr">[52]</xref>
(
<xref ref-type="supplementary-material" rid="pcbi.1003677.s003">Figures S3</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s004">S4</xref>
). Finally, evolutionary conservation information is incorporated with a linear SVM that uses mammalian phastCons scores
<xref rid="pcbi.1003677-Siepel1" ref-type="bibr">[72]</xref>
as features (
<bold>Evolutionary Conservation</bold>
).</p>
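<p>To make the spectrum kernel concrete, the sketch below counts all overlapping 4-mers in two sequences and takes the inner product of the resulting count vectors. It is a generic illustration with hypothetical function names and toy sequences; whether counts are normalized or reverse complements are merged is not specified here, so the sketch does neither.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def kmer_counts(sequence, k=4):
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=4):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


# Toy sequences: ACGT is the only shared 4-mer (twice in the first
# sequence, once in the second), so the kernel value is 2 * 1.
shared = spectrum_kernel("ACGTACGT", "ACGTTTTT")
```
</preformat>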
</sec>
<sec id="s2d">
<title>EnhancerFinder performs significantly better than enhancer prediction approaches based on a single type of data</title>
<p>One motivation for developing EnhancerFinder was to explore whether combining previous successful approaches to enhancer prediction would improve performance. Each of the classifiers combined in EnhancerFinder is representative of a different strategy for predicting enhancers. Thus, we compared the performance of EnhancerFinder to each of its constituents, which are SVMs trained on the same enhancer data as EnhancerFinder, but using only one type of the data features (e.g., only sequence motifs). EnhancerFinder significantly outperformed each of the individual classifiers (
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2A</xref>
; p = 2.0E-7 for
<bold>Evolutionary Conservation</bold>
, p = 2.6E-8 for
<bold>DNA Motifs</bold>
, and p = 4.4E-16 for
<bold>All Functional Genomics</bold>
, McNemar's test), suggesting that these different types of data capture unique aspects of enhancers that are not completely encompassed by any single data type.</p>
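<p>McNemar's test compares two classifiers on the same examples using only the cases where exactly one of them is correct. The sketch below uses the continuity-corrected chi-square form of the test, which is an assumption; the variant used for the p-values reported above is not specified. The function name and inputs are hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def mcnemar_test(truth, pred_a, pred_b):
    """McNemar chi-square test (continuity corrected) on paired predictions.

    Returns (statistic, p_value) computed from the examples on which
    exactly one of the two classifiers is correct.
    """
    triples = list(zip(truth, pred_a, pred_b))
    only_a = sum(1 for t, a, b in triples if a == t and b != t)
    only_b = sum(1 for t, a, b in triples if a != t and b == t)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```
</preformat>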
<p>Not surprisingly, we found that of the three component classifiers in EnhancerFinder,
<bold>Evolutionary Conservation</bold>
yields the best performance (AUC = 0.93). As noted above, nearly all regions tested for enhancer activity by VISTA (positives and negatives) are evolutionarily conserved compared to the genomic background. Nonetheless, considering additional features significantly improved predictions. The
<bold>DNA Motifs</bold>
(AUC = 0.88) and
<bold>All Functional Genomics</bold>
(AUC = 0.89) classifiers also exhibit strong performance, but do not perform as well as the combined classifier. EnhancerFinder has nearly twice the power of any of the individual classifiers at a 5% false positive rate (FPR), and its power advantage is even larger at lower FPRs.</p>
<p>
<bold>All Functional Genomics</bold>
,
<bold>DNA Motifs</bold>
, and
<bold>Evolutionary Conservation</bold>
achieve roughly similar performance from different feature data, but each individual classifier predicts a somewhat different set of enhancers during evaluation (
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2B</xref>
). Roughly two-thirds of the enhancer predictions are shared between the three classifiers. The improvement provided by combining these data argues that these data sources are indeed complementary.</p>
<p>We also compared EnhancerFinder's performance with several current computational methods used to identify enhancers. We were able to make the most direct comparison with CLARE, a popular method for identifying enhancers from DNA sequence data, i.e., transcription factor binding site motifs and other sequence patterns
<xref rid="pcbi.1003677-Taher1" ref-type="bibr">[73]</xref>
. This approach, which has been successfully applied in several contexts
<xref rid="pcbi.1003677-Narlikar1" ref-type="bibr">[51]</xref>
,
<xref rid="pcbi.1003677-Burzynski1" ref-type="bibr">[52]</xref>
,
<xref rid="pcbi.1003677-Busser1" ref-type="bibr">[53]</xref>
,
<xref rid="pcbi.1003677-Capra1" ref-type="bibr">[74]</xref>
, makes few assumptions about the input, and is publicly available as a web server. On our Step 1 enhancer prediction task, we find that CLARE achieves an ROC AUC of 0.79. This is much lower than
<bold>DNA Motifs</bold>
(AUC = 0.88), our approach based on sequence data alone, and the full
<bold>EnhancerFinder</bold>
(AUC = 0.96;
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2C</xref>
). At a 5% FPR, the power of CLARE is about 20%, compared to approximately 30% for
<bold>DNA Motifs</bold>
and more than 60% for
<bold>EnhancerFinder</bold>
.</p>
<p>Comparisons with additional methods were complicated by the fact that most were developed in different contexts. We designed EnhancerFinder specifically to predict biologically active developmental enhancers. Most existing approaches focus on data from a single cell line and define enhancers based on specific enhancer-associated marks or proteins (such as p300 in human embryonic stem cells) rather than biological activity. Thus, we did not anticipate that they would perform as well as EnhancerFinder at developmental enhancer prediction. However, since the predictions of these methods are commonly used outside the specific contexts in which they were made, we believe that it is useful to evaluate how well they can identify developmental enhancers and how much the EnhancerFinder approach applied to developmental enhancers improves on their performance.</p>
<p>In particular, we compared EnhancerFinder to ChromHMM and Segway
<xref rid="pcbi.1003677-Ernst1" ref-type="bibr">[64]</xref>
,
<xref rid="pcbi.1003677-Hoffman1" ref-type="bibr">[65]</xref>
, two unsupervised machine learning methods for segmenting the genome into a small number of functional “states” based on consistent patterns in ENCODE data for individual cell lines. The states resulting from the segmentations of each cell line's data are annotated by hand into predicted functional classes, which include enhancer activity. To evaluate these methods, we considered the states overlapping our training and testing regions. Any region with an overlapping enhancer state was considered a predicted enhancer and all others were predicted non-enhancers. In this way, we obtained a single point in ROC space for each set of state predictions. Since there is no score or confidence value associated with the state assignments, a full ROC curve could not be created for these methods.
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2C</xref>
gives the performance for several versions of ChromHMM and Segway based on ENCODE data from different cell lines. Both methods perform better than random, but considerably worse than EnhancerFinder and CLARE (p≈0). We stress that, in contrast to our supervised method, these methods were not explicitly trained to perform the same task as EnhancerFinder, and thus we did not expect them to perform as well as EnhancerFinder. Indeed, these results argue that their utility in identifying developmental enhancers is limited compared to specialized approaches.</p>
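<p>The reduction of state-based calls to a single ROC point can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the code used in this study: a region is called an enhancer if it overlaps any enhancer-state interval on the same chromosome, and the resulting binary calls yield one false positive rate and one true positive rate. Function names and toy values are hypothetical.</p>
<preformat preformat-type="code">
```python
def overlaps_any(region, state_intervals):
    """True if a half-open (start, end) region overlaps any state interval.

    All intervals are assumed to lie on the same chromosome as the region.
    """
    start, end = region
    return any(end > s and e > start for s, e in state_intervals)


def roc_point(labels, calls):
    """Return (false positive rate, true positive rate) for binary calls.

    Assumes at least one positive and one negative label.
    """
    tp = sum(1 for y, c in zip(labels, calls) if y == 1 and c == 1)
    fp = sum(1 for y, c in zip(labels, calls) if y == 0 and c == 1)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    return fp / n_neg, tp / n_pos


# Toy example: two true enhancers and two background regions.
toy_regions = [(100, 200), (300, 400), (500, 600), (700, 800)]
toy_labels = [1, 1, 0, 0]
toy_enhancer_states = [(150, 180), (590, 650)]
toy_calls = [int(overlaps_any(r, toy_enhancer_states)) for r in toy_regions]
fpr, tpr = roc_point(toy_labels, toy_calls)
```
</preformat>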
</sec>
<sec id="s2e">
<title>Integrating diverse functional genomics data improves enhancer prediction</title>
<p>As illustrated above, our machine learning prediction and evaluation framework enabled us to quantitatively explore the utility of different genomics datasets in enhancer prediction by creating classifiers based on different types of data (i.e., sequence motifs, evolutionary conservation, and functional genomics) and comparing their performance. We also used this framework to investigate other questions about the utility of different subsets of these data for enhancer prediction. For example, one might expect that some of the datasets included in
<bold>All Functional Genomics</bold>
(e.g., experiments in cancer cell lines or adult tissues) would not be as useful as others (e.g., experiments in embryonic tissues) for predicting developmental enhancers, and that limiting the features examined by the classifier to the most relevant experiments might improve performance.</p>
<p>To explore this hypothesis, we trained linear SVM classifiers to predict VISTA enhancers (as in Step 1 of EnhancerFinder) based on different subsets of all the functional genomics features (
<xref ref-type="table" rid="pcbi-1003677-t001">Table 1</xref>
) and compared their performance. First, we considered a collection of 244 datasets from embryonic tissues and cell lines (
<bold>Embryonic Functional Genomics</bold>
). Next, we created a classifier that considers data from a wider range of contexts by training a linear SVM using a large, manually curated set of 509 potentially relevant functional genomics data sets (
<bold>Relevant Functional Genomics</bold>
). This set includes embryonic datasets, along with additional DNaseI and ChIP-seq data from adult tissues and cell lines related to the dominant tissues of activity in VISTA. For example, we included data from human cardiac myocytes, since there are many developmental heart enhancers in our training examples. We compared these to the
<bold>All Functional Genomics</bold>
classifier described above that uses all 2496 functional genomics features.</p>
<table-wrap id="pcbi-1003677-t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.t001</object-id>
<label>Table 1</label>
<caption>
<title>Performance (ROC AUC) of classifiers on each tissue-specific enhancer prediction task (Step 2).</title>
</caption>
<alternatives>
<graphic id="pcbi-1003677-t001-1" xlink:href="pcbi.1003677.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Heart</td>
<td align="left" rowspan="1" colspan="1">Limb</td>
<td align="left" rowspan="1" colspan="1">Forebrain</td>
<td align="left" rowspan="1" colspan="1">Midbrain</td>
<td align="left" rowspan="1" colspan="1">Hindbrain</td>
<td align="left" rowspan="1" colspan="1">Neural Tube</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Evolutionary Conservation</td>
<td align="left" rowspan="1" colspan="1">0.78</td>
<td align="left" rowspan="1" colspan="1">0.58</td>
<td align="left" rowspan="1" colspan="1">0.52</td>
<td align="left" rowspan="1" colspan="1">0.54</td>
<td align="left" rowspan="1" colspan="1">0.53</td>
<td align="left" rowspan="1" colspan="1">0.52</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">DNA Motifs</td>
<td align="left" rowspan="1" colspan="1">0.83</td>
<td align="left" rowspan="1" colspan="1">0.64</td>
<td align="left" rowspan="1" colspan="1">0.66</td>
<td align="left" rowspan="1" colspan="1">0.63</td>
<td align="left" rowspan="1" colspan="1">0.62</td>
<td align="left" rowspan="1" colspan="1">0.60</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Functional Genomics</td>
<td align="left" rowspan="1" colspan="1">0.86</td>
<td align="left" rowspan="1" colspan="1">0.74</td>
<td align="left" rowspan="1" colspan="1">0.72</td>
<td align="left" rowspan="1" colspan="1">0.72</td>
<td align="left" rowspan="1" colspan="1">0.69</td>
<td align="left" rowspan="1" colspan="1">0.62</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">EnhancerFinder</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
<td align="left" rowspan="1" colspan="1">0.74</td>
<td align="left" rowspan="1" colspan="1">0.72</td>
<td align="left" rowspan="1" colspan="1">0.72</td>
<td align="left" rowspan="1" colspan="1">0.69</td>
<td align="left" rowspan="1" colspan="1">0.62</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>
<bold>All Functional Genomics</bold>
(AUC = 0.89) performed slightly, but not significantly, better than
<bold>Relevant Functional Genomics</bold>
(AUC = 0.87; p = 0.16), and both significantly outperformed
<bold>Embryonic Functional Genomics</bold>
(AUC = 0.83; p = 9.2E-9 and p = 2.7E-6, respectively) (
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3A</xref>
). At low FPRs, the differences in power between these classifiers were modest. The
<bold>Embryonic Functional Genomics</bold>
classifier included the most time-appropriate datasets, yet its performance was improved by including additional data sets that seem less relevant to our classification problem
<italic>a priori</italic>
. Thus, we conclude that it can be advantageous to consider a range of functional genomics features, especially when few features are available from the context of interest. The utility of these additional data sets might indicate that some enhancer features are stable across cell types and developmental stages, but it could also reflect information these data provide about genomic regions that are
<italic>not</italic>
active enhancers during development (see
<xref ref-type="sec" rid="s3">Discussion</xref>
).</p>
<fig id="pcbi-1003677-g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Integrating diverse functional genomics data improves enhancer prediction.</title>
<p>(A) Considering functional genomics features from contexts and assays not directly associated with developmental enhancer activity (
<bold>All Functional Genomics</bold>
and
<bold>Relevant Functional Genomics</bold>
) improves the identification of developmental enhancers (p = 9.2E-9 and p = 2.7E-6, respectively, compared to
<bold>Embryonic Functional Genomics</bold>
only). (B) Combining available H3K4me1, p300, and H3K27ac data, which are commonly used in isolation to identify enhancers, in a linear SVM (
<bold>Basic Functional Genomics</bold>
) is better able to distinguish known developmental enhancers from the genomic background than considering each type of data alone (p<2E-7, for each). However, combining these marks still performs significantly worse than EnhancerFinder (
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2A</xref>
; AUC = 0.96) and considering additional data as in (A).</p>
</caption>
<graphic xlink:href="pcbi.1003677.g003"></graphic>
</fig>
</sec>
<sec id="s2f">
<title>Histone marks and p300 provide complementary information about enhancer activity</title>
<p>We also explored the utility of individual functional genomics datasets that are often used as proxies for developmental enhancers by creating three linear SVM classifiers:
<bold>H3K27ac</bold>
,
<bold>H3K4me1</bold>
, and
<bold>p300</bold>
. These SVMs were trained to distinguish VISTA positives from the genomic background (Step 1) using all available data of the specified type from ENCODE, which include a range of cell types and tissues (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s012">Table S1</xref>
). All three classifiers performed better than random (
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3B</xref>
).
<bold>H3K4me1</bold>
(AUC = 0.72) and
<bold>p300</bold>
(AUC = 0.68) performed similarly (p = 0.25), with
<bold>p300</bold>
performing best at low FPRs and
<bold>H3K4me1</bold>
best at higher FPRs. Both significantly outperformed
<bold>H3K27ac</bold>
(AUC = 0.61; p = 9.4E-15 and p = 5.5E-9, respectively); however, we caution against extrapolating from this comparison, since it may reflect biases in the feature sets available and the VISTA positives. Since combinations of these features are often used to predict enhancers, we next trained a linear SVM classifier (
<bold>Basic Functional Genomics</bold>
) that includes all three data types together. The combined classifier significantly outperforms all the individual classifiers (AUC = 0.77; p<2E-7 for each), suggesting that each data type contributes unique information about enhancer activity. Also, all four SVM classifiers achieved much better performance than the common approach of simply considering regions overlapping with these data (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s005">Figure S5</xref>
).</p>
<p>EnhancerFinder also learns weights for individual features within classifiers that reflect their contribution to the enhancer predictions. We found that features known to be associated with enhancer activity in relevant cellular contexts generally receive positive weights, while those associated with other types of elements received negative weights (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s019">Text S1</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s006">Figure S6</xref>
).</p>
</sec>
<sec id="s2g">
<title>EnhancerFinder's two-step approach enables tissue-specific enhancer prediction</title>
<p>In the previous sections, we focused on generic developmental enhancer prediction (Step 1 of EnhancerFinder). Step 2 of EnhancerFinder applies a second round of MKL to refine and further annotate predicted enhancers from Step 1 (
<xref ref-type="fig" rid="pcbi-1003677-g001">Figure 1</xref>
). In this study, Step 2 consists of training an MKL classifier to distinguish VISTA enhancers active in a given tissue from VISTA regions without activity in that tissue, i.e., non-enhancers from VISTA plus enhancers for other tissues. We did not require that the positive training examples be active
<italic>only</italic>
in the tissue of interest. Using the same feature data as in Step 1, we created tissue-specific classifiers for all tissues with more than 50 examples in VISTA: forebrain, midbrain, hindbrain, heart, limb, and neural tube.</p>
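<p>The Step 2 training sets described above can be illustrated with a short sketch. It assumes, purely for illustration, that each tested VISTA region is represented by the set of tissues in which it showed activity; the region identifiers and the helper names are hypothetical.</p>
<preformat>
```python
from collections import Counter


def step2_labels(vista_regions, tissue):
    """Label every tested region for one tissue-specific classifier.

    A region is positive if it is active in `tissue` (possibly among other
    tissues) and negative otherwise, whether it is inactive everywhere or
    active only in other tissues.
    """
    return {rid: tissue in tissues for rid, tissues in vista_regions.items()}


def eligible_tissues(vista_regions, min_examples=50):
    """Tissues with more than `min_examples` active regions."""
    counts = Counter(t for tissues in vista_regions.values() for t in tissues)
    return sorted(t for t, n in counts.items() if n > min_examples)


# Hypothetical annotations: region ID mapped to its set of active tissues.
vista = {
    "r1": {"heart"},
    "r2": {"heart", "limb"},
    "r3": {"forebrain"},
    "r4": set(),  # tested, but no activity observed
}

heart = step2_labels(vista, "heart")
positives = sorted(r for r, y in heart.items() if y)
negatives = sorted(r for r, y in heart.items() if not y)
print(positives, negatives)
```
</preformat>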
<p>The performance of EnhancerFinder's tissue specificity predictions varied dramatically between tissues (
<xref ref-type="fig" rid="pcbi-1003677-g004">Figure 4</xref>
), with the best performance for heart (AUC = 0.85), followed by limb (AUC = 0.74), forebrain (AUC = 0.72), midbrain (AUC = 0.72), hindbrain (AUC = 0.69), and neural tube (AUC = 0.62), which was the worst of the tested tissue classifiers, but better than random. We combined all brain enhancers into one class, and the performance of this generic brain classifier was similar to that of the more specific brain classifiers (AUC = 0.73). The EnhancerFinder tissue-specific classifiers trained with all data types performed well for most tissues (
<xref ref-type="table" rid="pcbi-1003677-t001">Table 1</xref>
); however, classifiers based on functional genomics alone often performed as well as the full EnhancerFinder classifier, suggesting functional genomics data are more informative about developmental enhancer tissue specificity than degree of conservation or sequence motifs.</p>
<fig id="pcbi-1003677-g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g004</object-id>
<label>Figure 4</label>
<caption>
<title>Enhancers of heart expression are easier to identify than enhancers active in other tissues at E11.5.</title>
<p>(A) In Step 2 of our prediction pipeline, we trained EnhancerFinder using the same features as in Step 1 (
<xref ref-type="fig" rid="pcbi-1003677-g001">Figure 1</xref>
), but using VISTA enhancers active in a given tissue as positives and tested regions that did not show activity in the tissue as negatives. Heart enhancers were dramatically easier to distinguish from other enhancers than enhancers of expression in other tissues. The heart enhancers have significantly higher GC content than other enhancers and the genomic background. This and several other unique attributes may explain the ease of identifying them (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s007">Figures S7</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s008">S8</xref>
). In general, functional genomics data are the most informative data type for predicting enhancer tissue specificity (
<xref ref-type="table" rid="pcbi-1003677-t001">Table 1</xref>
).</p>
</caption>
<graphic xlink:href="pcbi.1003677.g004"></graphic>
</fig>
<p>Most previous efforts to predict tissue-specific enhancers have performed a single training step using enhancers or enhancer marks present in the tissue of interest as positives and non-enhancer regions or the genomic background as negatives. To test whether our two-step method improves upon these previous approaches, we trained one-step MKL tissue-specific classifiers and compared their predicted tissue distributions to those of validated enhancers from the VISTA database (
<xref ref-type="fig" rid="pcbi-1003677-g005">Figure 5A</xref>
). First, we trained a set of tissue-specific classifiers using enhancers active in each tissue as positives and the genomic background as negatives. These classifiers predict very similar sets of enhancers regardless of the target tissue, and they vastly overestimate both the number of enhancers that are active in multiple tissues (95% of predictions versus 8% of VISTA) and the number of true enhancers of each tissue (
<xref ref-type="fig" rid="pcbi-1003677-g005">Figure 5B</xref>
). In contrast, classifiers trained as in Step 2 of EnhancerFinder, i.e., using tissue-specific enhancers as positives and a mix of enhancers active in other tissues and regions with no activity in VISTA as negatives, show much greater tissue specificity in their predictions (76%) and a degree of overlap similar to that among known enhancers (
<xref ref-type="fig" rid="pcbi-1003677-g005">Figure 5C</xref>
).</p>
<fig id="pcbi-1003677-g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g005</object-id>
<label>Figure 5</label>
<caption>
<title>EnhancerFinder's two-step approach captures tissue-specific attributes of enhancers.</title>
<p>(A) The true overlap of human enhancers of brain, heart, and limb in the VISTA database. The vast majority of characterized enhancers are unique to one of these tissues at this stage. For example, of the 84 validated heart enhancers, 71 are unique to heart, five are shared with brain, four with limb, and four with both. (B) The predicted overlap of VISTA enhancers based on predictions made with a single training step using MKL with only enhancers of that tissue considered positives and the genomic background as negatives. This approach overestimates the number of enhancers active in multiple tissues. Each classifier mainly learns general attributes of enhancers, rather than tissue-specific attributes. (C) The predicted overlap based on EnhancerFinder's two-step approach. These predictions are much more tissue-specific and exhibit overlaps between tissues similar to the true values (A). Predicted tissue distributions are similar when the methods are applied to other genomic regions, as illustrated in our genome-wide predictions, but only predictions on VISTA enhancers are shown here to enable comparisons to the distribution for validated enhancers (A).</p>
</caption>
<graphic xlink:href="pcbi.1003677.g005"></graphic>
</fig>
</sec>
<sec id="s2h">
<title>Heart enhancers are easier to identify due to several unique attributes</title>
<p>The relative ease of identifying heart enhancers is likely due to several unique characteristics. Known heart enhancers at E11.5 are more evolutionarily conserved than genomic background, but significantly less conserved than enhancers in other tissues
<xref rid="pcbi.1003677-Blow1" ref-type="bibr">[39]</xref>
,
<xref rid="pcbi.1003677-May1" ref-type="bibr">[41]</xref>
. In addition, we observed that heart enhancers at this developmental stage are uniquely close to the nearest transcription start site (TSS) (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s007">Figure S7</xref>
). These two patterns are consistent with a recent study of mouse enhancers from different developmental stages
<xref rid="pcbi.1003677-Nord1" ref-type="bibr">[75]</xref>
. Finally, we observed that E11.5 heart enhancers have an unusually high GC content (49%) compared to enhancers of other tissues at E11.5 (∼40%). A simple classifier based solely on the GC content of a region performs nearly as well as our full classifier for heart enhancers (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s008">Figure S8</xref>
). In contrast, sequence-based classifiers do not perform well on the other tissues whose enhancer GC content is not significantly different from the genomic background (
<xref ref-type="table" rid="pcbi-1003677-t001">Table 1</xref>
). The high GC content of heart enhancers is not due to overlap with CpG islands. Only about 4% of VISTA enhancers overlap with a CpG island, and this number is consistent across tissues. We also did not find enrichment for any known GC-rich transcription factor binding site motifs in VISTA heart enhancers. We do see, however, that repeat regions in heart enhancers are depleted for the very AT-rich repeats seen in other enhancers, and that most of the repeat regions in heart enhancers are 40–60% GC. Our results suggest the possible existence of unknown GC-rich motifs that may be important for gene regulation in the cardiac lineage.</p>
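<p>A classifier based solely on GC content, as mentioned above, amounts to scoring each region by its GC fraction and ranking regions by that score. The sketch below is one simple way to do this and is not the implementation behind Figure S8; the sequences are hypothetical toy examples, and the AUC is computed as the probability that a positive outscores a negative.</p>
<preformat>
```python
def gc_content(seq):
    """Fraction of G and C among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(positive score > negative score); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy sequences standing in for heart and non-heart enhancers.
heart = ["GCGCGATC", "ATGCATGA"]
other = ["ATATGCAT", "AATTGGCA", "GCGCATAT"]

auc = roc_auc([gc_content(s) for s in heart], [gc_content(s) for s in other])
print(auc)
```
</preformat>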
<p>The heart classifier based on functional genomics data alone exhibits strong performance compared to other tissue-specific classifiers as well (
<xref ref-type="table" rid="pcbi-1003677-t001">Table 1</xref>
). It is possible that this is due to the presence of feature data from contexts more relevant to developmental heart activity than to other tissues, rather than unique attributes of the heart enhancers themselves. Indeed, the highest weighted features in the heart functional genomics classifier come from heart tissues. However, the performance of the heart classifier based only on functional genomics data does not decrease substantially when we exclude data from the most relevant contexts: embryonic heart tissue, adult hearts, and stages of a directed differentiation of stem cells into cardiomyocytes (ROC AUC = 0.85). Thus, it is possible that feature data from less obviously relevant contexts are more informative about heart activity than for other tissues. We suspect that the ease of distinguishing heart enhancers may be due to the earlier development of the heart compared to other tissues (see
<xref ref-type="sec" rid="s3">Discussion</xref>
).</p>
</sec>
<sec id="s2i">
<title>We predict more than 80,000 developmental enhancers across the human genome</title>
<p>One of the main motivations for developing algorithms that can distinguish active enhancers is to apply them to unannotated genomic regions to aid the exploration and interpretation of the gene regulatory landscape of the human genome (
<xref ref-type="fig" rid="pcbi-1003677-g001">Figure 1</xref>
). To produce a genome-wide set of candidate developmental enhancers, we divided the genome into 1.5 kb blocks overlapping one another by 500 bp and applied Step 1 of EnhancerFinder to each of these regions. EnhancerFinder produces a score for each region; positive scores indicate membership in the positive set (enhancers), and negative scores indicate membership in the negative set (non-enhancers). To focus on high confidence predictions in this genome-wide analysis, we used the cross-validation-based evaluation described above to find a 5% FPR score threshold, and only considered regions exceeding this threshold. After merging overlapping positive predictions, we identified 84,301 developmental enhancers across the human genome with median length of 1,500 bp and total genome coverage of 183,695,500 bp (5.86%).</p>
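<p>The tiling and merging steps described above can be sketched as follows. This is an illustrative simplification rather than the pipeline used here: it assumes half-open coordinates, drops any trailing partial window, and uses hypothetical scores and a hypothetical score threshold.</p>
<preformat>
```python
def tile(chrom_length, width=1500, overlap=500):
    """Yield half-open windows of `width` bp; neighbours share `overlap` bp."""
    step = width - overlap
    start = 0
    while chrom_length >= start + width:
        yield (start, start + width)
        start += step


def merge_overlapping(intervals):
    """Merge intervals that overlap by at least 1 bp (input may be unsorted)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] > start:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


windows = list(tile(5000))

# Hypothetical classifier scores, one per window, and a hypothetical cutoff
# standing in for the 5% FPR threshold.
scores = [0.1, 0.9, 0.8, -0.3]
threshold = 0.5

hits = [w for w, s in zip(windows, scores) if s > threshold]
predictions = merge_overlapping(hits)
print(windows)
print(predictions)
```
</preformat>
<p>In this toy example, two adjacent windows exceed the threshold and are merged into one 2.5 kb prediction; an isolated window above the threshold would remain 1.5 kb long.</p>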
<p>The 5% FPR threshold we used corresponds to a 65% true positive rate (TPR). To calculate the false discovery rate (FDR), we must estimate the unknown fraction of 1.5 kb blocks of the human genome that harbor developmental enhancer regions. If this fraction were as high as 50%, a 5% FPR would correspond to a 9% FDR. If instead we estimate that 10% of 1.5 kb windows contain a developmental enhancer, we see an FDR of 47% at a 5% FPR. While this may seem high, our recent analysis of predicted enhancers with human-specific substitution rate acceleration found a lower failure rate at E11.5 (17%, 5/29)
<xref rid="pcbi.1003677-Capra1" ref-type="bibr">[74]</xref>
, and only three of ten tested predictions did not validate with confirmed or suggestive activity in our zebrafish assay (see below). This suggests that the FDR may be lower in experimental applications, especially when predicted enhancer regions are analyzed in the context of other relevant data. However, to accurately measure the true FDR would require experimental testing of a very large, random set of EnhancerFinder predictions, which is beyond the scope of this study.</p>
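<p>The dependence of the FDR on the assumed fraction of enhancer-containing windows can be made explicit with the standard relation FDR = FP / (FP + TP), where FP = FPR × (1 − π), TP = TPR × π, and π is the assumed fraction of positive windows. The sketch below applies this textbook definition; it is not necessarily the exact calculation performed for this study. Under this definition, the 9% and 47% values quoted above are reproduced with a TPR of about 50%, whereas a 65% TPR gives roughly 7% and 41%, so the original calculation may have used a different TPR or definition.</p>
<preformat>
```python
def fdr(fpr, tpr, prior):
    """Expected false discovery rate.

    fpr   -- false positive rate of the classifier at the chosen threshold
    tpr   -- true positive rate at the same threshold
    prior -- assumed fraction of windows that truly contain an enhancer
    """
    false_pos = fpr * (1.0 - prior)
    true_pos = tpr * prior
    return false_pos / (false_pos + true_pos)


# Compare the two assumed priors from the text at two candidate TPR values.
for prior in (0.5, 0.1):
    print(prior, round(fdr(0.05, 0.65, prior), 3), round(fdr(0.05, 0.50, prior), 3))
```
</preformat>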
<p>In our genome-wide analysis, we used the smaller
<bold>Relevant Functional Genomics</bold>
data set in order to reduce the computational time required. We also did not include evolutionary conservation data, because the positives in our training data are almost universally conserved. While most enhancers likely exhibit some evolutionary conservation, this extremely high fraction is likely due to bias in the selection of the tested regions in VISTA and could reduce our ability to detect less highly conserved novel enhancers genome-wide (see
<xref ref-type="sec" rid="s3">Discussion</xref>
). The resulting conservation-free classifier still performed extremely well in cross validation (AUC = 0.92). Supporting this approach, non-conserved regions make up over 20% of our genome-wide enhancer predictions. As noted above, we did not observe any other dramatic biases in the feature data associated with human VISTA enhancers.</p>
<p>Next, we applied Step 2 of EnhancerFinder to all enhancer regions predicted in Step 1. We focused on brain, limb, and heart, because these tissues are highly represented in VISTA and have been extensively studied in previous analyses of developmental enhancers. We predicted 7,400 limb enhancers, 19,051 heart enhancers, and 11,693 brain enhancers (
<xref ref-type="fig" rid="pcbi-1003677-g006">Figure 6</xref>
) at a 5% FPR threshold tuned separately for each tissue. Since EnhancerFinder makes predictions for each tissue independently, there are no constraints on the distribution of tissues in the resulting genome-wide predictions. Nonetheless, we find a high level of tissue-specificity; nearly 90% of the limb, heart, and brain enhancers are predicted to be active in just one of the three tissues.</p>
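<p>Tuning a 5% FPR threshold separately for each tissue can be illustrated as follows. This is a sketch under simple assumptions, namely that held-out classifier scores are available for known negatives and positives of each tissue; the scores and the function names are hypothetical.</p>
<preformat>
```python
import math


def threshold_at_fpr(negative_scores, fpr=0.05):
    """Cutoff such that at most a fraction `fpr` of negatives score above it."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(fpr * len(ranked))  # negatives permitted above cutoff
    return ranked[allowed]


def true_positive_rate(positive_scores, cutoff):
    """Fraction of positives scoring strictly above the cutoff."""
    return sum(1 for s in positive_scores if s > cutoff) / len(positive_scores)


# Hypothetical held-out scores per tissue (integers keep the example exact).
held_out = {
    "heart": {"neg": list(range(20)), "pos": [25, 19, 10, 30]},
    "limb": {"neg": list(range(0, 40, 2)), "pos": [37, 12, 50, 8]},
}

# Each tissue gets its own cutoff because score distributions differ.
cutoffs = {t: threshold_at_fpr(d["neg"]) for t, d in held_out.items()}
tprs = {t: true_positive_rate(d["pos"], cutoffs[t]) for t, d in held_out.items()}
print(cutoffs, tprs)
```
</preformat>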
<fig id="pcbi-1003677-g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g006</object-id>
<label>Figure 6</label>
<caption>
<title>Predicted tissue-specific enhancers exhibit tissue-specific characteristics.</title>
<p>EnhancerFinder identifies thousands of novel high-confidence (FPR<0.05) heart, brain, and limb enhancers. These enhancers are enriched for tissue-specific GO Biological Processes. The five most enriched GO Biological Processes among genes near each enhancer set (as calculated using GREAT) are listed in the colored boxes. Nearly 90% of EnhancerFinder predicted heart, brain, and limb enhancers are unique to a single tissue. The larger number of high-confidence heart enhancers relative to brain and limb enhancers is the result of the superior performance of the heart classifier.</p>
</caption>
<graphic xlink:href="pcbi.1003677.g006"></graphic>
</fig>
<p>All genome-wide enhancer predictions are available as tracks for import into the UCSC Genome Browser (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s018">Data File S1</xref>
). These lists of high-confidence tissue-specific enhancers should not be viewed as exhaustive; we found thousands of regions with positive, but less significant scores from Step 2 of EnhancerFinder.</p>
</sec>
<sec id="s2j">
<title>Predicted enhancers are associated with relevant functional genomic regions</title>
<p>To characterize and further validate our genome-wide enhancer predictions, we examined their genomic distribution with respect to several independent indicators of function (details in
<xref ref-type="supplementary-material" rid="pcbi.1003677.s019">Text S1</xref>
). Genes near brain and heart enhancers are enriched for expression in relevant tissues (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s013">Tables S2</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s014">S3</xref>
). Similarly, Gene Ontology (GO) Biological Process enrichment analyses of nearby genes suggest that our predicted developmental enhancers target genes that function in relevant cell types and tissues (
<xref ref-type="fig" rid="pcbi-1003677-g006">Figure 6</xref>
). The most prevalent transcription factor binding site motifs found in the sequences of predicted enhancers differed between enhancers of different tissues and included many relevant developmental TFs (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s015">Table S4</xref>
). Finally, our predicted enhancers contain 676 lead SNPs associated with significant effects in GWAS (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s016">Table S5</xref>
); this is significantly more than expected at random (permutation p<0.001).</p>
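<p>A permutation test of this kind compares the observed number of SNPs inside predicted enhancers with the counts obtained after randomly relocating the enhancers. The sketch below shows one generic version on a single toy chromosome. The null model used in this study is described in Text S1 and may differ; the uniform random placement, coordinates, and SNP positions here are assumptions for illustration only.</p>
<preformat>
```python
import random


def count_snps_in(intervals, snps):
    """Number of SNP positions inside at least one half-open interval."""
    return sum(1 for p in snps if any(p >= s and e > p for s, e in intervals))


def permutation_p(intervals, snps, genome_length, n_perm=999, seed=0):
    """Return (observed count, empirical one-sided p-value).

    Each permutation places every interval at a uniformly random position,
    preserving its length. The +1 terms keep the p-value above zero.
    """
    rng = random.Random(seed)
    observed = count_snps_in(intervals, snps)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in intervals:
            length = e - s
            start = rng.randrange(genome_length - length + 1)
            shuffled.append((start, start + length))
        if count_snps_in(shuffled, snps) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical toy data: three predicted enhancers and seven lead SNPs.
enhancers = [(1000, 2500), (40000, 41500), (90000, 91500)]
lead_snps = [1200, 1300, 40100, 40200, 90500, 55000, 70000]

observed, p_value = permutation_p(enhancers, lead_snps, genome_length=100000)
print(observed, p_value)
```
</preformat>
<p>With 999 permutations, the smallest attainable p-value is 0.001, which is why empirical results of this type are usually reported as an upper bound.</p>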
<p>Taken together, these analyses suggest that EnhancerFinder identifies many active regulatory regions that contain functionally relevant variation. Our tissue-specific enhancer predictions give valuable annotations to thousands of non-coding regions of the human genome that had not previously been linked to developmental regulation. For example, thousands of SNPs associated with disease by GWAS are in non-coding regions with limited functional annotations
<xref rid="pcbi.1003677-Hindorff1" ref-type="bibr">[76]</xref>
. Our genome-wide enhancer predictions provide a resource for exploring the mechanisms and functional effects of these uncharacterized GWAS hits.</p>
</sec>
<sec id="s2k">
<title>EnhancerFinder predictions function as enhancers in the developing embryo</title>
<p>To demonstrate that genome-wide EnhancerFinder predictions can facilitate the discovery of functional regulatory elements, we present two case studies in which we identify and validate novel enhancers near genes active during development.</p>
<sec id="s2k1">
<title>EnhancerFinder identifies many novel enhancers near
<italic>FOXC1</italic>
and
<italic>FOXC2</italic>
</title>
<p>To evaluate several EnhancerFinder predictions, we took advantage of a transgenic enhancer assay in embryonic zebrafish (
<xref ref-type="sec" rid="s4">Methods</xref>
). We tested enhancer activity of ten predicted human enhancers near
<italic>FOXC1</italic>
and
<italic>FOXC2</italic>
, two forkhead box TFs. The mouse homologs
<italic>Foxc1</italic>
and
<italic>Foxc2</italic>
have been studied extensively and have been shown to be required for proper embryonic development;
<italic>Foxc1</italic>
null and
<italic>Foxc2</italic>
null mutants are pre- or perinatal lethal
<xref rid="pcbi.1003677-Kume1" ref-type="bibr">[77]</xref>
,
<xref rid="pcbi.1003677-Kume2" ref-type="bibr">[78]</xref>
,
<xref rid="pcbi.1003677-Maiese1" ref-type="bibr">[79]</xref>
. In humans, complete lack of
<italic>FOXC1</italic>
is also typically pre- or perinatal lethal, and deletions near and point mutations in
<italic>FOXC1</italic>
contribute to eye and brain development disorders
<xref rid="pcbi.1003677-Smith1" ref-type="bibr">[80]</xref>
,
<xref rid="pcbi.1003677-Aldinger1" ref-type="bibr">[81]</xref>
.
<xref ref-type="fig" rid="pcbi-1003677-g007">Figure 7</xref>
shows the genomic context of
<italic>FOXC2</italic>
, along with the candidate enhancers that we tested (
<italic>FOXC2</italic>
Enhancer Candidates, or F2ECs).
<italic>FOXC1</italic>
results are shown in Supplementary
<xref ref-type="supplementary-material" rid="pcbi.1003677.s010">Figure S10</xref>
(
<italic>FOXC1</italic>
Enhancer Candidates, or F1ECs). Six of the ten predicted human enhancer sequences showed consistent enhancer activity in zebrafish at 24 or 48 hours post fertilization (hpf) (F1EC-1, F1EC-6, F2EC-1, F2EC-2, F2EC-3, and F2EC-4). One additional candidate enhancer (F1EC-3) showed suggestive enhancer activity. EnhancerFinder predicted tissue specificity for eight of the ten candidate enhancers, and we saw the predicted expression pattern confirmed for just one candidate enhancer (F2EC-3, predicted heart enhancer), and suggestive expression for another (F1EC-6, predicted heart enhancer). However, it is difficult to interpret this result, since the tested stages (24 and 48 hpf) do not directly correspond to single stages of mammalian development, and some of the studied tissues are not homologous. Also, since we tested predicted human enhancer sequences in zebrafish, it is possible that differences in developmental regulation between human and fish contributed to this result.</p>
<fig id="pcbi-1003677-g007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g007</object-id>
<label>Figure 7</label>
<caption>
<title>Four novel developmental enhancers near
<italic>FOXC2</italic>
.</title>
<p>This UCSC Genome Browser (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu">http://genome.ucsc.edu</ext-link>
) snapshot shows the genomic context of four candidate human enhancers tested in transgenic zebrafish. For each enhancer, we show a zebrafish image that is representative of the reproducible expression patterns.
<italic>FOXC2</italic>
Enhancer Candidate 1 (F2EC-1) drives expression at 48 hpf in the eye and epidermis (arrows). F2EC-2 shows expression at 24 hpf in the forebrain, midbrain, and nerve. F2EC-3 drives expression at 48 hpf in the epidermis and heart. F2EC-4 shows expression at 48 hpf in the notochord, spinal cord, and heart. See
<xref ref-type="supplementary-material" rid="pcbi.1003677.s017">Table S6</xref>
for the full list of tissues in which each candidate enhancer drove expression and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s010">Figure S10</xref>
for results on candidate enhancers near
<italic>FOXC1</italic>
.</p>
</caption>
<graphic xlink:href="pcbi.1003677.g007"></graphic>
</fig>
</sec>
<sec id="s2k2">
<title>EnhancerFinder predictions highlight a novel enhancer near
<italic>ZEB2</italic>
</title>
<p>Next, we sought to investigate a novel enhancer prediction in a mammalian system. We selected the locus containing
<italic>ZEB2</italic>
, a zinc finger E-box-binding homeobox-2 TF, which has many roles throughout embryonic and postnatal development, in particular in cortical neurogenesis
<xref rid="pcbi.1003677-Seuntjens1" ref-type="bibr">[82]</xref>
,
<xref rid="pcbi.1003677-Miquelajauregui1" ref-type="bibr">[83]</xref>
,
<xref rid="pcbi.1003677-Weng1" ref-type="bibr">[84]</xref>
,
<xref rid="pcbi.1003677-Renthal1" ref-type="bibr">[85]</xref>
. Mutations in
<italic>ZEB2</italic>
are associated with Mowat-Wilson syndrome, a complex developmental disorder
<xref rid="pcbi.1003677-Wilson1" ref-type="bibr">[86]</xref>
. However, relatively little is known about the genetic mechanisms that orchestrate
<italic>ZEB2</italic>
's expression. A long-range enhancer of postnatal expression in developing kidney cells (E1 in
<xref ref-type="fig" rid="pcbi-1003677-g008">Figure 8</xref>
) was recently discovered 1.2 megabases (Mb) downstream of
<italic>ZEB2</italic>
in the adjacent gene desert
<xref rid="pcbi.1003677-ElKasti1" ref-type="bibr">[87]</xref>
. Since this enhancer does not fully recapitulate the expression timing and domains of
<italic>ZEB2</italic>
, the authors speculated that the gene has many other, potentially long-range, enhancers. Supporting this theory, there are two validated E11.5 brain enhancers near
<italic>ZEB2</italic>
in the VISTA Enhancer Browser (
<xref ref-type="fig" rid="pcbi-1003677-g008">Figure 8</xref>
, VISTA hs407 and VISTA hs1802). Finally, there is an enrichment of human accelerated regions (HARs)
<xref rid="pcbi.1003677-Pollard1" ref-type="bibr">[88]</xref>
,
<xref rid="pcbi.1003677-LindbladToh1" ref-type="bibr">[89]</xref>
near
<italic>ZEB2</italic>
, suggesting that it may have human-specific regulatory patterns.</p>
<fig id="pcbi-1003677-g008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1003677.g008</object-id>
<label>Figure 8</label>
<caption>
<title>A novel cranial nerve enhancer in the
<italic>ZEB2</italic>
locus.</title>
<p>This UCSC Genome Browser snapshot shows a dense region of predicted enhancers in a 1.5 Mb region containing
<italic>ZEB2</italic>
and part of the adjacent gene desert. Tracks give the locations of four human accelerated regions (HARs), two validated VISTA enhancers (hs407 and hs1802), and the E1 region recently shown to have postnatal enhancer activity
<xref rid="pcbi.1003677-ElKasti1" ref-type="bibr">[87]</xref>
. The inset shows a zoomed in view of
<italic>ZEB2</italic>
(hg19.chr2:145,100,000–145,425,000) along with summaries of several ENCODE functional genomics datasets and evolutionary conservation across placental mammals. We tested the predicted enhancer overlapping 2xHAR.240 for enhancer activity at E11.5 in transgenic mice. Both the human and chimp versions of this sequence drive consistent expression in the cranial nerve (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s011">Figure S11</xref>
).</p>
</caption>
<graphic xlink:href="pcbi.1003677.g008"></graphic>
</fig>
<p>Our EnhancerFinder predictions support the existence of a rich regulatory program specified in the non-coding sequence near
<italic>ZEB2</italic>
; there are 54 predicted enhancers for which it is the nearest TSS. This puts
<italic>ZEB2</italic>
in the top 0.2% of all genes with respect to the number of adjacent enhancer predictions. Supporting the validity of our predictions, the known VISTA enhancers both overlap EnhancerFinder predicted enhancers, while the regions known to be inactive or active at later postnatal developmental stages (E1)
<xref rid="pcbi.1003677-ElKasti1" ref-type="bibr">[87]</xref>
do not.</p>
<p>We selected an EnhancerFinder predicted enhancer (indicated in the zoomed pane of
<xref ref-type="fig" rid="pcbi-1003677-g008">Figure 8</xref>
) for further experimental analysis due to its high EnhancerFinder score and overlap with a HAR (2xHAR.240). We interrogated the potential of the human and chimp sequences at this region to drive gene expression at E11.5 in transient transgenic mouse embryos. All seven embryos with staining showed cranial nerve expression (
<xref ref-type="fig" rid="pcbi-1003677-g008">Figure 8</xref>
red box;
<xref ref-type="supplementary-material" rid="pcbi.1003677.s011">Figure S11</xref>
), regardless of whether the construct contained the human or chimp sequence. Thus, we have identified a novel enhancer within the
<italic>ZEB2</italic>
locus that overlaps one of its expression domains; however, whether this enhancer targets
<italic>ZEB2</italic>
remains to be proven.</p>
<p>This is not the only HAR enhancer validated to date. In a recent publication, we showed that many HARs function as developmental enhancers
<xref rid="pcbi.1003677-Capra2" ref-type="bibr">[90]</xref>
. In that study, we experimentally tested 29 HARs that EnhancerFinder predicted to function as developmental enhancers, and found, in agreement with the cross-validation and zebrafish experimental validation rates here, that 24 of the regions (83%) showed positive enhancer activity at E11.5. In addition, one region that EnhancerFinder predicted not to be an enhancer showed no enhancer activity.</p>
<p>While none of the enhancer predictions tested so far were randomly selected, our results suggest that EnhancerFinder is a powerful tool for accurately characterizing developmental regulatory potential in many useful contexts. Our enhancer predictions highlight many additional candidates for further investigation, and we believe that they will enable similar analyses of the regulatory potential of many other genes and regions of interest.</p>
</sec>
</sec>
</sec>
<sec id="s3">
<title>Discussion</title>
<p>In this study, we developed EnhancerFinder, a new machine-learning framework for predicting regulatory enhancers from diverse data sources. In contrast to most previous enhancer identification strategies, which have based their predictions on one or a small number of data types, EnhancerFinder enables us to flexibly integrate the large and continually expanding collection of evolutionary, DNA sequence, and functional genomics data that are informative about enhancer function. Our analysis of the EnhancerFinder algorithm and its predictions makes three major contributions. First, we demonstrate that integrating diverse types of data from many cellular contexts, including some unexpected ones, can accurately predict
<italic>in vivo</italic>
validated developmental enhancers. Second, we show that a two-step approach in which enhancer tissue-specificity is individually evaluated after general enhancer prediction improves the identification of enhancers' tissues of activity. Finally, our genome-wide developmental enhancer annotations, including tissue-specific predictions for heart, brain, and limb, assign novel functions in development to thousands of genomic regions. We show that these predictions are enriched for a number of independent indicators of regulatory functions. As a result, we expect our predictions to prove useful in the annotation of non-coding genomic regions, as illustrated in the identification of novel enhancers near
<italic>ZEB2</italic>
,
<italic>FOXC1</italic>
, and
<italic>FOXC2</italic>
. Our genome-wide predictions are freely available as a UCSC Genome Browser track.</p>
<sec id="s3a">
<title>A biologically active
<italic>in vivo</italic>
definition of “enhancer”</title>
<p>We chose to define developmental enhancers for training as genomic regions that are experimentally shown to activate gene expression
<italic>in vivo</italic>
in embryonic mouse assays. We believe that this definition is better suited to identifying regions for further exploration and experimental characterization than approaches based on single data sources, such as p300, H3K4me1, or H3K27ac, associated with enhancers in individual cell lines. We showed that our predicted enhancers, based on this biologically active definition, significantly overlap data sets commonly used as proxies for enhancer activity, such as H3K27ac and p300 binding. However, these other data alone are not sufficient to identify all enhancers, as we demonstrated for H3K27ac, H3K4me1, and p300 in
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3B</xref>
. Similarly, when we evaluated the ability of other computational methods to identify enhancers, we found that they performed better than random, but that EnhancerFinder significantly outperformed them at identifying biologically active developmental enhancers. This is not surprising given the different contexts in which some enhancer predictions, such as those from ChromHMM and Segway, were developed.</p>
<p>While EnhancerFinder could be used to predict enhancers in well-characterized cell lines, it is particularly useful at identifying enhancers in complex tissues that contain multiple cell types and in cell types that do not have much specific functional genomics data available. Other computational approaches to enhancer prediction have focused on identifying enhancers in individual cell types using functional genomics data from the same cells
<xref rid="pcbi.1003677-Rajagopal1" ref-type="bibr">[56]</xref>
or using the differences in cell type specific transcription factor binding to identify cell-type specific binding motifs
<xref rid="pcbi.1003677-Arvey1" ref-type="bibr">[61]</xref>
. These methods generally perform well, but they do not address enhancer prediction in cell types with little or no functional genomics data, or in tissues that contain multiple cell types.</p>
</sec>
<sec id="s3b">
<title>Why do seemingly irrelevant data improve our enhancer predictions?</title>
<p>Data such as p300 binding sites and H3K4me1 have been used in previous studies to identify enhancers, and these data are major contributors to our enhancer predictions. However, data from other sources and contexts less directly associated with enhancer activity provide complementary information that improves our predictions. Some of these data may be negatively correlated with enhancer activity, allowing EnhancerFinder to learn what features distinguish regions that are not developmental enhancers. Others may help reinforce patterns present in data from more relevant contexts, reflecting some degree of stability in the features of enhancer regions across developmental stages and cell types. For example, we found that features measured in embryonic stem cells are quite useful for E11.5 enhancer prediction; their removal from the classifier degrades performance and/or they have large (positive or negative) MKL weights. Examination of these features suggests that some identify “poised” regions that will become active enhancers upon differentiation, while others seem to help distinguish stem cell enhancers (i.e., non-enhancers at E11.5) from those specific to differentiated lineages. We note that, despite these interesting observations, most individual functional genomics features do not carry a great deal of information, and the power of EnhancerFinder comes from the integration of different types of data. It is also possible that as a more complete experimental characterization of chromatin state and protein-DNA binding from E11.5 tissues is obtained, data from less relevant contexts will not provide as much improvement as they did in this study.</p>
</sec>
<sec id="s3c">
<title>What data are most informative about enhancer activity?</title>
<p>We focused on a single developmental stage with a large number of validated enhancers. To efficiently extend enhancer detection and validation to new contexts, it will be very important to select the most informative data to collect. Even though the ENCODE project has produced an impressive amount of data, it still has not extensively assayed most contexts of interest to researchers, in particular developmental biologists. The performance of classifiers trained on subsets of all our data and the weights we learned for feature sets and individual features provide some guidance for future experiments. Evolutionary conservation and DNA sequence patterns are broadly useful in the identification of enhancers, but our results suggest that adding functional genomics data is necessary to make more precise predictions about the contexts of activity. H3K4me1 and p300 are two of the most useful functional genomics data types overall (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s006">Figure S6</xref>
), but many others are useful in particular contexts. However, the non-random sampling of functional genomics data and enhancers makes definitively determining the relative utility of different data types challenging.</p>
</sec>
<sec id="s3d">
<title>Why are heart enhancers easier to predict than other types of enhancers?</title>
<p>We saw a broad range in our ability to predict the tissue specificity of enhancers from existing data. Heart enhancers were dramatically easier to identify than other tissue-specific enhancers. Heart enhancers have significantly higher GC content than enhancers of other tissues, are less evolutionarily conserved, and are closer to the nearest TSS than other known enhancers at E11.5, and we show that GC content alone is sufficient to accurately predict many heart enhancers (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s007">Figures S7</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s008">S8</xref>
). However, functional genomics data alone were also able to accurately predict heart enhancers. The underlying biological explanation for these patterns may relate to the relative developmental age of different organs and tissues. At E11.5, the heart is further along its developmental trajectory than the other tissues considered, and heart enhancers have completed their most conserved developmental stage, whereas forebrain enhancers are most strongly conserved at E11.5 and E14.5
<xref rid="pcbi.1003677-Nord1" ref-type="bibr">[75]</xref>
. At E11.5, many of the less conserved, mammal-specific features of the heart are developing
<xref rid="pcbi.1003677-Woznica1" ref-type="bibr">[91]</xref>
,
<xref rid="pcbi.1003677-KoshibaTakeuchi1" ref-type="bibr">[92]</xref>
, whereas other tissues are still developing under more general, conserved, and less species-specific regulatory programs at E11.5
<xref rid="pcbi.1003677-Casci1" ref-type="bibr">[93]</xref>
. A recent study of enhancers in the adult mouse retina found that high local GC content was strongly correlated with enhancer activity
<xref rid="pcbi.1003677-White1" ref-type="bibr">[94]</xref>
. Paired with our result, this suggests that GC content is a distinguishing feature of certain classes of enhancers.</p>
</sec>
<sec id="s3e">
<title>Limitations of our approach</title>
<p>In spite of the strong overall performance of EnhancerFinder at predicting tissue-specific developmental enhancers, our approach has some limitations. First, we rely heavily upon the VISTA Enhancer Browser for training examples, because it is the largest collection of validated mammalian enhancers currently available. This resource provides an impressive catalog of validated human regulatory enhancers, but it is limited to a single developmental stage and experimental system. Without more data and analysis, it is difficult to evaluate how specific our predictions are to this context. Applying EnhancerFinder to known enhancers in model organisms, such as zebrafish and fly, would provide additional opportunities to evaluate our approach and findings, while potentially demonstrating differences in how enhancers function in these different species.</p>
<p>Second, most of the enhancers present in VISTA are evolutionarily conserved. As a result, the VISTA enhancers cannot be viewed as an exhaustive catalog of the full range of enhancers. However, these regions have validated enhancer activity
<italic>in vivo</italic>
, and thus provide an appealing alternative to approaches that use single-mark proxies for enhancer activity (e.g., considering all H3K27ac peaks as active enhancer regions). In addition to being conserved, these regions contain many signatures of enhancers in their sequence motifs and functional genomics composition that are useful for predicting enhancers. To emphasize these features and mitigate the impact of bias towards conserved regions, we removed evolutionary conservation as a feature from EnhancerFinder when we applied it to predict enhancers genome-wide. Our goal in doing so was to improve our ability to discern less conserved enhancers in these genome-wide predictions, and indeed, we predicted thousands of non-conserved enhancers (∼20% of all predictions).</p>
<p>Third, though our predictions are based on a large collection of genome-wide chromatin state, protein-binding, and sequence information from many contexts, we are still limited by data availability. Even with the impressive efforts of ENCODE and related projects, producing data that are perfectly matched to all contexts of interest is time consuming and sometimes impossible, especially when studying humans. Thus, it will be important to develop a principled understanding of how different data can be generalized across tissues, developmental stages, and between species. In our analysis, many of the highest weighted features come from contexts close to the developmental stage of interest, and thus we anticipate that gathering more data from developmentally relevant cells and tissues will significantly improve our ability to annotate genomic regions involved in the regulation of embryonic development. However, data from other, seemingly unrelated, contexts may continue to prove useful.</p>
</sec>
<sec id="s3f">
<title>Extensions and future applications</title>
<p>This study annotates regulatory elements in the human genome and provides tools for interpreting the effects of mutations in non-coding regions. Our case studies on regions around
<italic>ZEB2</italic>
,
<italic>FOXC1</italic>
, and
<italic>FOXC2</italic>
illustrate how our predictions can facilitate the rapid identification of novel enhancers. In addition, the statistical enrichment for GWAS SNPs in our genome-wide enhancer predictions suggests that they may be a good resource for pinpointing causal mutations in potential disease loci.</p>
<p>EnhancerFinder is a general framework for enhancer prediction and evaluation of different data sources that aim to annotate the regulatory functions of the human genome. It could easily be extended to include additional types of data, such as population-level variation at each locus, information about the three-dimensional state of the genome from Hi-C and 5C, and predictions of potential target genes for each enhancer. It could also be used to analyze additional aspects of the data we already consider, such as accounting for the relative genomic position of different features
<xref rid="pcbi.1003677-Sonnenburg1" ref-type="bibr">[66]</xref>
.</p>
<p>The EnhancerFinder two-step approach enables delineation of features common to all enhancers versus those that characterize enhancers of different types. For example, we find that predicting enhancers that are unique to a single tissue is more difficult than predicting those that are active in multiple tissues (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s009">Figure S9</xref>
), that certain features make prediction of heart enhancers particularly easy, and that different features are selected in classifiers for general enhancers and those for specific tissues. Together, these results suggest that there may be distinct classes of enhancers, even among those active in a given tissue at a single developmental stage. Further analysis of EnhancerFinder classifiers based on different types of data may help suggest biological mechanisms underlying the functional distinctions and genomic features of these different classes of enhancers.</p>
</sec>
</sec>
<sec sec-type="methods" id="s4">
<title>Methods</title>
<sec id="s4a">
<title>Ethics statement</title>
<p>Transgenic mice were generated by Cyagen Biosciences (
<ext-link ext-link-type="uri" xlink:href="http://www.cyagen.com/">http://www.cyagen.com/</ext-link>
). Their facility meets and often exceeds animal health and welfare guidelines. Animals were euthanized using techniques recommended by the American Veterinary Medical Association. All procedures were carried out in line with Gladstone Institutes and University of California guidelines. All zebrafish work was approved by the UCSF Institutional Animal Care and Use Committee (protocol number AN100466).</p>
</sec>
<sec id="s4b">
<title>Genomic data</title>
<p>All work presented in this paper is based on the February 2009 assembly of the human genome (GRCh37/hg19) downloaded from the UCSC Genome Browser (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/">http://genome.ucsc.edu/</ext-link>
). Any data not referenced to this build were mapped to it using the
<italic>liftOver</italic>
tool from the UCSC Kent tools (
<ext-link ext-link-type="uri" xlink:href="http://hgdownload.cse.ucsc.edu/admin/jksrc.zip">http://hgdownload.cse.ucsc.edu/admin/jksrc.zip</ext-link>
).</p>
</sec>
<sec id="s4c">
<title>Multiple kernel learning-based prediction of developmental enhancers</title>
<p>In our framework, genomic regions are associated with a common set of descriptive features. We then apply machine-learning algorithms that use the features of known training examples to learn a function of the feature data that distinguishes the positives (enhancers) from the negatives (non-enhancers). This function can then be applied to the features associated with uncharacterized genomic regions to predict their enhancer status. A positive score for a genomic region indicates predicted membership in the positive class (enhancers) and a negative score indicates predicted membership in the negative class (non-enhancers).</p>
<sec id="s4c1">
<title>Training examples</title>
<p>We obtained all of our positive training data and our tissue-specific negative training data from the VISTA Enhancer Browser
<xref rid="pcbi.1003677-Visel6" ref-type="bibr">[69]</xref>
on April 4, 2012. We downloaded the location, DNA sequence, and expression contexts for all human sequences tested in the VISTA mouse E11.5 enhancer screen. This consisted of 711 validated human enhancers and 736 genomic regions that did not exhibit enhancer activity in this context (
<ext-link ext-link-type="uri" xlink:href="http://enhancer.lbl.gov/">http://enhancer.lbl.gov/</ext-link>
). The median length of the enhancers in VISTA is 1,545 bp.</p>
<p>In the first step of EnhancerFinder (
<xref ref-type="fig" rid="pcbi-1003677-g001">Figure 1</xref>
), we used all 711 VISTA enhancers as positive training data. For negative training data, we generated a set of 711 random genomic regions matched to the length and chromosome distribution of the positives, and filtered to remove known VISTA enhancers and assembly gaps.</p>
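<p>The construction of a matched negative set can be illustrated with a minimal sketch. It assumes a per-region matching strategy, in which each random region copies the chromosome and length of one positive; this is one simple way to match the length and chromosome distribution and is an illustration rather than the exact procedure used in this study. The function name, the <italic>excluded</italic> interval list (standing in for known VISTA enhancers and assembly gaps), and the retry limit are hypothetical.</p>
<preformat>
```python
import random


def sample_matched_negatives(positives, chrom_sizes, excluded, seed=0, max_tries=10000):
    """Draw one random region per positive with the same chromosome and length.

    positives, excluded: lists of (chrom, start, end), half-open coordinates.
    chrom_sizes: dict mapping chromosome name to its length in bp.
    All names here are illustrative assumptions, not part of EnhancerFinder.
    """
    rng = random.Random(seed)
    negatives = []
    for chrom, start, end in positives:
        length = end - start
        max_start = chrom_sizes[chrom] - length
        for _ in range(max_tries):
            new_start = rng.randint(0, max_start)
            new_end = new_start + length
            # Reject candidates that overlap any excluded interval.
            blocked = any(
                c == chrom and new_end > s and e > new_start
                for c, s, e in excluded
            )
            if not blocked:
                negatives.append((chrom, new_start, new_end))
                break
        else:
            raise RuntimeError("no admissible region found on " + chrom)
    return negatives
```
</preformat>
<p>Passing the positives themselves as part of <italic>excluded</italic> keeps the random background from overlapping known enhancers.</p>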
<p>In the second step of EnhancerFinder, we used tissue-specific subsets of the 1,447 VISTA regions for training. For example, when predicting heart enhancers, our positive training data were the 84 VISTA regions with heart expression in E11.5 mice, and our negative training data were the remaining 1,363 VISTA regions that were tested and showed no heart expression at E11.5, even though they may be enhancers in other tissues or in none at all. We did not require that a region be active only in the tissue of interest. We included the VISTA negatives in this analysis because they share many attributes with known enhancers and may have enhancer activity in contexts other than E11.5. Our results did not change dramatically when the VISTA negatives were not included in the training. We trained tissue-specific classifiers for the six tissues with more than 50 examples in VISTA: forebrain, midbrain, hindbrain, heart, limb, and neural tube. We also trained a brain enhancer classifier on the combined forebrain, midbrain, and hindbrain enhancers.</p>
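<p>The tissue-specific labeling described above amounts to a one-versus-rest split of all tested VISTA regions. The following sketch shows the idea; the record layout (a dictionary with a <italic>tissues</italic> set per region) is a hypothetical representation chosen for illustration.</p>
<preformat>
```python
def tissue_labels(regions, tissue):
    """Return +1 for regions active in `tissue` and -1 for all other tested regions.

    regions: list of dicts with a hypothetical "tissues" key holding the set of
    tissues in which the region drove expression (empty for VISTA negatives).
    A region active in several tissues is a positive for each of them.
    """
    return [1 if tissue in region["tissues"] else -1 for region in regions]
```
</preformat>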
</sec>
<sec id="s4c2">
<title>Feature data</title>
<p>We considered three main types of data as features in our analysis: functional genomics data, evolutionary conservation, and DNA sequence motifs. We obtained our functional genomics feature data from the ENCODE data repository at the UCSC Genome Browser (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/ENCODE/">http://genome.ucsc.edu/ENCODE/</ext-link>
and
<xref rid="pcbi.1003677-Birney1" ref-type="bibr">[95]</xref>
). These data include histone modifications, such as H3K4me1, H3K4me3, H3K27ac, protein-DNA associations for many TFs and p300, and several measurements of open chromatin (DNaseI hypersensitivity, FAIRE, digital genomic footprinting), from hundreds of cell types
<xref rid="pcbi.1003677-Birney1" ref-type="bibr">[95]</xref>
. We also included heart p300 data from
<xref rid="pcbi.1003677-Blow1" ref-type="bibr">[39]</xref>
. For a full list of the functional genomics data considered, see
<xref ref-type="supplementary-material" rid="pcbi.1003677.s012">Table S1</xref>
. We associated each genomic region with a binary vector that represents the presence or absence of overlap with each functional genomics data set. To determine this feature vector, we intersected the genomic location of the region of interest with the peaks defined by the original researchers (from the broadPeak or narrowPeak files) using
<italic>intersectBed</italic>
<xref rid="pcbi.1003677-Quinlan1" ref-type="bibr">[96]</xref>
. We found that considering non-binary functional genomics features based on experimental data, like the density of sequence reads from a ChIP-seq study, did not significantly improve performance (data not shown). However, we suspect that with consistent peak calling and appropriate normalization this might be an avenue for future improvement.</p>
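<p>As an illustration of the binary overlap features, the sketch below computes a presence/absence vector for one region. It is a pure Python stand-in for the interval intersection step and assumes half-open coordinates, as in BED files; the function names and the dictionary of peak lists are hypothetical.</p>
<preformat>
```python
def overlaps(region, peak):
    """True if two half-open (chrom, start, end) intervals share at least 1 bp."""
    chrom, start, end = region
    p_chrom, p_start, p_end = peak
    return chrom == p_chrom and end > p_start and p_end > start


def binary_features(region, datasets):
    """Return a 0/1 vector with one entry per data set, in sorted name order.

    datasets: dict mapping a data set name to its list of peaks.
    """
    return [
        int(any(overlaps(region, peak) for peak in datasets[name]))
        for name in sorted(datasets)
    ]
```
</preformat>
<p>Sorting the data set names fixes the feature order, so vectors from different regions are directly comparable.</p>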
<p>To summarize the DNA sequence motif patterns in a genomic region, we calculated the number of occurrences of all possible 4-mers in the sequence.</p>
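<p>A minimal sketch of the 4-mer counting step follows. It assumes that windows containing characters other than A, C, G, and T (for example, N) are skipped, which is an assumption of this illustration; the function name is hypothetical.</p>
<preformat>
```python
from itertools import product


def kmer_counts(sequence, k=4):
    """Count occurrences of every possible k-mer over the DNA alphabet."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Windows with ambiguous bases are not in the table and are skipped.
        if kmer in counts:
            counts[kmer] += 1
    return counts
```
</preformat>
<p>For k = 4, the table has 256 entries, and overlapping occurrences are counted separately.</p>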
<p>Evolutionary conservation estimates were taken from the mammalian phastCons elements
<xref rid="pcbi.1003677-Siepel1" ref-type="bibr">[72]</xref>
obtained from the
<italic>phastConsElements46wayPlacental</italic>
track in UCSC Genome Browser. Each genomic region was assigned its maximum overlapping phastCons score or zero if it did not overlap any phastCons elements.</p>
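<p>The conservation feature can be sketched as a maximum over overlapping elements. The tuple layout for phastCons elements and the function name below are hypothetical, and coordinates are assumed to be half-open.</p>
<preformat>
```python
def conservation_feature(region, elements):
    """Maximum score of phastCons elements overlapping the region, or 0 if none.

    region: (chrom, start, end); elements: list of (chrom, start, end, score).
    """
    chrom, start, end = region
    scores = [
        score
        for e_chrom, e_start, e_end, score in elements
        if e_chrom == chrom and end > e_start and e_end > start
    ]
    return max(scores, default=0)
```
</preformat>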
</sec>
<sec id="s4c3">
<title>Machine-learning algorithms</title>
<p>EnhancerFinder is an extension of the SVM supervised learning framework that allows the integration of multiple data types into a single discrimination function. Standard 1-norm MKL augments the usual SVM discrimination function,
<italic>f</italic>
, with additional parameters,
<italic>β
<sub>j</sub>
</italic>
, that weight the contribution of each kernel function
<italic>k
<sub>j</sub>
</italic>
:
<disp-formula id="pcbi.1003677.e001">
<graphic xlink:href="pcbi.1003677.e001.jpg" position="anchor" orientation="portrait"></graphic>
</disp-formula>
where
<italic>N</italic>
is the number of training examples,
<italic>M</italic>
is the number of kernels,
<italic>α
<sub>i</sub>
</italic>
are the training example weights, and
<italic>b</italic>
is the bias
<xref rid="pcbi.1003677-Sonnenburg1" ref-type="bibr">[66]</xref>
. We include three kernel functions in EnhancerFinder, each of which corresponds to one of the three types of feature data described above. These kernels quantify the similarity of the features of the appropriate type for any two genomic regions. To combine the kernels, the MKL algorithm simultaneously learns weights for the associated kernels, in addition to learning the bias and weights for each training example as in a standard SVM. We use the 4-spectrum kernel
<xref rid="pcbi.1003677-Leslie1" ref-type="bibr">[71]</xref>
for our sequence features; this kernel has been shown to perform well in a variety of DNA sequence-based prediction tasks including enhancer prediction
<xref rid="pcbi.1003677-Lee1" ref-type="bibr">[54]</xref>
. For the functional genomics and evolutionary conservation data, we use linear kernels, which are equivalent to dot products of the feature vectors. We explored the use of alternative, non-linear kernels for these features and found that they performed similarly (data not shown). Each kernel was variance normalized, and we balanced the misclassification costs by class size
<xref rid="pcbi.1003677-BenHur1" ref-type="bibr">[97]</xref>
. In addition to EnhancerFinder classifiers, we also trained and evaluated the constituent single kernel SVMs. All analyses were performed using the implementation of SVMs and MKL in the SHOGUN Machine Learning Toolbox v1.1.0
<xref rid="pcbi.1003677-Sonnenburg2" ref-type="bibr">[98]</xref>
.</p>
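<p>To make the weighted combination of kernels concrete, the sketch below evaluates an MKL-style score for one region from already-learned parameters. It assumes the common formulation in which the class labels appear explicitly next to the training example weights, uses linear kernels throughout for simplicity (the k-spectrum kernel is likewise a dot product, taken over k-mer count vectors), and omits kernel normalization. It does not use SHOGUN, and all names are hypothetical.</p>
<preformat>
```python
def linear_kernel(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def mkl_score(x, train, alphas, labels, betas, kernels, bias):
    """Score = bias + sum_i alpha_i * y_i * sum_j beta_j * k_j(x_i, x).

    x and every training example hold one feature vector per kernel,
    e.g. (functional_genomics, conservation, sequence_counts).
    """
    score = bias
    for x_i, alpha_i, y_i in zip(train, alphas, labels):
        # Weighted sum of the per-data-type kernel values for this pair.
        combined = sum(
            beta_j * k_j(x_i[j], x[j])
            for j, (beta_j, k_j) in enumerate(zip(betas, kernels))
        )
        score += alpha_i * y_i * combined
    return score
```
</preformat>
<p>A positive score corresponds to a predicted enhancer and a negative score to a predicted non-enhancer, as described above.</p>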
</sec>
</sec>
<sec id="s4d">
<title>Performance evaluations</title>
<p>To evaluate the performance of trained classifiers, we performed 10-fold cross-validation on the training data and quantified our results with ROC AUC, precision-recall curves, and power estimates at fixed false positive rates. We computed p-values for the difference in performance between classification methods using McNemar's test
<xref rid="pcbi.1003677-Salzberg1" ref-type="bibr">[99]</xref>
,
<xref rid="pcbi.1003677-Dietterich1" ref-type="bibr">[100]</xref>
. To estimate false discovery rates, we trained EnhancerFinder classifiers at 1∶1, 1∶10, and 1∶100 ratios of positive to negative enhancers and used the resulting 10-fold cross-validation results to calculate the proportion of false discoveries genome-wide at a 5% FPR if the true proportion of 1.5 kb windows containing an enhancer was 50%, 10%, or 1%.</p>
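<p>The dependence of the false discovery rate on the assumed proportion of enhancer-containing windows can be illustrated with the standard relationship between the false positive rate, the true positive rate, and class prevalence. This sketch shows only the arithmetic; it does not retrain classifiers at each ratio as we did, and the true positive rate of 0.7 is a placeholder rather than a result from this study.</p>
<preformat>
```python
def expected_fdr(fpr, tpr, prior):
    """Expected proportion of false discoveries among predicted positives.

    fpr: false positive rate of the classifier at the chosen score cutoff.
    tpr: true positive rate (power) at the same cutoff.
    prior: assumed fraction of windows that truly contain an enhancer.
    """
    false_calls = fpr * (1.0 - prior)
    true_calls = tpr * prior
    return false_calls / (false_calls + true_calls)


# The TPR of 0.7 is a hypothetical placeholder, not a measured value.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(expected_fdr(0.05, 0.7, prior), 3))
```
</preformat>
<p>At a fixed 5% FPR, the expected FDR rises as true enhancers become rarer, which is why the assumed proportion matters for genome-wide predictions.</p>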
</sec>
<sec id="s4e">
<title>Comparison to existing enhancer prediction methods</title>
<p>We compared EnhancerFinder's predictions to those of several previous enhancer prediction methods. We evaluated CLARE on our Step 1 prediction task by submitting our positive and negative data to the CLARE web server
<xref rid="pcbi.1003677-Taher1" ref-type="bibr">[73]</xref>
. We downloaded the genomic segmentations and annotations produced by ChromHMM
<xref rid="pcbi.1003677-Ernst1" ref-type="bibr">[64]</xref>
and Segway
<xref rid="pcbi.1003677-Hoffman1" ref-type="bibr">[65]</xref>
. We considered the ChromHMM predictions based on different ENCODE cell lines both individually and together. Any genomic region in our evaluation data set that overlapped an enhancer state was considered a predicted enhancer, and all others were considered predicted non-enhancers. For Segway, we also considered the “TF activity” state.</p>
</sec>
<sec id="s4f">
<title>Identification of tissue-specific enhancers across the human genome</title>
<p>We predicted tissue-specific developmental enhancers throughout the human genome by applying a trained MKL classifier (Step 1 of EnhancerFinder) without conservation (see
<xref ref-type="sec" rid="s2">Results</xref>
) to sliding windows of 1500 bp, moving along the human genome in 500 bp steps. The feature profile for each window was computed as described above. To focus on high-confidence predictions, we filtered the enhancer scores for the windows at a 5% FPR, estimated from cross-validation using the genomic background, and combined the remaining overlapping windows to produce 84,301 high-confidence predicted enhancers.</p>
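<p>The window tiling and merging steps can be sketched as follows. The sketch assumes half-open coordinates and merges only windows that share at least one base pair; the function names are hypothetical, and scoring and FPR filtering are assumed to happen between the two steps.</p>
<preformat>
```python
def sliding_windows(chrom_length, window=1500, step=500):
    """Yield (start, end) windows that fit entirely within the chromosome."""
    start = 0
    while chrom_length >= start + window:
        yield (start, start + window)
        start += step


def merge_windows(windows):
    """Combine overlapping (start, end) windows into merged regions."""
    merged = []
    for start, end in sorted(windows):
        if merged and merged[-1][1] > start:
            # Extend the previous region when the new window overlaps it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```
</preformat>
<p>With a 500 bp step, each 1500 bp window overlaps its neighbors, so runs of high-scoring windows collapse into single predicted regions.</p>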
<p>To predict tissue specificity, we applied trained brain, limb, and heart classifiers (Step 2 of EnhancerFinder) without conservation to all 299,039 windows with positive enhancer scores in Step 1. We then applied a 5% FPR cutoff for each tissue and concatenated the remaining overlapping windows into merged enhancer regions. Using this approach, we predicted 19,051 heart enhancers, 11,693 brain enhancers, and 7,400 limb enhancers.</p>
</sec>
<sec id="s4g">
<title>Analysis of genome-wide tissue-specific enhancer predictions</title>
<p>We characterized the expression patterns of the gene nearest to each predicted enhancer using the GNF Atlas 2
<xref rid="pcbi.1003677-Su1" ref-type="bibr">[101]</xref>
. It contains expression data for genes in 79 different tissues, with expression measured using Affymetrix microarrays. For each of these 79 tissues, we used a paired t-test to determine if the nearest genes of predicted heart enhancers had significantly different mean values of expression than the nearest genes of brain enhancers. We did not include the limb enhancers in this analysis due to the lack of relevant expression data in the GNF Atlas 2.</p>
<p>We examined genomic regions near predicted developmental enhancers for enrichment of Gene Ontology functional annotations, known phenotypes, and pathways using GREAT
<xref rid="pcbi.1003677-McLean1" ref-type="bibr">[102]</xref>
. Results were computed using the hypergeometric test for genome-wide significance, with the default settings and the “basal plus extension” association rule (proximal 5 kb upstream, 1 kb downstream, plus distal up to 1000 kb).</p>
<p>We identified the sequence motifs present in each set of enhancers using the FIMO tool (Find Individual Motif Occurrences) from the MEME Suite of sequence motif analysis tools
<xref rid="pcbi.1003677-Grant1" ref-type="bibr">[103]</xref>
. We considered known transcription factor binding motifs from the April 2011 release of the TRANSFAC database with a FIMO score threshold of 10e-5. We identified those occurrences that fell in predicted enhancers, and summarized motifs to identify the most prevalent TFs in each tissue-specific set of enhancers.</p>
<p>We analyzed the overlap of predicted enhancers with GWAS SNPs, based on the NHGRI catalog of 9,687 GWAS SNPs downloaded from the UCSC Genome Browser in October 2012. Unadjusted permutation p-values were calculated by randomizing genomic locations of predicted enhancers (matching for length and chromosome, and avoiding assembly gaps) and overlapping these randomized regions with GWAS SNPs to assess significance of overlapping regions.</p>
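<p>The permutation procedure can be sketched as follows. This is a simplified illustration rather than the analysis code: it assumes a single chromosome, ignores assembly gaps, and adds one to the numerator and denominator so that the estimated p-value is never zero. All function and parameter names are hypothetical.</p>
<preformat>
```python
import random


def count_overlaps(regions, snp_positions):
    """Count SNPs that fall inside any half-open (start, end) region."""
    return sum(any(end > pos >= start for start, end in regions)
               for pos in snp_positions)


def shuffle_regions(regions, chrom_length, rng):
    """Move each region to a uniformly random position, preserving its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_p_value(regions, snp_positions, chrom_length,
                        n_perm=1000, seed=0):
    """Estimate how often randomized regions overlap at least as many SNPs."""
    rng = random.Random(seed)
    observed = count_overlaps(regions, snp_positions)
    hits = 0
    for _ in range(n_perm):
        null_regions = shuffle_regions(regions, chrom_length, rng)
        if count_overlaps(null_regions, snp_positions) >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch, not stated in the text).
    return (hits + 1) / (n_perm + 1)
```
</preformat>
<p>A genome-wide version would shuffle each region within its own chromosome and reject placements that intersect assembly gaps, as described above.</p>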
</sec>
<sec id="s4h">
<title>Transgenic enhancer assays</title>
<p>Mouse enhancer assays were carried out in transient transgenic mouse embryos generated by pronuclear injection of enhancer assay constructs into FVB embryos (Cyagen Biosciences). Human and chimpanzee DNA sequences were inserted upstream of an Hsp68 minimal promoter and a
<italic>LacZ</italic>
reporter gene. The human sequence was amplified using primers
<named-content content-type="gene">5′-TGTATGAAACCTGTTCACTCTCC-3′</named-content>
and
<named-content content-type="gene">5′-GCTTAAAACAACTACTAGAATCAGGC-3′</named-content>
from the bacterial artificial chromosome (BAC) RP11-107E5 (from the BacPac resource at CHORI). The chimpanzee sequence was amplified using primers
<named-content content-type="gene">5′-TGTATGAAACCTGTTCACTCTCC-3′</named-content>
and
<named-content content-type="gene">5′-GCTTAAAACAACTACTAGAATCAGGC-3′</named-content>
from BAC CH251-677E03a (CHORI). The embryos were collected and stained for
<italic>LacZ</italic>
expression at E11.5.</p>
<p>Following the annotation policies of the VISTA Enhancer Browser, we required that consistent spatial expression patterns be present in three or more embryos with staining in order for the region to be considered an enhancer.</p>
<p>Zebrafish enhancer assays were performed in transient transgenic zebrafish embryos. We tested candidate enhancer regions that ranged in length from 987 bp to 3,633 bp (see
<xref ref-type="supplementary-material" rid="pcbi.1003677.s017">Table S6</xref>
for hg19 genomic coordinates), which we manually demarcated from within larger predicted enhancer regions based on signatures of likely enhancer function (including DNase I hypersensitivity sites, transcription factor binding sites, histone modifications, and conservation).</p>
<p>We performed PCR to obtain the candidate enhancer sequence using human genomic DNA (Roche). These were cloned into the E1b-GFP-Tol2 enhancer assay vector containing an E1b minimal promoter followed by GFP
<xref rid="pcbi.1003677-Li1" ref-type="bibr">[104]</xref>
, and the construct was verified by sequencing. Each construct was injected with Tol2 mRNA into at least 100 single-cell fertilized zebrafish embryos. We annotated GFP expression at approximately 24 and 48 hours post fertilization (hpf), and considered an enhancer to be positive if we observed consistent expression in at least 15% of all fish alive at either 24 or 48 hpf
<xref rid="pcbi.1003677-Oksenberg1" ref-type="bibr">[105]</xref>
, and suggestive of enhancer activity if we observed consistent expression in at least 10% of all fish alive at 24 or 48 hpf, after subtracting out percentages of tissue expression in fish injected with the empty enhancer vector. For each construct, at least 50 fish were analyzed for GFP expression at 48 hpf.</p>
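<p>The scoring rule for the zebrafish assay can be written as a short sketch. It assumes that the empty-vector background is subtracted before both the 15% and the 10% cutoffs are applied and that a single background fraction is used for both time points. Neither detail is fully specified above, and the function names are hypothetical.</p>
<preformat>
```python
RANK = {"negative": 0, "suggestive": 1, "positive": 2}


def classify_timepoint(n_expressing, n_alive, background_fraction=0.0,
                       positive_cutoff=0.15, suggestive_cutoff=0.10):
    """Classify one time point from counts of expressing and live fish."""
    if n_alive == 0:
        raise ValueError("no live fish at this time point")
    # Background-corrected fraction of fish with consistent expression.
    fraction = n_expressing / n_alive - background_fraction
    if fraction >= positive_cutoff:
        return "positive"
    if fraction >= suggestive_cutoff:
        return "suggestive"
    return "negative"


def classify_construct(timepoints, background_fraction=0.0):
    """Return the strongest call across time points (e.g. 24 and 48 hpf).

    timepoints is a list of (n_expressing, n_alive) pairs.
    """
    calls = [classify_timepoint(n_expr, n_alive, background_fraction)
             for n_expr, n_alive in timepoints]
    return max(calls, key=RANK.get)
```
</preformat>
<p>For example, a construct with 5 of 100 fish expressing at 24 hpf and 30 of 100 at 48 hpf would be called positive under these assumptions, because either time point may satisfy the cutoff.</p>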
</sec>
</sec>
<sec sec-type="supplementary-material" id="s5">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pcbi.1003677.s001">
<label>Figure S1</label>
<caption>
<p>
<bold>Precision-Recall curves corresponding to all ROC curves presented in the main text.</bold>
(A)
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2A</xref>
(B)
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3A</xref>
(C)
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3B</xref>
(D)
<xref ref-type="fig" rid="pcbi-1003677-g004">Figure 4</xref>
. A PR curve could not be created for
<xref ref-type="fig" rid="pcbi-1003677-g002">Figure 2C</xref>
, because we could not obtain the raw scores for regions from the CLARE web server.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s002">
<label>Figure S2</label>
<caption>
<p>
<bold>VISTA enhancers overlap many common marks of enhancers, but no common mark is universal to all VISTA enhancers.</bold>
We computed the overlap between 711 VISTA enhancers and three common functional genomic marks of enhancers and found that 450 enhancers overlap H3K27ac (in any of 16 datasets from ENCODE), 563 overlap H3K4me1 (in any of 15 datasets from ENCODE), and 404 overlap p300/CBP (in any of 35 datasets from ENCODE and human tissues). Fewer than half of the enhancers (306) overlap all three common marks of enhancers, and 93 do not overlap any of those three functional genomics marks. All but five of the VISTA enhancers overlap a conservation peak (phastCons 46-way placental mammal). Four of these non-conserved enhancers overlap all three functional genomics marks, and one non-conserved enhancer overlaps just H3K27ac and H3K4me1.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s002.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s003">
<label>Figure S3</label>
<caption>
<p>
<bold>The 4-spectrum kernel performs competitively with other k-spectrum kernels and the combination of k-spectrum kernels.</bold>
We analyzed the ability of spectrum kernels based on k-mer lengths between 2 and 8 to distinguish enhancers from the genomic background (Step 1). Kernels with k between 4 and 7 performed best. We also evaluated an MKL algorithm that combined all of the k-spectrum kernels, and it did not provide significant improvement over the best individual kernels.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s003.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s004">
<label>Figure S4</label>
<caption>
<p>
<bold>Considering known TFBS motifs does not improve the 4-spectrum kernel.</bold>
Considering the number of occurrences of known TFBS motifs as features has recently been used in a linear SVM framework to predict enhancers
<xref rid="pcbi.1003677-Burzynski1" ref-type="bibr">[52]</xref>
. To evaluate the utility of this approach, instead of and in addition to considering all k-mers, we created a linear SVM that used as features the number of hits to 1022 TF binding site matrices from TRANSFAC and JASPAR, as computed by FIMO. That is, the feature vector for each region consisted of 1022 elements, each of which was the number of significant hits for a different TF motif. This TFBS linear SVM (AUC = 0.81) did not perform as well as the 4-spectrum kernel (AUC = 0.88). We also evaluated an MKL algorithm that combined the 4-spectrum and TFBS kernels. This combined kernel did not perform any better than the 4-spectrum kernel, suggesting that, at least under this encoding, TFBS motifs do not provide significant additional benefit in distinguishing enhancers from the genomic background.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s004.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s005">
<label>Figure S5</label>
<caption>
<p>
<bold>Combining functional genomics data with an SVM outperforms simply considering regions overlapping these data.</bold>
The four solid lines shown are the same as in
<xref ref-type="fig" rid="pcbi-1003677-g003">Figure 3B</xref>
; they summarize the performance of these methods at distinguishing VISTA enhancers from the genomic background (Step 1). The X's give the performance of approaches that consider all regions overlapping a given feature as positives and all others as negatives. The + and * indicate the performance obtained by considering the union and intersection of H3K4me1, p300, and H3K27ac, respectively. For each feature, the linear SVM achieves better performance than simply considering all overlapping regions as positives.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s005.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s006">
<label>Figure S6</label>
<caption>
<p>
<bold>EnhancerFinder feature weights highlight the contribution of different functional genomics data types to enhancer predictions.</bold>
Each “+” represents the contribution made by a single data feature, e.g. H3K4me1 peaks from embryonic stem cells, to the classification in EnhancerFinder Step 1 (developmental enhancers versus genomic background). Positive weights (red) indicate an association with enhancer activity in our analysis and negative weights (blue) suggest a lack of enhancer activity. The features plotted here come from a range of likely relevant contexts (
<bold>Relevant Functional Genomics</bold>
classifier;
<xref ref-type="supplementary-material" rid="pcbi.1003677.s012">Table S1</xref>
), and the number of data sets present for each feature type is given in parentheses. The black bar gives the average weight over all features of each type. In general, the features with high average weights, such as H3K4me1, p300, and H3K4me2, are known to be associated with enhancers, while those with large negative weights are associated with other types of genomic regions. However, no data type has uniformly positive or negative weights in all contexts.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s006.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s007">
<label>Figure S7</label>
<caption>
<p>
<bold>Heart enhancers are less conserved and closer to the nearest transcription start site (TSS) than limb and brain enhancers.</bold>
Considering only limb and brain enhancers that are less evolutionarily conserved and close to a TSS improved our ability to identify them, but they are still more difficult to identify than heart enhancers. In addition to these features, heart enhancers have uniquely high GC content compared to other enhancers and the genomic background (
<xref ref-type="supplementary-material" rid="pcbi.1003677.s008">Figure S8</xref>
).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s007.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s008">
<label>Figure S8</label>
<caption>
<p>
<bold>The uniquely high GC content of heart enhancers in VISTA enables accurate classification.</bold>
The VISTA heart enhancers have higher GC content (49%) than other types of enhancers and the genomic background (∼40%). (A) The classification score from a spectrum kernel classifier trained to distinguish heart enhancers within VISTA (Step 2) is strongly correlated (Pearson rho = 0.95) with the GC content of the input region. (B) A classification algorithm based solely on GC content (black) performs competitively with the spectrum kernel (AUC of 0.80 vs. 0.82), and nearly as well as EnhancerFinder (0.85;
<xref ref-type="fig" rid="pcbi-1003677-g004">Figure 4</xref>
).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s008.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s009">
<label>Figure S9</label>
<caption>
<p>
<bold>Enhancers active in multiple tissues are easier to identify than those active in a single tissue.</bold>
There are 399 enhancers active in a single tissue at E11.5 in the VISTA database and 312 active in multiple tissues. EnhancerFinder is better able to distinguish the enhancers active in multiple tissues from the VISTA negatives (AUC = 0.75) than it is to distinguish single tissue enhancers from the negatives (AUC = 0.67). This trend also holds across each tissue individually. However, both sets are easy to distinguish from the genomic background (AUC = 0.96 for both, not shown).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s009.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s010">
<label>Figure S10</label>
<caption>
<p>
<bold>Three novel developmental enhancers near </bold>
<bold>
<italic>FOXC1</italic>
</bold>
<bold>.</bold>
This UCSC Genome Browser screenshot shows six candidate enhancer regions tested in transgenic zebrafish. Three of the regions showed positive or suggestive expression at 24 or 48 hpf. F1EC-1 drives expression at 48 hpf; the arrows highlight reproducible midbrain, spinal cord, and epidermis expression. F1EC-3 shows suggestive expression at 24 hpf in somitic muscles and the epidermis (arrows). F1EC-6 drives expression at 48 hpf in the pericardium and heart (suggestive). The other three tested candidate enhancers without corresponding zebrafish images were negative in the enhancer assay. See
<xref ref-type="supplementary-material" rid="pcbi.1003677.s017">Table S6</xref>
for full list of expressed tissues seen in each candidate enhancer.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s010.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s011">
<label>Figure S11</label>
<caption>
<p>
<bold>Transient transgenic mouse embryos support a novel cranial nerve enhancer near </bold>
<bold>
<italic>ZEB2</italic>
</bold>
<bold>.</bold>
Seven transient transgenic mouse embryos showed
<italic>LacZ</italic>
expression at embryonic day 11.5. Constructs containing a 999 bp region (hg19.chr2:145,234,541–145,235,539) including 2xHAR.240 near
<italic>ZEB2</italic>
, a minimal promoter, and
<italic>LacZ</italic>
were used for human. The orthologous region was used in the chimp construct (panTro2.chr2b:148,811,929–148,812,929). Three embryos with constructs containing the human version of the region of interest and four embryos containing the chimp sequence had staining. In all embryos, there was consistent expression in the cranial nerve. There does not appear to be a significant difference in the activity driven by the human and chimp sequences at this time point.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pcbi.1003677.s011.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s012">
<label>Table S1</label>
<caption>
<p>
<bold>Functional genomics features used in our analysis.</bold>
This Excel spreadsheet lists the files used from ENCODE (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/ENCODE/">http://genome.ucsc.edu/ENCODE/</ext-link>
) or GEO (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/">http://www.ncbi.nlm.nih.gov/geo/</ext-link>
). There is a sheet for each of the classifiers based on functional genomics data that lists all data files used. ENCODE data set names are UCSC track names. GEO data set names are GEO identifiers.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pcbi.1003677.s012.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s013">
<label>Table S2</label>
<caption>
<p>
<bold>Genes near brain enhancers have significantly higher gene expression in brain and neural tissues than genes near heart enhancers.</bold>
Brain- or heart-related tissues with significantly higher mean expression in genes associated with predicted brain enhancers compared to predicted heart enhancers.</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pcbi.1003677.s013.doc">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s014">
<label>Table S3</label>
<caption>
<p>
<bold>Genes near heart enhancers have significantly higher gene expression in cardiac-related tissues than genes near brain enhancers.</bold>
Brain- or heart-related tissues with significantly higher mean expression in genes associated with predicted heart enhancers compared to predicted brain enhancers.</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pcbi.1003677.s014.doc">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s015">
<label>Table S4</label>
<caption>
<p>
<bold>The top 25 transcription factors for which binding sites were most prevalent in brain, heart, and limb enhancers.</bold>
</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pcbi.1003677.s015.doc">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s016">
<label>Table S5</label>
<caption>
<p>
<bold>676 GWAS SNPs are found in predicted enhancers.</bold>
This Excel spreadsheet lists all GWAS SNPs from the NHGRI database that fall within one of our predicted enhancers.</p>
<p>(XLSX)</p>
</caption>
<media xlink:href="pcbi.1003677.s016.xlsx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s017">
<label>Table S6</label>
<caption>
<p>
<bold>Candidate enhancer regions tested in zebrafish.</bold>
We tested 10 candidate enhancer regions in a transgenic zebrafish assay. This table lists the genomic coordinates (hg19) and expression patterns observed for each construct at 24 and 48 hpf. A representative fish is shown for each positive enhancer in (
<xref ref-type="fig" rid="pcbi-1003677-g007">Figures 7</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1003677.s010">S10</xref>
). Candidate enhancers on chromosome 6 are near FOXC1, and those on chromosome 16 are near FOXC2. N is the number of zebrafish alive at the specified time point, and * indicates expression patterns that are “suggestive,” but below the 15% threshold we used for confirmed enhancers.</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pcbi.1003677.s017.doc">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s018">
<label>Data File S1</label>
<caption>
<p>
<bold>This ZIP archive contains BED files (hg19 coordinates) with EnhancerFinder's genome-wide enhancer predictions, along with the MKL scores, for general developmental enhancer activity, brain, heart, and limb enhancers.</bold>
The general prediction file also lists the H3K27ac and H3K4me1 marks from the feature data overlapping each predicted enhancer.</p>
<p>(ZIP)</p>
</caption>
<media xlink:href="pcbi.1003677.s018.zip">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1003677.s019">
<label>Text S1</label>
<caption>
<p>
<bold>Text describing additional analyses in support of the manuscript.</bold>
</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pcbi.1003677.s019.doc">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>We thank A. Robles and C. Miller in the Gladstone Histology Core for assistance with embryo imaging and JLR Rubenstein for help interpreting embryo staining patterns. Transgenic mice were generated by Cyagen Biosciences, Inc.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1003677-Ong1">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ong</surname>
<given-names>CT</given-names>
</name>
,
<name>
<surname>Corces</surname>
<given-names>VG</given-names>
</name>
(
<year>2011</year>
)
<article-title>Enhancer function: new insights into the regulation of tissue-specific gene expression</article-title>
.
<source>Nature reviews Genetics</source>
<volume>12</volume>
:
<fpage>283</fpage>
<lpage>293</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Bulger1">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bulger</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Groudine</surname>
<given-names>M</given-names>
</name>
(
<year>2011</year>
)
<article-title>Functional and mechanistic diversity of distal transcription enhancers</article-title>
.
<source>Cell</source>
<volume>144</volume>
:
<fpage>327</fpage>
<lpage>339</lpage>
.
<pub-id pub-id-type="pmid">21295696</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel1">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
,
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
(
<year>2009</year>
)
<article-title>Genomic views of distant-acting enhancers</article-title>
.
<source>Nature</source>
<volume>461</volume>
:
<fpage>199</fpage>
<lpage>205</lpage>
.
<pub-id pub-id-type="pmid">19741700</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Sakabe1">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sakabe</surname>
<given-names>NJ</given-names>
</name>
,
<name>
<surname>Savic</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Nobrega</surname>
<given-names>MA</given-names>
</name>
(
<year>2012</year>
)
<article-title>Transcriptional enhancers in development and disease</article-title>
.
<source>Genome biology</source>
<volume>13</volume>
:
<fpage>238</fpage>
.
<pub-id pub-id-type="pmid">22269347</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Ahituv1">
<label>5</label>
<mixed-citation publication-type="book">Ahituv N (2012) Gene regulatory sequences and human disease. New York: Springer. x, 283 p.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Noonan1">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Noonan</surname>
<given-names>JP</given-names>
</name>
,
<name>
<surname>McCallion</surname>
<given-names>AS</given-names>
</name>
(
<year>2010</year>
)
<article-title>Genomics of long-range regulatory elements</article-title>
.
<source>Annual review of genomics and human genetics</source>
<volume>11</volume>
:
<fpage>1</fpage>
<lpage>23</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Lomvardas1">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lomvardas</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Barnea</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Pisapia</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Mendelsohn</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Kirkland</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2006</year>
)
<article-title>Interchromosomal interactions and olfactory receptor choice</article-title>
.
<source>Cell</source>
<volume>126</volume>
:
<fpage>403</fpage>
<lpage>413</lpage>
.
<pub-id pub-id-type="pmid">16873069</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel2">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Shoukry</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Afzal</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Functional autonomy of distant-acting human enhancers</article-title>
.
<source>Genomics</source>
<volume>93</volume>
:
<fpage>509</fpage>
<lpage>513</lpage>
.
<pub-id pub-id-type="pmid">19268701</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel3">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Taher</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Girgis</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>May</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Golonzhka</surname>
<given-names>O</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>A high-resolution enhancer atlas of the developing telencephalon</article-title>
.
<source>Cell</source>
<volume>152</volume>
:
<fpage>895</fpage>
<lpage>908</lpage>
.
<pub-id pub-id-type="pmid">23375746</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Koch1">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Koch</surname>
<given-names>CM</given-names>
</name>
,
<name>
<surname>Andrews</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Flicek</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Dillon</surname>
<given-names>SC</given-names>
</name>
,
<name>
<surname>Karaoz</surname>
<given-names>U</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>The landscape of histone modifications across 1% of the human genome in five human cell lines</article-title>
.
<source>Genome research</source>
<volume>17</volume>
:
<fpage>691</fpage>
<lpage>707</lpage>
.
<pub-id pub-id-type="pmid">17567990</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Heintzman1">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Heintzman</surname>
<given-names>ND</given-names>
</name>
,
<name>
<surname>Hon</surname>
<given-names>GC</given-names>
</name>
,
<name>
<surname>Hawkins</surname>
<given-names>RD</given-names>
</name>
,
<name>
<surname>Kheradpour</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Stark</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Histone modifications at human enhancers reflect global cell-type-specific gene expression</article-title>
.
<source>Nature</source>
<volume>459</volume>
:
<fpage>108</fpage>
<lpage>112</lpage>
.
<pub-id pub-id-type="pmid">19295514</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Sholtis1">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sholtis</surname>
<given-names>SJ</given-names>
</name>
,
<name>
<surname>Noonan</surname>
<given-names>JP</given-names>
</name>
(
<year>2010</year>
)
<article-title>Gene regulation and the origins of human biological uniqueness</article-title>
.
<source>Trends in genetics : TIG</source>
<volume>26</volume>
:
<fpage>110</fpage>
<lpage>118</lpage>
.
<pub-id pub-id-type="pmid">20106546</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Levine1">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Levine</surname>
<given-names>M</given-names>
</name>
(
<year>2010</year>
)
<article-title>Transcriptional enhancers in animal development and evolution</article-title>
.
<source>Current biology : CB</source>
<volume>20</volume>
:
<fpage>R754</fpage>
<lpage>763</lpage>
.
<pub-id pub-id-type="pmid">20833320</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Banerji1">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Banerji</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Rusconi</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Schaffner</surname>
<given-names>W</given-names>
</name>
(
<year>1981</year>
)
<article-title>Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences</article-title>
.
<source>Cell</source>
<volume>27</volume>
:
<fpage>299</fpage>
<lpage>308</lpage>
.
<pub-id pub-id-type="pmid">6277502</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Gillies1">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gillies</surname>
<given-names>SD</given-names>
</name>
,
<name>
<surname>Morrison</surname>
<given-names>SL</given-names>
</name>
,
<name>
<surname>Oi</surname>
<given-names>VT</given-names>
</name>
,
<name>
<surname>Tonegawa</surname>
<given-names>S</given-names>
</name>
(
<year>1983</year>
)
<article-title>A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene</article-title>
.
<source>Cell</source>
<volume>33</volume>
:
<fpage>717</fpage>
<lpage>728</lpage>
.
<pub-id pub-id-type="pmid">6409417</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Nobrega1">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nobrega</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Ovcharenko</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Afzal</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
(
<year>2003</year>
)
<article-title>Scanning human gene deserts for long-range enhancers</article-title>
.
<source>Science</source>
<volume>302</volume>
:
<fpage>413</fpage>
.
<pub-id pub-id-type="pmid">14563999</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Pennacchio1">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
,
<name>
<surname>Ahituv</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Moses</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Prabhakar</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Nobrega</surname>
<given-names>MA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2006</year>
)
<article-title>In vivo enhancer analysis of human conserved non-coding sequences</article-title>
.
<source>Nature</source>
<volume>444</volume>
:
<fpage>499</fpage>
<lpage>502</lpage>
.
<pub-id pub-id-type="pmid">17086198</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel4">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Blow</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>ChIP-seq accurately predicts tissue-specific activity of enhancers</article-title>
.
<source>Nature</source>
<volume>457</volume>
:
<fpage>854</fpage>
<lpage>858</lpage>
.
<pub-id pub-id-type="pmid">19212405</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel5">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Prabhakar</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Shoukry</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Lewis</surname>
<given-names>KD</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>Ultraconservation identifies a small subset of extremely constrained developmental enhancers</article-title>
.
<source>Nature genetics</source>
<volume>40</volume>
:
<fpage>158</fpage>
<lpage>160</lpage>
.
<pub-id pub-id-type="pmid">18176564</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Woolfe1">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Woolfe</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Goodson</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Goode</surname>
<given-names>DK</given-names>
</name>
,
<name>
<surname>Snell</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>McEwen</surname>
<given-names>GK</given-names>
</name>
,
<etal>et al</etal>
(
<year>2005</year>
)
<article-title>Highly conserved non-coding sequences are associated with vertebrate development</article-title>
.
<source>PLoS biology</source>
<volume>3</volume>
:
<fpage>e7</fpage>
.
<pub-id pub-id-type="pmid">15630479</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Prabhakar1">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Prabhakar</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Poulin</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Shoukry</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Afzal</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2006</year>
)
<article-title>Close sequence comparisons are sufficient to identify human cis-regulatory elements</article-title>
.
<source>Genome research</source>
<volume>16</volume>
:
<fpage>855</fpage>
<lpage>863</lpage>
.
<pub-id pub-id-type="pmid">16769978</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-McGaughey1">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>McGaughey</surname>
<given-names>DM</given-names>
</name>
,
<name>
<surname>Vinton</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Huynh</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Al-Saif</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Beer</surname>
<given-names>MA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b</article-title>
.
<source>Genome research</source>
<volume>18</volume>
:
<fpage>252</fpage>
<lpage>260</lpage>
.
<pub-id pub-id-type="pmid">18071029</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Johnson1">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
,
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
(
<year>2007</year>
)
<article-title>Genome-wide mapping of in vivo protein-DNA interactions</article-title>
.
<source>Science</source>
<volume>316</volume>
:
<fpage>1497</fpage>
<lpage>1502</lpage>
.
<pub-id pub-id-type="pmid">17540862</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Boyle1">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Boyle</surname>
<given-names>AP</given-names>
</name>
,
<name>
<surname>Davis</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Shulha</surname>
<given-names>HP</given-names>
</name>
,
<name>
<surname>Meltzer</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Margulies</surname>
<given-names>EH</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>High-resolution mapping and characterization of open chromatin across the genome</article-title>
.
<source>Cell</source>
<volume>132</volume>
:
<fpage>311</fpage>
<lpage>322</lpage>
.
<pub-id pub-id-type="pmid">18243105</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Giresi1">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Giresi</surname>
<given-names>PG</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>McDaniell</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Iyer</surname>
<given-names>VR</given-names>
</name>
,
<name>
<surname>Lieb</surname>
<given-names>JD</given-names>
</name>
(
<year>2007</year>
)
<article-title>FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin</article-title>
.
<source>Genome research</source>
<volume>17</volume>
:
<fpage>877</fpage>
<lpage>885</lpage>
.
<pub-id pub-id-type="pmid">17179217</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Dunham1">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Dunham</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Kundaje</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Aldred</surname>
<given-names>SF</given-names>
</name>
,
<name>
<surname>Collins</surname>
<given-names>PJ</given-names>
</name>
,
<name>
<surname>Davis</surname>
<given-names>CA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>An integrated encyclopedia of DNA elements in the human genome</article-title>
.
<source>Nature</source>
<volume>489</volume>
:
<fpage>57</fpage>
<lpage>74</lpage>
.
<pub-id pub-id-type="pmid">22955616</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Andersson1">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Andersson</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Gebhard</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Miguel-Escalada</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Hoof</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Bornholdt</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2014</year>
)
<article-title>An atlas of active enhancers across human cell types and tissues</article-title>
.
<source>Nature</source>
<volume>507</volume>
:
<fpage>455</fpage>
<lpage>461</lpage>
.
<pub-id pub-id-type="pmid">24670763</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Wamstad1">
<label>28</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wamstad</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Alexander</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Truty</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Shrikumar</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Dynamic and coordinated epigenetic regulation of developmental transitions in the cardiac lineage</article-title>
.
<source>Cell</source>
<volume>151</volume>
:
<fpage>206</fpage>
<lpage>220</lpage>
.
<pub-id pub-id-type="pmid">22981692</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Paige1">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Paige</surname>
<given-names>SL</given-names>
</name>
,
<name>
<surname>Thomas</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Stoick-Cooper</surname>
<given-names>CL</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Maves</surname>
<given-names>L</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>A temporal chromatin signature in human embryonic stem cells identifies regulators of cardiac development</article-title>
.
<source>Cell</source>
<volume>151</volume>
:
<fpage>221</fpage>
<lpage>232</lpage>
.
<pub-id pub-id-type="pmid">22981225</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Jin1">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jin</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Zang</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Peng</surname>
<given-names>W</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>H3.3/H2A.Z double variant-containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions</article-title>
.
<source>Nature genetics</source>
<volume>41</volume>
:
<fpage>941</fpage>
<lpage>945</lpage>
.
<pub-id pub-id-type="pmid">19633671</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-He1">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>He</surname>
<given-names>HH</given-names>
</name>
,
<name>
<surname>Meyer</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Shin</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Bailey</surname>
<given-names>ST</given-names>
</name>
,
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Nucleosome dynamics define transcriptional enhancers</article-title>
.
<source>Nature genetics</source>
<volume>42</volume>
:
<fpage>343</fpage>
<lpage>347</lpage>
.
<pub-id pub-id-type="pmid">20208536</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Thurman1">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Thurman</surname>
<given-names>RE</given-names>
</name>
,
<name>
<surname>Rynes</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Humbert</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Vierstra</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Maurano</surname>
<given-names>MT</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>The accessible chromatin landscape of the human genome</article-title>
.
<source>Nature</source>
<volume>489</volume>
:
<fpage>75</fpage>
<lpage>82</lpage>
.
<pub-id pub-id-type="pmid">22955617</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Heintzman2">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Heintzman</surname>
<given-names>ND</given-names>
</name>
,
<name>
<surname>Stuart</surname>
<given-names>RK</given-names>
</name>
,
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Ching</surname>
<given-names>CW</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome</article-title>
.
<source>Nature genetics</source>
<volume>39</volume>
:
<fpage>311</fpage>
<lpage>318</lpage>
.
<pub-id pub-id-type="pmid">17277777</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Cotney1">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cotney</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Leng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Oh</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>DeMare</surname>
<given-names>LE</given-names>
</name>
,
<name>
<surname>Reilly</surname>
<given-names>SK</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Chromatin state signatures associated with tissue-specific gene expression and enhancer activity in the embryonic limb</article-title>
.
<source>Genome research</source>
<volume>22</volume>
:
<fpage>1069</fpage>
<lpage>1080</lpage>
.
<pub-id pub-id-type="pmid">22421546</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Creyghton1">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Creyghton</surname>
<given-names>MP</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>AW</given-names>
</name>
,
<name>
<surname>Welstead</surname>
<given-names>GG</given-names>
</name>
,
<name>
<surname>Kooistra</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Carey</surname>
<given-names>BW</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Histone H3K27ac separates active from poised enhancers and predicts developmental state</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>107</volume>
:
<fpage>21931</fpage>
<lpage>21936</lpage>
.
<pub-id pub-id-type="pmid">21106759</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-RadaIglesias1">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rada-Iglesias</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Bajpai</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Swigut</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Brugmann</surname>
<given-names>SA</given-names>
</name>
,
<name>
<surname>Flynn</surname>
<given-names>RA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>A unique chromatin signature uncovers early developmental enhancers in humans</article-title>
.
<source>Nature</source>
<volume>470</volume>
:
<fpage>279</fpage>
<lpage>283</lpage>
.
<pub-id pub-id-type="pmid">21160473</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Mikkelsen1">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mikkelsen</surname>
<given-names>TS</given-names>
</name>
,
<name>
<surname>Ku</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
,
<name>
<surname>Issac</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Lieberman</surname>
<given-names>E</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Genome-wide maps of chromatin state in pluripotent and lineage-committed cells</article-title>
.
<source>Nature</source>
<volume>448</volume>
:
<fpage>553</fpage>
<lpage>560</lpage>
.
<pub-id pub-id-type="pmid">17603471</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Zhou1">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhou</surname>
<given-names>VW</given-names>
</name>
,
<name>
<surname>Goren</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
(
<year>2011</year>
)
<article-title>Charting histone modifications and the functional organization of mammalian genomes</article-title>
.
<source>Nature reviews Genetics</source>
<volume>12</volume>
:
<fpage>7</fpage>
<lpage>18</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Blow1">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Blow</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>McCulley</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>ChIP-Seq identification of weakly conserved heart enhancers</article-title>
.
<source>Nature genetics</source>
<volume>42</volume>
:
<fpage>806</fpage>
<lpage>810</lpage>
.
<pub-id pub-id-type="pmid">20729851</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Ghisletti1">
<label>40</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ghisletti</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Barozzi</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Mietton</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Polletti</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>De Santa</surname>
<given-names>F</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages</article-title>
.
<source>Immunity</source>
<volume>32</volume>
:
<fpage>317</fpage>
<lpage>328</lpage>
.
<pub-id pub-id-type="pmid">20206554</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-May1">
<label>41</label>
<mixed-citation publication-type="journal">
<name>
<surname>May</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Blow</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>Kaplan</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>McCulley</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Jensen</surname>
<given-names>BC</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Large-scale discovery of enhancers from human heart tissue</article-title>
.
<source>Nature genetics</source>
<volume>44</volume>
:
<fpage>89</fpage>
<lpage>93</lpage>
.
<pub-id pub-id-type="pmid">22138689</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Zinzen1">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zinzen</surname>
<given-names>RP</given-names>
</name>
,
<name>
<surname>Girardot</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Gagneur</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Braun</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Furlong</surname>
<given-names>EE</given-names>
</name>
(
<year>2009</year>
)
<article-title>Combinatorial binding predicts spatio-temporal cis-regulatory activity</article-title>
.
<source>Nature</source>
<volume>462</volume>
:
<fpage>65</fpage>
<lpage>70</lpage>
.
<pub-id pub-id-type="pmid">19890324</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-He2">
<label>43</label>
<mixed-citation publication-type="journal">
<name>
<surname>He</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Kong</surname>
<given-names>SW</given-names>
</name>
,
<name>
<surname>Ma</surname>
<given-names>Q</given-names>
</name>
,
<name>
<surname>Pu</surname>
<given-names>WT</given-names>
</name>
(
<year>2011</year>
)
<article-title>Co-occupancy by multiple cardiac transcription factors identifies transcriptional enhancers active in heart</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>108</volume>
:
<fpage>5632</fpage>
<lpage>5637</lpage>
.
<pub-id pub-id-type="pmid">21415370</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Yip1">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yip</surname>
<given-names>KY</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Bhardwaj</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Brown</surname>
<given-names>JB</given-names>
</name>
,
<name>
<surname>Leng</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors</article-title>
.
<source>Genome biology</source>
<volume>13</volume>
:
<fpage>R48</fpage>
.
<pub-id pub-id-type="pmid">22950945</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Cheng1">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cheng</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Alexander</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Min</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Leng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Yip</surname>
<given-names>KY</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Understanding transcriptional regulation by integrative analysis of transcription factor binding data</article-title>
.
<source>Genome research</source>
<volume>22</volume>
:
<fpage>1658</fpage>
<lpage>1667</lpage>
.
<pub-id pub-id-type="pmid">22955978</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Orom1">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Orom</surname>
<given-names>UA</given-names>
</name>
,
<name>
<surname>Derrien</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Beringer</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Gumireddy</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Gardini</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Long noncoding RNAs with enhancer-like function in human cells</article-title>
.
<source>Cell</source>
<volume>143</volume>
:
<fpage>46</fpage>
<lpage>58</lpage>
.
<pub-id pub-id-type="pmid">20887892</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Barski1">
<label>47</label>
<mixed-citation publication-type="journal">
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Cuddapah</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Roh</surname>
<given-names>TY</given-names>
</name>
,
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>High-resolution profiling of histone methylations in the human genome</article-title>
.
<source>Cell</source>
<volume>129</volume>
:
<fpage>823</fpage>
<lpage>837</lpage>
.
<pub-id pub-id-type="pmid">17512414</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Wang1">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Zang</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Rosenfeld</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
,
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>Combinatorial patterns of histone acetylations and methylations in the human genome</article-title>
.
<source>Nature genetics</source>
<volume>40</volume>
:
<fpage>897</fpage>
<lpage>903</lpage>
.
<pub-id pub-id-type="pmid">18552846</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Zentner1">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zentner</surname>
<given-names>GE</given-names>
</name>
,
<name>
<surname>Tesar</surname>
<given-names>PJ</given-names>
</name>
,
<name>
<surname>Scacheri</surname>
<given-names>PC</given-names>
</name>
(
<year>2011</year>
)
<article-title>Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>1273</fpage>
<lpage>1283</lpage>
.
<pub-id pub-id-type="pmid">21632746</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Bonn1">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bonn</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Zinzen</surname>
<given-names>RP</given-names>
</name>
,
<name>
<surname>Girardot</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Gustafson</surname>
<given-names>EH</given-names>
</name>
,
<name>
<surname>Perez-Gonzalez</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development</article-title>
.
<source>Nature genetics</source>
<volume>44</volume>
:
<fpage>148</fpage>
<lpage>156</lpage>
.
<pub-id pub-id-type="pmid">22231485</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Narlikar1">
<label>51</label>
<mixed-citation publication-type="journal">
<name>
<surname>Narlikar</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Sakabe</surname>
<given-names>NJ</given-names>
</name>
,
<name>
<surname>Blanski</surname>
<given-names>AA</given-names>
</name>
,
<name>
<surname>Arimura</surname>
<given-names>FE</given-names>
</name>
,
<name>
<surname>Westlund</surname>
<given-names>JM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Genome-wide discovery of human heart enhancers</article-title>
.
<source>Genome research</source>
<volume>20</volume>
:
<fpage>381</fpage>
<lpage>392</lpage>
.
<pub-id pub-id-type="pmid">20075146</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Burzynski1">
<label>52</label>
<mixed-citation publication-type="journal">
<name>
<surname>Burzynski</surname>
<given-names>GM</given-names>
</name>
,
<name>
<surname>Reed</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Taher</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Stine</surname>
<given-names>ZE</given-names>
</name>
,
<name>
<surname>Matsui</surname>
<given-names>T</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control</article-title>
.
<source>Genome research</source>
<volume>22</volume>
:
<fpage>2278</fpage>
<lpage>2289</lpage>
.
<pub-id pub-id-type="pmid">22759862</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Busser1">
<label>53</label>
<mixed-citation publication-type="journal">
<name>
<surname>Busser</surname>
<given-names>BW</given-names>
</name>
,
<name>
<surname>Taher</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Tansey</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Bloom</surname>
<given-names>MJ</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>A machine learning approach for identifying novel cell type-specific transcriptional regulators of myogenesis</article-title>
.
<source>PLoS genetics</source>
<volume>8</volume>
:
<fpage>e1002531</fpage>
.
<pub-id pub-id-type="pmid">22412381</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Lee1">
<label>54</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Karchin</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Beer</surname>
<given-names>MA</given-names>
</name>
(
<year>2011</year>
)
<article-title>Discriminative prediction of mammalian enhancers from DNA sequence</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>2167</fpage>
<lpage>2180</lpage>
.
<pub-id pub-id-type="pmid">21875935</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Gorkin1">
<label>55</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gorkin</surname>
<given-names>DU</given-names>
</name>
,
<name>
<surname>Lee</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Reed</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Fletez-Brant</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Bessling</surname>
<given-names>SL</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Integration of ChIP-seq and machine learning reveals enhancers and a predictive regulatory sequence vocabulary in melanocytes</article-title>
.
<source>Genome research</source>
<volume>22</volume>
:
<fpage>2290</fpage>
<lpage>2301</lpage>
.
<pub-id pub-id-type="pmid">23019145</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Rajagopal1">
<label>56</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rajagopal</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Xie</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Wagner</surname>
<given-names>U</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>RFECS: a random-forest based algorithm for enhancer identification from chromatin state</article-title>
.
<source>PLoS computational biology</source>
<volume>9</volume>
:
<fpage>e1002968</fpage>
.
<pub-id pub-id-type="pmid">23526891</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Lahdesmaki1">
<label>57</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lahdesmaki</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Rust</surname>
<given-names>AG</given-names>
</name>
,
<name>
<surname>Shmulevich</surname>
<given-names>I</given-names>
</name>
(
<year>2008</year>
)
<article-title>Probabilistic inference of transcription factor binding from multiple data sources</article-title>
.
<source>PLoS one</source>
<volume>3</volume>
:
<fpage>e1820</fpage>
.
<pub-id pub-id-type="pmid">18364997</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Kantorovitz1">
<label>58</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kantorovitz</surname>
<given-names>MR</given-names>
</name>
,
<name>
<surname>Kazemian</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Kinston</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Miranda-Saavedra</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Zhu</surname>
<given-names>Q</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Motif-blind, genome-wide discovery of cis-regulatory modules in Drosophila and mouse</article-title>
.
<source>Developmental cell</source>
<volume>17</volume>
:
<fpage>568</fpage>
<lpage>579</lpage>
.
<pub-id pub-id-type="pmid">19853570</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Won1">
<label>59</label>
<mixed-citation publication-type="journal">
<name>
<surname>Won</surname>
<given-names>KJ</given-names>
</name>
,
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
(
<year>2010</year>
)
<article-title>Genome-wide prediction of transcription factor binding sites using an integrated model</article-title>
.
<source>Genome biology</source>
<volume>11</volume>
:
<fpage>R7</fpage>
.
<pub-id pub-id-type="pmid">20096096</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-PiqueRegi1">
<label>60</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pique-Regi</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Degner</surname>
<given-names>JF</given-names>
</name>
,
<name>
<surname>Pai</surname>
<given-names>AA</given-names>
</name>
,
<name>
<surname>Gaffney</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Gilad</surname>
<given-names>Y</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>447</fpage>
<lpage>455</lpage>
.
<pub-id pub-id-type="pmid">21106904</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Arvey1">
<label>61</label>
<mixed-citation publication-type="journal">
<name>
<surname>Arvey</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Agius</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
,
<name>
<surname>Leslie</surname>
<given-names>C</given-names>
</name>
(
<year>2012</year>
)
<article-title>Sequence and chromatin determinants of cell-type-specific transcription factor binding</article-title>
.
<source>Genome research</source>
<volume>22</volume>
:
<fpage>1723</fpage>
<lpage>1734</lpage>
.
<pub-id pub-id-type="pmid">22955984</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-CuellarPartida1">
<label>62</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cuellar-Partida</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Buske</surname>
<given-names>FA</given-names>
</name>
,
<name>
<surname>McLeay</surname>
<given-names>RC</given-names>
</name>
,
<name>
<surname>Whitington</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Epigenetic priors for identifying active transcription factor binding sites</article-title>
.
<source>Bioinformatics</source>
<volume>28</volume>
:
<fpage>56</fpage>
<lpage>62</lpage>
.
<pub-id pub-id-type="pmid">22072382</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Wang2">
<label>63</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Do</surname>
<given-names>HT</given-names>
</name>
(
<year>2012</year>
)
<article-title>Computational localization of transcription factor binding sites using extreme learning machines</article-title>
.
<source>Soft Comput</source>
<volume>16</volume>
:
<fpage>1595</fpage>
<lpage>1606</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Ernst1">
<label>64</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ernst</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Kheradpour</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Mikkelsen</surname>
<given-names>TS</given-names>
</name>
,
<name>
<surname>Shoresh</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Ward</surname>
<given-names>LD</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Mapping and analysis of chromatin state dynamics in nine human cell types</article-title>
.
<source>Nature</source>
<volume>473</volume>
:
<fpage>43</fpage>
<lpage>49</lpage>
.
<pub-id pub-id-type="pmid">21441907</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Hoffman1">
<label>65</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hoffman</surname>
<given-names>MM</given-names>
</name>
,
<name>
<surname>Buske</surname>
<given-names>OJ</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Bilmes</surname>
<given-names>JA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Unsupervised pattern discovery in human chromatin structure through genomic segmentation</article-title>
.
<source>Nature methods</source>
<volume>9</volume>
:
<fpage>473</fpage>
<lpage>476</lpage>
.
<pub-id pub-id-type="pmid">22426492</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Sonnenburg1">
<label>66</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sonnenburg</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Zien</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Ratsch</surname>
<given-names>G</given-names>
</name>
(
<year>2006</year>
)
<article-title>ARTS: accurate recognition of transcription starts in human</article-title>
.
<source>Bioinformatics</source>
<volume>22</volume>
:
<fpage>e472</fpage>
<lpage>480</lpage>
.
<pub-id pub-id-type="pmid">16873509</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Kloft1">
<label>67</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kloft</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Brefeld</surname>
<given-names>U</given-names>
</name>
,
<name>
<surname>Sonnenburg</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Zien</surname>
<given-names>A</given-names>
</name>
(
<year>2011</year>
)
<article-title>lp-Norm Multiple Kernel Learning</article-title>
.
<source>Journal of Machine Learning Research</source>
<volume>12</volume>
:
<fpage>953</fpage>
<lpage>997</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Boser1">
<label>68</label>
<mixed-citation publication-type="book">Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on Computational learning theory. Pittsburgh, Pennsylvania, USA: ACM. pp. 144–152.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Visel6">
<label>69</label>
<mixed-citation publication-type="journal">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Minovitsky</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Dubchak</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
(
<year>2007</year>
)
<article-title>VISTA Enhancer Browser–a database of tissue-specific human enhancers</article-title>
.
<source>Nucleic acids research</source>
<volume>35</volume>
:
<fpage>D88</fpage>
<lpage>92</lpage>
.
<pub-id pub-id-type="pmid">17130149</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-ORahilly1">
<label>70</label>
<mixed-citation publication-type="journal">
<name>
<surname>O'Rahilly</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Muller</surname>
<given-names>F</given-names>
</name>
(
<year>2010</year>
)
<article-title>Developmental stages in human embryos: revised and new measurements</article-title>
.
<source>Cells, tissues, organs</source>
<volume>192</volume>
:
<fpage>73</fpage>
<lpage>84</lpage>
.
<pub-id pub-id-type="pmid">20185898</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Leslie1">
<label>71</label>
<mixed-citation publication-type="journal">
<name>
<surname>Leslie</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Eskin</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
(
<year>2002</year>
)
<article-title>The spectrum kernel: a string kernel for SVM protein classification</article-title>
.
<source>Pacific Symposium on Biocomputing</source>
<fpage>564</fpage>
<lpage>575</lpage>
.
<pub-id pub-id-type="pmid">11928508</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Siepel1">
<label>72</label>
<mixed-citation publication-type="journal">
<name>
<surname>Siepel</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Bejerano</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Pedersen</surname>
<given-names>JS</given-names>
</name>
,
<name>
<surname>Hinrichs</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Hou</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
(
<year>2005</year>
)
<article-title>Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes</article-title>
.
<source>Genome research</source>
<volume>15</volume>
:
<fpage>1034</fpage>
<lpage>1050</lpage>
.
<pub-id pub-id-type="pmid">16024819</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Taher1">
<label>73</label>
<mixed-citation publication-type="journal">
<name>
<surname>Taher</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Narlikar</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Ovcharenko</surname>
<given-names>I</given-names>
</name>
(
<year>2012</year>
)
<article-title>CLARE: Cracking the LAnguage of Regulatory Elements</article-title>
.
<source>Bioinformatics</source>
<volume>28</volume>
:
<fpage>581</fpage>
<lpage>583</lpage>
.
<pub-id pub-id-type="pmid">22199387</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Capra1">
<label>74</label>
<mixed-citation publication-type="journal">
<name>
<surname>Capra</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Erwin</surname>
<given-names>GD</given-names>
</name>
,
<name>
<surname>McKinsey</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Rubenstein</surname>
<given-names>JLR</given-names>
</name>
,
<name>
<surname>Pollard</surname>
<given-names>KS</given-names>
</name>
(
<year>2013</year>
)
<article-title>Many human accelerated regions are developmental enhancers</article-title>
.
<source>Philos Trans R Soc Lond B Biol Sci</source>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Nord1">
<label>75</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nord</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Blow</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>Attanasio</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Holt</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>Rapid and Pervasive Changes in Genome-wide Enhancer Usage during Mammalian Development</article-title>
.
<source>Cell</source>
<volume>155</volume>
:
<fpage>1521</fpage>
<lpage>1531</lpage>
.
<pub-id pub-id-type="pmid">24360275</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Hindorff1">
<label>76</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hindorff</surname>
<given-names>LA</given-names>
</name>
,
<name>
<surname>Sethupathy</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Junkins</surname>
<given-names>HA</given-names>
</name>
,
<name>
<surname>Ramos</surname>
<given-names>EM</given-names>
</name>
,
<name>
<surname>Mehta</surname>
<given-names>JP</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Potential etiologic and functional implications of genome-wide association loci for human diseases and traits</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>106</volume>
:
<fpage>9362</fpage>
<lpage>9367</lpage>
.
<pub-id pub-id-type="pmid">19474294</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Kume1">
<label>77</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kume</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Deng</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Hogan</surname>
<given-names>BL</given-names>
</name>
(
<year>2000</year>
)
<article-title>Murine forkhead/winged helix genes Foxc1 (Mf1) and Foxc2 (Mfh1) are required for the early organogenesis of the kidney and urinary tract</article-title>
.
<source>Development</source>
<volume>127</volume>
:
<fpage>1387</fpage>
<lpage>1395</lpage>
.
<pub-id pub-id-type="pmid">10704385</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Kume2">
<label>78</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kume</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Jiang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Topczewska</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Hogan</surname>
<given-names>BL</given-names>
</name>
(
<year>2001</year>
)
<article-title>The murine winged helix transcription factors, Foxc1 and Foxc2, are both required for cardiovascular development and somitogenesis</article-title>
.
<source>Genes &amp; development</source>
<volume>15</volume>
:
<fpage>2470</fpage>
<lpage>2482</lpage>
.
<pub-id pub-id-type="pmid">11562355</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Maiese1">
<label>79</label>
<mixed-citation publication-type="book">Maiese K (2010) Forkhead Transcription Factors. New York: Springer.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Smith1">
<label>80</label>
<mixed-citation publication-type="journal">
<name>
<surname>Smith</surname>
<given-names>RS</given-names>
</name>
,
<name>
<surname>Zabaleta</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Kume</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Savinova</surname>
<given-names>OV</given-names>
</name>
,
<name>
<surname>Kidson</surname>
<given-names>SH</given-names>
</name>
,
<etal>et al</etal>
(
<year>2000</year>
)
<article-title>Haploinsufficiency of the transcription factors FOXC1 and FOXC2 results in aberrant ocular development</article-title>
.
<source>Human molecular genetics</source>
<volume>9</volume>
:
<fpage>1021</fpage>
<lpage>1032</lpage>
.
<pub-id pub-id-type="pmid">10767326</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Aldinger1">
<label>81</label>
<mixed-citation publication-type="journal">
<name>
<surname>Aldinger</surname>
<given-names>KA</given-names>
</name>
,
<name>
<surname>Lehmann</surname>
<given-names>OJ</given-names>
</name>
,
<name>
<surname>Hudgins</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Chizhikov</surname>
<given-names>VV</given-names>
</name>
,
<name>
<surname>Bassuk</surname>
<given-names>AG</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>FOXC1 is required for normal cerebellar development and is a major contributor to chromosome 6p25.3 Dandy-Walker malformation</article-title>
.
<source>Nature genetics</source>
<volume>41</volume>
:
<fpage>1037</fpage>
<lpage>1042</lpage>
.
<pub-id pub-id-type="pmid">19668217</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Seuntjens1">
<label>82</label>
<mixed-citation publication-type="journal">
<name>
<surname>Seuntjens</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Nityanandam</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Miquelajauregui</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Debruyn</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Stryjewska</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Sip1 regulates sequential fate decisions by feedback signaling from postmitotic neurons to progenitors</article-title>
.
<source>Nature neuroscience</source>
<volume>12</volume>
:
<fpage>1373</fpage>
<lpage>1380</lpage>
.
<pub-id pub-id-type="pmid">19838179</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Miquelajauregui1">
<label>83</label>
<mixed-citation publication-type="journal">
<name>
<surname>Miquelajauregui</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Van de Putte</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Polyakov</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Nityanandam</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Boppana</surname>
<given-names>S</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Smad-interacting protein-1 (Zfhx1b) acts upstream of Wnt signaling in the mouse hippocampus and controls its formation</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>104</volume>
:
<fpage>12919</fpage>
<lpage>12924</lpage>
.
<pub-id pub-id-type="pmid">17644613</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Weng1">
<label>84</label>
<mixed-citation publication-type="journal">
<name>
<surname>Weng</surname>
<given-names>Q</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>B</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Dual-mode modulation of Smad signaling by Smad-interacting protein Sip1 is required for myelination in the central nervous system</article-title>
.
<source>Neuron</source>
<volume>73</volume>
:
<fpage>713</fpage>
<lpage>728</lpage>
.
<pub-id pub-id-type="pmid">22365546</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Renthal1">
<label>85</label>
<mixed-citation publication-type="journal">
<name>
<surname>Renthal</surname>
<given-names>NE</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>CC</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>KC</given-names>
</name>
,
<name>
<surname>Gerard</surname>
<given-names>RD</given-names>
</name>
,
<name>
<surname>Prange-Kiel</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>miR-200 family and targets, ZEB1 and ZEB2, modulate uterine quiescence and contractility during pregnancy and labor</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>107</volume>
:
<fpage>20828</fpage>
<lpage>20833</lpage>
.
<pub-id pub-id-type="pmid">21079000</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Wilson1">
<label>86</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wilson</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Mowat</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Dastot-Le Moal</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Cacheux</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Kaariainen</surname>
<given-names>H</given-names>
</name>
,
<etal>et al</etal>
(
<year>2003</year>
)
<article-title>Further delineation of the phenotype associated with heterozygous mutations in ZFHX1B</article-title>
.
<source>American journal of medical genetics Part A</source>
<volume>119A</volume>
:
<fpage>257</fpage>
<lpage>265</lpage>
.
<pub-id pub-id-type="pmid">12784289</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-ElKasti1">
<label>87</label>
<mixed-citation publication-type="journal">
<name>
<surname>El-Kasti</surname>
<given-names>MM</given-names>
</name>
,
<name>
<surname>Wells</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Carter</surname>
<given-names>DA</given-names>
</name>
(
<year>2012</year>
)
<article-title>A novel long-range enhancer regulates postnatal expression of Zeb2: implications for Mowat-Wilson syndrome phenotypes</article-title>
.
<source>Human molecular genetics</source>
<volume>21</volume>
:
<fpage>5429</fpage>
<lpage>5442</lpage>
.
<pub-id pub-id-type="pmid">23001561</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Pollard1">
<label>88</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pollard</surname>
<given-names>KS</given-names>
</name>
,
<name>
<surname>Salama</surname>
<given-names>SR</given-names>
</name>
,
<name>
<surname>King</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Kern</surname>
<given-names>AD</given-names>
</name>
,
<name>
<surname>Dreszer</surname>
<given-names>T</given-names>
</name>
,
<etal>et al</etal>
(
<year>2006</year>
)
<article-title>Forces shaping the fastest evolving regions in the human genome</article-title>
.
<source>PLoS genetics</source>
<volume>2</volume>
:
<fpage>e168</fpage>
.
<pub-id pub-id-type="pmid">17040131</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-LindbladToh1">
<label>89</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Garber</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Zuk</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>MF</given-names>
</name>
,
<name>
<surname>Parker</surname>
<given-names>BJ</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>A high-resolution map of human evolutionary constraint using 29 mammals</article-title>
.
<source>Nature</source>
<volume>478</volume>
:
<fpage>476</fpage>
<lpage>482</lpage>
.
<pub-id pub-id-type="pmid">21993624</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Capra2">
<label>90</label>
<mixed-citation publication-type="journal">
<name>
<surname>Capra</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Erwin</surname>
<given-names>GD</given-names>
</name>
,
<name>
<surname>McKinsey</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Rubenstein</surname>
<given-names>JLR</given-names>
</name>
,
<name>
<surname>Pollard</surname>
<given-names>KS</given-names>
</name>
(
<year>2013</year>
)
<article-title>Many human accelerated regions are developmental enhancers</article-title>
.
<source>Philosophical Transactions of the Royal Society B: Biological Sciences</source>
<volume>368</volume>
(
<issue>1632</issue>
)
:
<fpage>20130025</fpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Woznica1">
<label>91</label>
<mixed-citation publication-type="journal">
<name>
<surname>Woznica</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Haeussler</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Starobinska</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Jemmett</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Initial deployment of the cardiogenic gene regulatory network in the basal chordate, Ciona intestinalis</article-title>
.
<source>Developmental biology</source>
<volume>368</volume>
:
<fpage>127</fpage>
<lpage>139</lpage>
.
<pub-id pub-id-type="pmid">22595514</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-KoshibaTakeuchi1">
<label>92</label>
<mixed-citation publication-type="journal">
<name>
<surname>Koshiba-Takeuchi</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Mori</surname>
<given-names>AD</given-names>
</name>
,
<name>
<surname>Kaynak</surname>
<given-names>BL</given-names>
</name>
,
<name>
<surname>Cebra-Thomas</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Sukonnik</surname>
<given-names>T</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Reptilian heart development and the molecular basis of cardiac chamber evolution</article-title>
.
<source>Nature</source>
<volume>461</volume>
:
<fpage>95</fpage>
<lpage>98</lpage>
.
<pub-id pub-id-type="pmid">19727199</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Casci1">
<label>93</label>
<mixed-citation publication-type="journal">
<name>
<surname>Casci</surname>
<given-names>T</given-names>
</name>
(
<year>2011</year>
)
<article-title>Development: Hourglass theory gets molecular approval</article-title>
.
<source>Nature reviews Genetics</source>
<volume>12</volume>
:
<fpage>76</fpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-White1">
<label>94</label>
<mixed-citation publication-type="journal">
<name>
<surname>White</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Myers</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Corbo</surname>
<given-names>JC</given-names>
</name>
,
<name>
<surname>Cohen</surname>
<given-names>BA</given-names>
</name>
(
<year>2013</year>
)
<article-title>Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>110</volume>
:
<fpage>11952</fpage>
<lpage>11957</lpage>
.
<pub-id pub-id-type="pmid">23818646</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Birney1">
<label>95</label>
<mixed-citation publication-type="journal">
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Dutta</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Guigo</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Gingeras</surname>
<given-names>TR</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project</article-title>
.
<source>Nature</source>
<volume>447</volume>
:
<fpage>799</fpage>
<lpage>816</lpage>
.
<pub-id pub-id-type="pmid">17571346</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Quinlan1">
<label>96</label>
<mixed-citation publication-type="journal">
<name>
<surname>Quinlan</surname>
<given-names>AR</given-names>
</name>
,
<name>
<surname>Hall</surname>
<given-names>IM</given-names>
</name>
(
<year>2010</year>
)
<article-title>BEDTools: a flexible suite of utilities for comparing genomic features</article-title>
.
<source>Bioinformatics</source>
<volume>26</volume>
:
<fpage>841</fpage>
<lpage>842</lpage>
.
<pub-id pub-id-type="pmid">20110278</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-BenHur1">
<label>97</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ben-Hur</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Weston</surname>
<given-names>J</given-names>
</name>
(
<year>2010</year>
)
<article-title>A user's guide to support vector machines</article-title>
.
<source>Methods in molecular biology</source>
<volume>609</volume>
:
<fpage>223</fpage>
<lpage>239</lpage>
.
<pub-id pub-id-type="pmid">20221922</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Sonnenburg2">
<label>98</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sonnenburg</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Ratsch</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Henschel</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Widmer</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Behr</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>The SHOGUN Machine Learning Toolbox</article-title>
.
<source>J Mach Learn Res</source>
<volume>11</volume>
:
<fpage>1799</fpage>
<lpage>1802</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Salzberg1">
<label>99</label>
<mixed-citation publication-type="journal">
<name>
<surname>Salzberg</surname>
<given-names>S</given-names>
</name>
(
<year>1997</year>
)
<article-title>On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach</article-title>
.
<source>Data Mining and Knowledge Discovery</source>
<volume>1</volume>
:
<fpage>317</fpage>
<lpage>327</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Dietterich1">
<label>100</label>
<mixed-citation publication-type="journal">
<name>
<surname>Dietterich</surname>
<given-names>TG</given-names>
</name>
(
<year>1998</year>
)
<article-title>Approximate statistical tests for comparing supervised classification learning algorithms</article-title>
.
<source>Neural Comput</source>
<volume>10</volume>
:
<fpage>1895</fpage>
<lpage>1923</lpage>
.
<pub-id pub-id-type="pmid">9744903</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Su1">
<label>101</label>
<mixed-citation publication-type="journal">
<name>
<surname>Su</surname>
<given-names>AI</given-names>
</name>
,
<name>
<surname>Wiltshire</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Batalov</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Lapp</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Ching</surname>
<given-names>KA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2004</year>
)
<article-title>A gene atlas of the mouse and human protein-encoding transcriptomes</article-title>
.
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>101</volume>
:
<fpage>6062</fpage>
<lpage>6067</lpage>
.
<pub-id pub-id-type="pmid">15075390</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-McLean1">
<label>102</label>
<mixed-citation publication-type="journal">
<name>
<surname>McLean</surname>
<given-names>CY</given-names>
</name>
,
<name>
<surname>Bristor</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Hiller</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Clarke</surname>
<given-names>SL</given-names>
</name>
,
<name>
<surname>Schaar</surname>
<given-names>BT</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>GREAT improves functional interpretation of cis-regulatory regions</article-title>
.
<source>Nature biotechnology</source>
<volume>28</volume>
:
<fpage>495</fpage>
<lpage>501</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1003677-Grant1">
<label>103</label>
<mixed-citation publication-type="journal">
<name>
<surname>Grant</surname>
<given-names>CE</given-names>
</name>
,
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
(
<year>2011</year>
)
<article-title>FIMO: scanning for occurrences of a given motif</article-title>
.
<source>Bioinformatics</source>
<volume>27</volume>
:
<fpage>1017</fpage>
<lpage>1018</lpage>
.
<pub-id pub-id-type="pmid">21330290</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Li1">
<label>104</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>Q</given-names>
</name>
,
<name>
<surname>Ritter</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Dong</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>A systematic approach to identify functional motifs within vertebrate developmental enhancers</article-title>
.
<source>Developmental biology</source>
<volume>337</volume>
:
<fpage>484</fpage>
<lpage>495</lpage>
.
<pub-id pub-id-type="pmid">19850031</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1003677-Oksenberg1">
<label>105</label>
<mixed-citation publication-type="journal">
<name>
<surname>Oksenberg</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Stevison</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Wall</surname>
<given-names>JD</given-names>
</name>
,
<name>
<surname>Ahituv</surname>
<given-names>N</given-names>
</name>
(
<year>2013</year>
)
<article-title>Function and regulation of AUTS2, a gene implicated in autism and human evolution</article-title>
.
<source>PLoS genetics</source>
<volume>9</volume>
:
<fpage>e1003221</fpage>
.
<pub-id pub-id-type="pmid">23349641</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Amérique/explor/PittsburghV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000259 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000259 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Amérique
   |area=    PittsburghV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4072507
   |texte=   Integrating Diverse Datasets Improves Developmental Enhancer Prediction
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24967590" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a PittsburghV1 

This area was generated with Dilib version V0.6.38.
Data generation: Fri Jun 18 17:37:45 2021. Site generation: Fri Jun 18 18:15:47 2021