OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters

Internal identifier: 000108 (Pmc/Checkpoint); previous: 000107; next: 000109

Authors: Paul Gagniuc [Romania]; Constantin Ionescu-Tirgoviste [Romania]

Source :

RBID : PMC:3549790

Abstract

Background

The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified into two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked ourselves some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species?

Results

In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns, whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa index of coincidence.

Conclusions

To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of “transitional” promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.
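
As an illustration of the kind of statistic the abstract refers to, the following is a minimal sketch of the classical (Friedman) index of coincidence computed over sliding windows of a DNA string. It is not the authors' implementation: the function names, the window size and the example sequence are assumptions made purely for illustration, and the abstract does not specify how the authors' Kappa variant is defined.

```python
from collections import Counter


def index_of_coincidence(seq: str) -> float:
    """Classical index of coincidence: the probability that two
    characters drawn without replacement from seq are identical."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two characters")
    counts = Counter(seq)
    # Sum over symbols of c*(c-1), normalised by the number of ordered pairs.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def sliding_ic(seq: str, window: int, step: int = 1) -> list[float]:
    """Index of coincidence for each window of length `window`
    (hypothetical helper; window and step are illustrative choices)."""
    return [
        index_of_coincidence(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    example = "TATAAAAGGCGCGCCGATTACA"  # made-up sequence, not real data
    for position, value in enumerate(sliding_ic(example, window=8)):
        print(position, round(value, 3))
```

A profile like this assigns one value per window, so repetitive stretches (values near 1) stand out from mixed-composition stretches (values near 0). How such profiles are turned into images and classified is described in the paper itself, not here.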


Url:
DOI: 10.1186/1471-2164-13-512
PubMed: 23020586
PubMed Central: 3549790



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters</title>
<author>
<name sortKey="Gagniuc, Paul" sort="Gagniuc, Paul" uniqKey="Gagniuc P" first="Paul" last="Gagniuc">Paul Gagniuc</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Institute of Genetics, University of Bucharest, Bucharest 060101, Romania</nlm:aff>
<country xml:lang="fr">Roumanie</country>
<wicri:regionArea>Institute of Genetics, University of Bucharest, Bucharest 060101</wicri:regionArea>
<wicri:noRegion>Bucharest 060101</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ionescu Tirgoviste, Constantin" sort="Ionescu Tirgoviste, Constantin" uniqKey="Ionescu Tirgoviste C" first="Constantin" last="Ionescu-Tirgoviste">Constantin Ionescu-Tirgoviste</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">National Institute of Diabetes, Nutrition and Metabolic Diseases “N.C. Paulescu”, Bucharest, Romania</nlm:aff>
<country xml:lang="fr">Roumanie</country>
<wicri:regionArea>National Institute of Diabetes, Nutrition and Metabolic Diseases “N.C. Paulescu”, Bucharest</wicri:regionArea>
<wicri:noRegion>Bucharest</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23020586</idno>
<idno type="pmc">3549790</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549790</idno>
<idno type="RBID">PMC:3549790</idno>
<idno type="doi">10.1186/1471-2164-13-512</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000099</idno>
<idno type="wicri:Area/Pmc/Curation">000099</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000108</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters</title>
<author>
<name sortKey="Gagniuc, Paul" sort="Gagniuc, Paul" uniqKey="Gagniuc P" first="Paul" last="Gagniuc">Paul Gagniuc</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Institute of Genetics, University of Bucharest, Bucharest 060101, Romania</nlm:aff>
<country xml:lang="fr">Roumanie</country>
<wicri:regionArea>Institute of Genetics, University of Bucharest, Bucharest 060101</wicri:regionArea>
<wicri:noRegion>Bucharest 060101</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ionescu Tirgoviste, Constantin" sort="Ionescu Tirgoviste, Constantin" uniqKey="Ionescu Tirgoviste C" first="Constantin" last="Ionescu-Tirgoviste">Constantin Ionescu-Tirgoviste</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">National Institute of Diabetes, Nutrition and Metabolic Diseases “N.C. Paulescu”, Bucharest, Romania</nlm:aff>
<country xml:lang="fr">Roumanie</country>
<wicri:regionArea>National Institute of Diabetes, Nutrition and Metabolic Diseases “N.C. Paulescu”, Bucharest</wicri:regionArea>
<wicri:noRegion>Bucharest</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified into two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked ourselves some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species?</p>
</sec>
<sec>
<title>Results</title>
<p>In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns, whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa index of coincidence.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of “transitional” promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23020586</article-id>
<article-id pub-id-type="pmc">3549790</article-id>
<article-id pub-id-type="publisher-id">1471-2164-13-512</article-id>
<article-id pub-id-type="doi">10.1186/1471-2164-13-512</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Gagniuc</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>paul_gagniuc@acad.ro</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Ionescu-Tirgoviste</surname>
<given-names>Constantin</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>cit@paulescu.ro</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Institute of Genetics, University of Bucharest, Bucharest 060101, Romania</aff>
<aff id="I2">
<label>2</label>
National Institute of Diabetes, Nutrition and Metabolic Diseases “N.C. Paulescu”, Bucharest, Romania</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>9</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<fpage>512</fpage>
<lpage>512</lpage>
<history>
<date date-type="received">
<day>13</day>
<month>5</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>9</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2012 Gagniuc and Ionescu-Tirgoviste; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Gagniuc and Ionescu-Tirgoviste; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2164/13/512"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified into two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species?</p>
</sec>
<sec>
<title>Results</title>
<p>In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns whereas some other classes of promoters have been observed to occur in competition. Furthermore, we noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa index of coincidence.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of “transitional” promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Gene promoters</kwd>
<kwd>Promoter classes</kwd>
<kwd>Eukaryotic genomes</kwd>
<kwd>Promoter patterns</kwd>
<kwd>Kappa index of coincidence</kwd>
<kwd>Promoter network</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Promoters have guided evolution for millions of years. It appears that they were the main engine responsible for the integration of different mutations favorable for the environmental conditions [
<xref ref-type="bibr" rid="B1">1</xref>
]. Promoters are critical regions for gene regulation in complex genomes and are located upstream of the transcription start site (TSS). A typical promoter region is composed of a core promoter and regulatory domains [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B3">3</xref>
]. The structure of a promoter is recognized by the presence of known promoter elements, such as TATA box, GC-box, CCAAT-box, BRE and INR box [
<xref ref-type="bibr" rid="B4">4</xref>
-
<xref ref-type="bibr" rid="B12">12</xref>
]. Therefore, accurate recognition of a promoter structure relies on a comprehensive list of promoter elements. Nevertheless, using these promoter elements for classification has proven to be difficult and perhaps even disadvantageous for different functional correlations between promoter sequences. From an evolutionary standpoint, within non-coding regulatory regions, nucleotides can change their order more frequently and these binding sites often become very small and unstable [
<xref ref-type="bibr" rid="B13">13</xref>
]. Previous approaches to promoter classification include motif sequences and other structural parameters, such as DNA curvature, bendability, stability, nucleosome positioning or comparison of various DNA sequences [
<xref ref-type="bibr" rid="B14">14</xref>
-
<xref ref-type="bibr" rid="B19">19</xref>
]. Currently, promoters from vertebrates are classified into two major classes, namely TATA and CpG types, while in mammals there is a subclassification into TATA box–enriched and CpG-rich promoters [
<xref ref-type="bibr" rid="B20">20</xref>
]. In order to investigate possible interactions between different biological processes, we found that an overall correlation between DNA sequence features among promoter regions may be an alternative method. In this context, we have chosen a different approach to classify promoter sequences by using two-dimensional patterns obtained through Kappa Index of Coincidence (Kappa IC) and (C + G)% values [
<xref ref-type="bibr" rid="B21">21</xref>
-
<xref ref-type="bibr" rid="B24">24</xref>
]. This classification is mainly done by considering the shape and density of these promoter patterns. In this study, we explore the structural properties of these patterns and search for correlations between promoter sequences of several different species. Genome sequencing has led to the development of many bioinformatic methods for accurate recognition and extraction of promoter sequences. A number of experimental approaches to compile TSSs on a genome-wide scale have been developed, including the Eukaryotic Promoter Database [
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B26">26</xref>
] and PlantProm Database [
<xref ref-type="bibr" rid="B27">27</xref>
]. We used these databases and focused our attention on 20,597 promoter sequences from
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
. In humans, we were also interested in promoters of genes that are expressed preferentially in certain tissues. Several studies have converged on characterizing patterns of tissue-specific gene expression, including the TiGER (Tissue-specific Gene Expression and Regulation) database [
<xref ref-type="bibr" rid="B28">28</xref>
-
<xref ref-type="bibr" rid="B30">30</xref>
], which contains comprehensive information about human tissue-specific gene expression profiles. We used the TiGER list of tissue-specific genes to determine the proportion of each promoter class in 30 tissues. This allowed us to identify certain relations between promoter sequences and different biological processes.</p>
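<p>As a rough illustration of how such a two-dimensional pattern could be produced, the sketch below slides a window along a sequence and records one (C + G)% value and one index of coincidence value per window. It uses the textbook index of coincidence, which may differ from the Kappa IC variant used in this study, and the window length of 30 nucleotides, the step of 1 and all function names are assumptions made only for this example.</p>
<preformat><![CDATA[
```python
from collections import Counter


def gc_percent(window):
    """Percentage of C and G nucleotides in a window."""
    if not window:
        return 0.0
    gc = sum(1 for base in window if base in "CG")
    return 100.0 * gc / len(window)


def index_of_coincidence(window):
    """Textbook index of coincidence, in percent.

    Assumption: this is the classical formula, not necessarily
    the Kappa IC variant used by the authors.
    """
    n = len(window)
    if n < 2:
        return 0.0
    counts = Counter(window)
    matches = sum(c * (c - 1) for c in counts.values())
    return 100.0 * matches / (n * (n - 1))


def pattern_points(sequence, window=30, step=1):
    """Return one (GC%, IC) pair per sliding window (window and step are assumed values)."""
    sequence = sequence.upper()
    points = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        points.append((gc_percent(chunk), index_of_coincidence(chunk)))
    return points
```
]]></preformat>
<p>Each returned pair can be read as one point of a pattern, with (C + G)% on the x-axis and the coincidence value on the y-axis.</p>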
</sec>
<sec sec-type="results">
<title>Results</title>
<p>We first investigated whether some promoter patterns occur more often than others. Second, we determined which of these patterns are more common in certain species and whether their distribution may have evolutionary implications. Third, we examined the distribution of these promoter classes among human tissues.</p>
<sec>
<title>Promoter classification</title>
<p>When promoter patterns are generated, some initial general conclusions can be drawn. Although these promoter sequences are less conserved between species they exhibit similar patterns. Each pattern is composed of vertically aligned clusters of Kappa IC (y-axis) and (G + C)% (x-axis) values. Vertical positions of these clusters form a promoter pattern which has a specific form for each promoter sequence. We have been able to classify promoters according to their patterns and noticed ten general types of promoters (Figure
<xref ref-type="fig" rid="F1">1A</xref>
-J). Although the overall shape and density seem to be conserved across different classes of promoters, they do differ in finer details. This may indicate a further possible organization of promoter classes into several subclasses. Their shape is explained by the presence of different structures such as simple sequence repeats (SSRs) or short tandem repeats (STRs). Among these structures we found an interesting distribution of short and long homopolymer tracts or di- and trinucleotide formations, many of which are consistent with previous studies [
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
]. We have been able to partition these patterns into ten classes on the basis of clear visual distinctions between their shape and their cluster density. The name of each promoter class was chosen according to its average nucleotide content and Kappa IC values, as follows:</p>
<p>1)
<italic>AT-based promoters</italic>
. AT-based representative patterns are distinguished by high (A + T)% and Kappa IC values. The left side of the pattern is predominant, while the right side is significantly less pronounced. The shape of this pattern exhibits various different lengths of short poly(dA:dT) homopolymer tracts (Figure
<xref ref-type="fig" rid="F1">1C</xref>
). AT-based patterns are characteristic for gene promoters from
<italic>Drosophila melanogaster</italic>
and
<italic>Arabidopsis thaliana</italic>
and are less common in humans.</p>
<p>2)
<italic>CG-based</italic>
promoters. These promoters are represented by patterns containing a high percentage of C + G and high Kappa IC values. CG-based promoters show a high CpG content. The right side of the pattern is predominant while the left side is significantly less pronounced (Figure
<xref ref-type="fig" rid="F1">1A</xref>
). The shape of this pattern exhibits various different lengths of short poly(dC:dG) homopolymer tracts. In addition, the average frequency of occurrence between AT-based and CG-based promoters appears to differ completely in these species, but curiously, these promoters tend to be in a relative opposition in each species (Figure
<xref ref-type="fig" rid="F2">2A</xref>
,B). This observation suggests that these species have different preferences for allocation of certain fundamental functions. Patterns of this class are particularly characteristic for genes from
<italic>Homo sapiens</italic>
.</p>
<p>3)
<italic>ATCG-compact</italic>
promoters. ATCG-compact patterns characterize promoters with centrally disposed clusters, leading to the formation of a round shaped pattern (Figure
<xref ref-type="fig" rid="F1">1D</xref>
). The middle-lower region of the pattern contains evenly interspersed nucleotides (A,T,C,G ≈ 25%) and the middle-upper area shows different lengths of short homopolymer tracts (poly(dA), poly(dT), poly(dC), poly(dG)) disposed in tandem in any order. ATCG-compact patterns are characteristic for gene promoters from
<italic>Arabidopsis thaliana</italic>
.</p>
<p>4)
<italic>ATCG-balanced</italic>
promoters. Promoter sequences belonging to ATCG-balanced class show an almost balanced G + C and A + T content. The right and the left side of the pattern tend to share a relative 2-fold rotational symmetry. These patterns are generally composed of equally distributed short poly(dA:dT) and poly(dC:dG) homopolymer tracts (Figure
<xref ref-type="fig" rid="F1">1B</xref>
). ATCG-balanced and CG-spike promoters tend to occur in the same proportion in each species and appear to have almost similar average frequencies between species (Figure
<xref ref-type="fig" rid="F2">2A</xref>
,B). This observation indicates that for some specific functions the same classes of promoters are preferred between species. These patterns are characteristic for gene promoters from
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
.</p>
<p>5)
<italic>ATCG-middle</italic>
promoters. ATCG-middle patterns are characterized mainly by promoter sequences containing A + T and C + G balanced values and higher than average Kappa IC values. The right side and the left side of the pattern are equally distributed. However, the central part is pronounced. They are similar to ATCG-balanced class in that they also have a relative 2-fold rotational symmetry, but contain additional short homopolymer tracts (poly(dA), poly(dT), poly(dC), poly(dG)) disposed in tandem in any order (Figure
<xref ref-type="fig" rid="F1">1E</xref>
). These patterns are rare and are almost equally distributed in all four species.</p>
<p>6)
<italic>ATCG-less promoters</italic>
. Promoters from this class are represented by an abrupt transition between two C + G threshold levels. Similar to ATCG-balanced promoters, the right side and the left side of the pattern are equally distributed; however, some sequences around the central region are missing or have a lower density. Typically, these central regions lack tandem short homopolymer tracts and short sequences consisting of equally interspersed nucleotides (A,T,C,G ≈ 25%), or short sequences showing small variations over 50% in favor of A + T or C + G nucleotides (Figure
<xref ref-type="fig" rid="F1">1F</xref>
). Based on the promoter sequence features, these promoter patterns seem to be complementary with ATCG-middle promoters. ATCG-less patterns are significantly rare (an overall frequency between species of 0.10% - 0.16%) and are characteristic for promoters from
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
but are almost absent in
<italic>Drosophila melanogaster</italic>
and
<italic>Arabidopsis thaliana</italic>
.</p>
<p>7)
<italic>AT-less promoters</italic>
. Promoter sequences belonging to AT-less class exhibit a high frequency of short CG-rich sequences. Although both sides of the pattern show a relative 2-fold rotational symmetry, the clusters from the left side of the pattern exhibit a lower density than those on the right. These patterns are characterized by a large number of short poly(dC:dG) tracts and a lower number of short poly(dA:dT) tracts (Figure
<xref ref-type="fig" rid="F1">1G</xref>
). Short poly(dA:dT) tracts typically occur as a consequence of an abrupt depletion of C + G nucleotides on short distances (30b–60b) inside the promoter sequence. Such a depletion is accompanied by high Kappa IC values and is typically present near TSS (± 200b), suggesting a regular expression of their genes. AT-less patterns are generally rare and are found equally in all four species, but are slightly more frequent in
<italic>Homo sapiens</italic>
.</p>
<p>8)
<italic>CG-less promoters</italic>
. In contrast, CG-less promoters are distinguished by a high frequency of short AT-rich sequences and are more common in
<italic>Oryza sativa</italic>
and
<italic>Arabidopsis thaliana</italic>
. The right and left side of the pattern tend to be equally distributed, however, the clusters from the right side of the pattern exhibit a lower density than those on the left. AT-less and CG-less promoters seem to be characterized by an imbalance between the number of short poly(dA:dT) tracts and short poly(dC:dG) tracts. Complementary to AT-less promoter characteristics, these patterns are characterized by a large number of short poly(dA:dT) tracts and a much lower number of short poly(dC:dG) tracts (Figure
<xref ref-type="fig" rid="F1">1I</xref>
). Compared with AT-less promoters, the overall preference for CG-less promoters is very high between species. However, in
<italic>Homo sapiens</italic>
the number of AT-less promoters slightly exceeds the number of CG-less promoters (Figure
<xref ref-type="fig" rid="F2">2A</xref>
).</p>
<p>9)
<italic>AT-spike promoters</italic>
. Promoter sequences belonging to AT-spike class are represented by long repetitive sequences with a high content of A or T nucleotides. These patterns exhibit a central part and an elongated left side containing small density clusters. The shape of AT-spike representative patterns is explained by the presence of long poly(dA) or long poly(dT) homopolymer tracts or tandem short poly(dA) or short poly(dT) tracts (Figure
<xref ref-type="fig" rid="F1">1J</xref>
). These promoters are prevalent in
<italic>Arabidopsis thaliana</italic>
.</p>
<p>10)
<italic>CG-spike promoters</italic>
. In contrast to AT-spike promoter architecture, these promoters are represented by long repetitive sequences with a high content of C or G nucleotides. CG-spike patterns exhibit a central part and an elongated right side containing small density clusters. These patterns contain long poly(dC) or long poly(dG) homopolymer tracts or tandem short poly(dC) or short poly(dG) tracts (Figure
<xref ref-type="fig" rid="F1">1H</xref>
). AT-spike and CG-spike promoters seem to be complementary considering the fact that both promoter classes are differentiated by two opposite types of homopolymer tracts. AT-spike and CG-spike classes appear to be equally preferred between species, nevertheless, their promoters tend to be in opposition in each species (Figure
<xref ref-type="fig" rid="F2">2B</xref>
). This observation suggests a possible conservation of their antagonist role between these species, yet a different preference for certain functions. These patterns are common in
<italic>Oryza sativa</italic>
and
<italic>Homo sapiens</italic>
.</p>
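<p>Because the ten classes above are described largely in terms of short and long homopolymer tracts, a minimal sketch of how such tracts could be counted is given here. The minimum tract length of 3 nucleotides, the threshold of 10 nucleotides separating short from long tracts, and the function names are assumptions chosen for illustration; they are not values reported in this study.</p>
<preformat><![CDATA[
```python
import re


def homopolymer_tracts(sequence, min_length=3):
    """Return (base, start, length) for each run of a single nucleotide.

    Only runs of at least min_length are kept (min_length is an assumed value).
    """
    tracts = []
    for match in re.finditer(r"A+|C+|G+|T+", sequence.upper()):
        length = match.end() - match.start()
        if length >= min_length:
            tracts.append((match.group()[0], match.start(), length))
    return tracts


def tract_summary(sequence, min_length=3, long_length=10):
    """Count short and long tracts for the A/T and C/G families.

    long_length is a hypothetical cut-off between short and long tracts.
    """
    summary = {"short_AT": 0, "long_AT": 0, "short_CG": 0, "long_CG": 0}
    for base, _start, length in homopolymer_tracts(sequence, min_length):
        family = "AT" if base in "AT" else "CG"
        size = "long" if length >= long_length else "short"
        summary[f"{size}_{family}"] += 1
    return summary
```
]]></preformat>
<p>Under these assumptions, a sequence dominated by long A or T runs would score high on the long_AT count, in line with the description of AT-spike promoters, whereas many short C or G runs would point towards the CG-rich classes.</p>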
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Ten classes of promoters </bold>
<bold>and their representative patterns.</bold>
Each promoter pattern is composed of vertically aligned clusters of Kappa IC (y-axis) and GC% (x-axis) values. The center of weight for each pattern is represented by a black circle. The representative promoter patterns are shown in the following panels: (
<bold>A</bold>
) AT-based, (
<bold>B</bold>
) CG-based, (
<bold>C</bold>
) ATCG-compact, (
<bold>D</bold>
) ATCG-balanced, (
<bold>E</bold>
) ATCG-middle, (
<bold>F</bold>
) ATCG-less, (
<bold>G</bold>
) AT-less, (
<bold>H</bold>
) CG-spike, (
<bold>I</bold>
) CG-less and (
<bold>J</bold>
) AT-spike.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-1"></graphic>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Organism-specific frequencies of each </bold>
<bold>promoter class.</bold>
Each column represents a class of promoters. Starting at the bottom of each column we present the class name, (
<bold>B</bold>
) the average preference of promoter classes between species, a representative shape of the promoter class (pink areas show denser clusters whereas light grayish gold color shows lower density clusters) and (
<bold>A</bold>
) the proportion of promoter classes in
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa.</italic>
</p>
</caption>
<graphic xlink:href="1471-2164-13-512-2"></graphic>
</fig>
</sec>
<sec>
<title>Promoter distribution</title>
<p>Our comparative analyses have revealed similarities and differences in the promoter architecture between
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
. We have plotted the center of weight of 20,586 promoter patterns for each species in order to highlight the distribution of these regulatory sequences (Figure
<xref ref-type="fig" rid="F3">3</xref>
). The center of weight of each promoter pattern indicates an average between all SSR and STR sequences. ATCG-middle patterns contain almost all types of SSR and STR sequences and can reveal some visual insights into different promoter regions (Figure
<xref ref-type="fig" rid="F4">4A</xref>
-F). Although phylogenetic relationships are usually based on sequence alignment algorithms, the Kappa IC approach is based on a frequency/content comparison. A superposition between promoter distributions from each species shows the shared surfaces, representing conserved promoter sequences (Figure
<xref ref-type="fig" rid="F3">3E</xref>
-J). Promoter sequences from
<italic>Arabidopsis thaliana</italic>
and rice were notably differentiated, and only a small part of promoters were shared (Figure
<xref ref-type="fig" rid="F3">3B</xref>
,D and Figure
<xref ref-type="fig" rid="F3">3I</xref>
). Moreover,
<italic>Arabidopsis thaliana</italic>
promoters seem to have more structural features in common with those from
<italic>Drosophila melanogaster</italic>
(Figure
<xref ref-type="fig" rid="F3">3F</xref>
). Promoters from
<italic>Arabidopsis thaliana</italic>
exhibit higher Kappa IC values than promoters from
<italic>Drosophila melanogaster</italic>
, while variations of C + G content are relatively the same. Curiously, the highest rate of conserved promoters was encountered between
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
(Figure
<xref ref-type="fig" rid="F3">3J</xref>
) and the lowest rate of conservation was observed between
<italic>Arabidopsis thaliana</italic>
and
<italic>Homo sapiens</italic>
(Figure
<xref ref-type="fig" rid="F3">3H</xref>
). Promoter sequences from
<italic>Homo sapiens</italic>
show both a wider distribution of C + G content and the highest values of Kappa IC (Figure
<xref ref-type="fig" rid="F3">3A</xref>
,E,H,J). The superposition of promoter distributions of the four species shows that promoters do not reflect distant phylogenetic relationships (Figure
<xref ref-type="fig" rid="F3">3E</xref>
-J). We have also noticed the directions and the angles of these promoter distributions which may suggest an evolutionary tendency for each species.</p>
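<p>The center of weight used in these plots can be sketched as a simple centroid. The definition below, an unweighted arithmetic mean of the (C + G)% and Kappa IC coordinates of all points in one pattern, is an assumption made for this example, as is the function name; the exact weighting used in the study is not specified in this section.</p>
<preformat><![CDATA[
```python
def center_of_weight(points):
    """Unweighted mean of (GC%, Kappa IC) points; None for an empty pattern.

    Assumption: every point contributes equally to the center.
    """
    if not points:
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)
```
]]></preformat>
<p>Plotting one such center per promoter, grouped by species, would give a distribution of the kind described above.</p>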
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Promoter distributions for each </bold>
<bold>species.</bold>
(
<bold>A</bold>
)
<italic>Homo sapiens</italic>
, (
<bold>B</bold>
)
<italic>Drosophila melanogaster</italic>
, (
<bold>C</bold>
)
<italic>Oryza sativa</italic>
and (
<bold>D</bold>
)
<italic>Arabidopsis thaliana</italic>
. Each point represents the center of weight from a promoter pattern. Red color areas represent denser clusters of promoters. (
<bold>E-J</bold>
) superposition between promoter distributions. Red color areas represent conserved promoter sequences.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-3"></graphic>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Location of SSRs and </bold>
<bold>STRs within a promoter </bold>
<bold>pattern.</bold>
The light grayish gold shape represents a model of a promoter pattern from ATCG-middle class in which we approximate the location of various structures that compose a promoter sequence. (
<bold>A</bold>
) long Poly(dA) or Poly(dT) tracts or tandem short Poly(dA) or Poly(dT) tracts, (
<bold>B</bold>
) non-ordered short Poly(dA) and Poly(dT) and Poly(dC) and Poly(dG) tracts, (
<bold>C</bold>
) long Poly(dC) or Poly(dG) tracts or tandem short Poly(dC) or Poly(dG) tracts, (
<bold>D</bold>
) short Poly(dC) and Poly(dG) tracts, (
<bold>E</bold>
) evenly interspersed nucleotides (A,T,C,G ≈ 25%), (
<bold>F</bold>
) short Poly(dA) and Poly(dT) tracts.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-4"></graphic>
</fig>
</sec>
<sec>
<title>TATA-less and TATA-containing correlations</title>
<p>Reported proportions of
<italic>Homo sapiens</italic>
TATA-containing promoters vary between studies, depending on the number of promoters used [
<xref ref-type="bibr" rid="B33">33</xref>
]. An earlier study found 32% TATA-containing promoters from a set of ~1,000 genes [
<xref ref-type="bibr" rid="B34">34</xref>
]. More recent genome-wide studies show that only ~10% of human genes contain TATA-dependent promoters [
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B35">35</xref>
]. However, the EPD dataset (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
) has been cleared of redundant promoters that shared the same TSS. Accordingly, their promoter set has a much higher presence of known promoter elements, such as TATA or GC boxes. Using the EPD collection of 8,512
<italic>Homo sapiens</italic>
promoters, we searched for TATA motifs in a sample of 795 promoter sequences. Of this collection, we found that ~41% were TATA-containing promoters (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
). TATA-containing promoter levels were higher in AT-based, AT-less, ATCG-compact, ATCG-balanced and ATCG-middle classes, whereas TATA-less promoter levels were higher in CG-based, AT-spike, CG-less and ATCG-less classes (Figure
<xref ref-type="fig" rid="F5">5</xref>
). More extreme differences between TATA-containing and TATA-less promoters were observed in the CG-based (TATA-containing 5.28%, TATA-less 36.72%) and AT-based (TATA-containing 6.41%, TATA-less 0.75%) classes (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
).</p>
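<p>The kind of motif search described above can be sketched as follows. The consensus TATAWAW (W = A or T) is a commonly cited form of the TATA box and is used here as an assumption, since the motif definition applied in this study is not given in this section; the function names and the choice to express each count as a percentage of the whole sample are likewise illustrative.</p>
<preformat><![CDATA[
```python
import re

# Assumed TATA box consensus TATAWAW (W = A or T); not taken from the study.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def has_tata(promoter):
    """True if the assumed TATA consensus occurs anywhere in the sequence."""
    return TATA_PATTERN.search(promoter.upper()) is not None


def tata_proportions(promoters_by_class):
    """Percent of the whole sample that is TATA-containing or TATA-less, per class.

    promoters_by_class maps a class name to a list of promoter sequences.
    """
    total = sum(len(seqs) for seqs in promoters_by_class.values())
    result = {}
    for name, seqs in promoters_by_class.items():
        with_tata = sum(1 for seq in seqs if has_tata(seq))
        without_tata = len(seqs) - with_tata
        result[name] = {
            "TATA-containing": 100.0 * with_tata / total if total else 0.0,
            "TATA-less": 100.0 * without_tata / total if total else 0.0,
        }
    return result
```
]]></preformat>
<p>A stricter search would also restrict the match to a window upstream of the TSS; that refinement is left out of this sketch.</p>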
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>TATA-less and TATA-containing correlations.</bold>
In each class, blue bars show the proportion of TATA-less promoters and light yellow bars show the proportion of TATA-containing promoters. Observations were made on a sample of 795 promoters, randomly selected from a collection of 8,512
<italic>Homo sapiens</italic>
promoters.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-5"></graphic>
</fig>
</sec>
<sec>
<title>Transitional states</title>
<p>Previous studies suggested that TATA-less and TATA-containing promoters have different chromatin structure [
<xref ref-type="bibr" rid="B36">36</xref>
-
<xref ref-type="bibr" rid="B41">41</xref>
]. From an evolutionary perspective, chromatin structure may influence the distribution of point mutations or other mutational events in the promoter sequence. A chromatin-dependent distribution of point mutations can lead to a gradual shift from one promoter class to another (i.e., by disruption of poly(dA:dT) or poly(dC:dG) tracts into shorter elements), thus changing the predisposition for low or high levels of gene expression. Promoter patterns “trapped” in transitional states between classes may also indicate a change of their gene relationship towards other biological pathways. We have found intermediate states between these patterns which may suggest an evolutionary transition mechanism (Figure
<xref ref-type="fig" rid="F6">6</xref>
). Initially, the transition states were observed by our neural network (Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
). All promoter patterns have been classified by the highest percentage of recognition for each class. Certain promoter patterns present similar percentages for two separate classes of promoters, indicating a potential inclusion in two classes simultaneously. Exact intermediate patterns are rare (sometimes even unique) and differ drastically from the majority of patterns (Figure
<xref ref-type="fig" rid="F6">6</xref>
). For instance, ATCG-balanced class appears to have several patterns with a transitional tendency to ATCG-compact class or vice versa (Figure
<xref ref-type="fig" rid="F6">6A</xref>
). These transitions are based on successive elimination/insertion of short poly(dA:dT) and poly(dC:dG) tracts. Another example is represented by a systematic reduction of short poly(dA:dT) tracts, which leads to a transition of AT-less promoters to the CG-based class (Figure
<xref ref-type="fig" rid="F6">6C</xref>
). In contrast, a systematic reduction of short poly(dC:dG) tracts leads to a class transition from CG-less promoters to AT-based promoters (Figure
<xref ref-type="fig" rid="F6">6D</xref>
). From what we have observed, none of these classes represents an “end of the line” for these transitions, since we observed intermediate patterns between all classes. Furthermore, we have observed varying degrees of difficulty of transition from one class to another. This difficulty is reflected in the number of promoters belonging to each class (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
). For example, CG-based and AT-based, AT-spike and CG-spike or AT-less and CG-less classes tend to form mirror pairs. These pairs of classes have the lowest probability of transitioning directly from one to the other. This claim is supported by the small number of intermediate patterns that we have found between these pairs of classes. For instance, intermediate patterns between AT-spike and CG-spike promoters can have both long poly(dA:dT) and long poly(dC:dG) tracts, a sequence arrangement that is rarely encountered (Figure
<xref ref-type="fig" rid="F6">6B</xref>
). Consequently, we suggest that these direct transitions of promoters between pairs of classes may be caused by strong selection pressures conditioned by radical changes in the environment.</p>
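<p>The rule described in this section, assigning each pattern to the class with the highest recognition percentage and treating patterns with similar percentages for two classes as transitional, can be sketched as below. The input format (a mapping from class name to recognition percentage), the margin of 5 percentage points and the function name are assumptions for illustration; they do not describe the actual output of the neural network used in this study.</p>
<preformat><![CDATA[
```python
def classify_pattern(scores, margin=5.0):
    """Return (best_class, runner_up) for a non-empty dict of recognition percentages.

    runner_up is None unless the two highest percentages differ by at most
    margin, in which case the pattern is flagged as transitional.
    The margin is a hypothetical threshold, not a value from the study.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_name, best_score = ranked[0]
    if len(ranked) > 1:
        second_name, second_score = ranked[1]
        if best_score - second_score <= margin:
            return best_name, second_name
    return best_name, None
```
]]></preformat>
<p>With this rule, a pattern scoring 48% for ATCG-balanced and 46% for ATCG-compact would be reported as transitional between those two classes, whereas a pattern scoring 90% for a single class would not.</p>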
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>
<bold>Promoter patterns found in </bold>
<bold>transitional states.</bold>
(
<bold>A</bold>
) MDH1B gene promoter found in a transitional state between ATCG-compact and ATCG-balanced class, (
<bold>B</bold>
) UFC1 gene promoter found in a transitional state between AT-spike and CG-spike class, (
<bold>C</bold>
) LRRN1 gene promoter found in a transitional state between AT-less and CG-based class and (
<bold>D</bold>
) PCDHB10 gene promoter found in a transitional state between AT-based and CG-less class.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-6"></graphic>
</fig>
</sec>
<sec>
<title>Tissue-specificity in humans</title>
<p>Our general classification criterion allowed us to demonstrate compelling biological correlates between 2,369 tissue-specific genes (Figure
<xref ref-type="fig" rid="F7">7A</xref>
,B). Some of our observations are also based on previous studies that suggest direct correlations between short or long homopolymer tracts and certain levels of gene expression [
<xref ref-type="bibr" rid="B42">42</xref>
-
<xref ref-type="bibr" rid="B46">46</xref>
]. Indeed, we have also observed a constant presence of different homopolymer elements in these patterns, suggesting that different promoter classes (e.g., CG-spike or AT-spike) indicate a predisposition for various levels of gene expression as well as for a distinct number of factors which trigger gene expression. Specific interaction clusters have been reported in the past, such as muscle and heart or kidney and liver clusters [
<xref ref-type="bibr" rid="B30">30</xref>
]. We show some additional interaction groups, both between promoter classes and within each promoter class. In addition to these groups, the tissue order from each class further reflects the significance of the observed interactions (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
). The highlights of our observations include:</p>
<p>1. CG-based promoters have the highest percentage of occurrence (37.59%) and appear to be TATA-less class correspondents which tend to be associated with “housekeeping” genes. CG-based promoters are not only the most common but as expected they show the highest levels in all tissues. The first six tissues in which CG-based promoters have the highest percentages are cervix, skin, stomach, ovary, mammary gland and tongue (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10B online).</p>
<p>2. AT-based promoters (5.25%) are present in all tissues except the mammary gland. The six tissues in which AT-based promoters have the highest percentages are liver, heart, kidney, lymph node, soft tissue and muscle. This order largely coincides with the six tissues in which ATCG-compact promoters have the highest percentages, namely prostate, liver, kidney, muscle, heart and lymph node. Equally curious, the six tissues in which CG-based promoters have the lowest percentages are liver, uterus, kidney, heart, lung and brain (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10G and Figure
<xref ref-type="fig" rid="F7">7B</xref>
online). This implies a special relationship between CG-based and AT-based promoters because their proportions seem to indicate an almost antagonistic activity, which may suggest an involvement of these promoters in some metabolic processes. Nevertheless, the relationship between CG-based promoters and other classes of promoters in these tissues seems to conceal more than a simplistic association with the housekeeping genes.</p>
<p>3. AT-less promoters (14.36%) are overrepresented in uterus while CG-less and ATCG-balanced promoters are overrepresented in testis (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10E,F,H online).</p>
<p>4. CG-less promoters have an occurrence of 3.98% and are present in all tissues except spleen (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10F online).</p>
<p>5. There was no clear correlation regarding tissue order between AT-less and CG-less promoters. Nevertheless, we noticed that some tissues have a tendency to stay grouped, such as muscle and heart, stomach and soft tissue, larynx and colon, lymph node and liver or bone marrow and peripheral nervous system (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10E,F online). These groups may suggest a role of these promoters in simple feedback mechanisms among tissues responsible for maintaining homeostasis. Furthermore, the occurrence of short poly(dA:dT) tracts over short distances near the TSS could also indicate an involvement of AT-less promoters (and, by association, a complementary role for their CG-less counterparts) in short-term non-critical gene expression, which may strengthen our hypothesis regarding their physiological role. Moreover, in different tissues AT-less and CG-less percentages show a combined relationship of complementarity and proportionality (Figure
<xref ref-type="fig" rid="F8">8C</xref>
).</p>
<p>6. AT-spike promoters are found especially in tissues that require high levels of gene expression such as lung, eye, pancreas, uterus, liver, soft tissue, brain, kidney, prostate and blood. This tissue order and the presence of long poly(dA) or long poly(dT) tracts suggest an involvement of these promoters in survival mechanisms, possibly responsible for interactions with the environment.</p>
<p>7. CG-spike promoters also appear to be involved in survival mechanisms. These promoters are found in large numbers especially in tissues that need a short-term critical gene expression. This is supported by the order of the first seven tissues in which these promoters are most common, such as lung, eye, brain, peripheral nervous system, spleen, heart and blood, which also tend to have a high interaction with the environment (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>8. The proportions of CG-spike and AT-spike promoters seem to be similar in the first two tissues, namely lung and eye. The occurrence of long poly(dA:dT) or tandem short poly(dA:dT) tracts over short distances (>30b) near the TSS could also indicate an involvement of AT-spike and CG-spike promoters in short-term critical gene expression.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption>
<p>
<bold>Tissue-distribution frequencies for 2,369 </bold>
<bold>human promoters.</bold>
Two visualization methods are used: (
<bold>A</bold>
) shows the distribution of 30 tissues for each class of promoters and section (
<bold>B</bold>
) shows the distribution of promoter classes in each tissue.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-7"></graphic>
</fig>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption>
<p>
<bold>An overall comparison between </bold>
<bold>different promoter classes in </bold>
<bold>each tissue.</bold>
(
<bold>A</bold>
) tendency for a complementarity relationship between CG based and AT spike classes, (
<bold>B</bold>
) tendency for direct proportionality relationship of AT based – ATCG compact classes. (
<bold>C</bold>
) a combined relationship between AT less – CG less classes, both of complementarity and direct proportionality.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-8"></graphic>
</fig>
<p>9. The frequency of AT-spike promoters (13.02%) exceeds that of CG-spike promoters (8.93%), but the two classes show proportional relative values in most tissues. Exceptions are tissues from cervix and muscle, where the number of CG-spike promoters surpasses the number of AT-spike promoters (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>10. The percentage of occurrences between CG-based and AT-spike promoters appears to be relative and nearly complementary in all tissues (Figure
<xref ref-type="fig" rid="F8">8A</xref>
). Interestingly, the last two tissues in which AT-spike promoters have the lowest percentages and the first two tissues in which CG-based promoters have the highest percentages are cervix and skin (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10C,B).</p>
<p>11. The proportion of ATCG-compact and AT-less promoters seems to have similar values in tissues from kidney and lymph node whereas ATCG-compact and AT-based promoters appear to have similar values in bladder, skin and uterus (Figure
<xref ref-type="fig" rid="F8">8B</xref>
). ATCG-compact promoters tend to exhibit equal values in some tissues such as liver and kidney, brain and bone, heart and muscle. Interestingly, AT-based promoters also show equal values in these tissues, but different from those found for ATCG-compact promoters (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>12. There was no clear correlation regarding the tissue order between ATCG-balanced and ATCG-compact promoters. However, ATCG-balanced and ATCG-compact promoters seem to have almost equal percentages in about 16 tissues. Both of these classes have the closest values in blood, bone, brain, cervix, colon, heart, muscle, skin and uterus (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>13. ATCG-less promoters are rare (0.03%) and are even more enigmatic since they are mainly represented in cervix and tongue (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10I online). In humans, out of a total of 8,512 promoter sequences the percentage of ATCG-less promoters is close to 1.08%, whereas among the 2,369 promoters of tissue-specific genes it is almost 0.03%. These results are not consistent with the expected ATCG-less frequency of 0.3%, which may suggest that most of their genes are silent (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>14. ATCG-middle promoters are present only in nine of the thirty tissues, namely in soft tissue, eye, pancreas, liver, placenta, bladder, muscle, larynx and bone marrow (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S10J online). However, in humans, out of a total of 8,512 promoter sequences the percentage of ATCG-middle promoters is close to 1.05%. Nevertheless, among the 2,369 promoters of tissue-specific genes the observed frequency is close to 0.22% whereas their expected frequency is 0.29%, which suggests that some of their genes are also silent. The difference between expected and observed frequencies and an overall low occurrence of genes containing ATCG-middle and ATCG-less promoters may suggest their involvement in anatomical development and in some other cell-related cycles. This observation is supported by several tests performed on promoters from the HOX gene family, namely HOXA and HOXB. These genes are represented mostly by patterns showing ATCG-middle characteristics (Additional file
<xref ref-type="supplementary-material" rid="S5">5</xref>
: Figure S13A-E and Figure S14A-E online). A broader analysis involving expected and observed frequencies for all classes of promoters is presented in Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>Generally, both EPD and PlantProm DB define the TSS as the furthest 5′ position in the genome which can be aligned with the 5′ end of a cDNA from the corresponding gene [
<xref ref-type="bibr" rid="B25">25</xref>
]. However, many human genes are transcribed from multiple promoters, often involving alternative first exons. EPD considers the most frequent cDNA 5′ end as the TSS and applies a specialized algorithm to discover multiple promoters for a given gene, whereas PlantProm DB contains plant promoters based on published TSS mapping data [
<xref ref-type="bibr" rid="B27">27</xref>
]. Using a smaller number of promoters from EPD, we have also made an analysis for
<italic>Bos taurus</italic>
,
<italic>Gallus gallus</italic>
,
<italic>Mus musculus</italic>
,
<italic>Rattus norvegicus</italic>
and
<italic>Xenopus laevis</italic>
which showed a distribution close to that of
<italic>Homo sapiens</italic>
(Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
: Figure S15A-E online). Therefore, promoter distributions (Figure
<xref ref-type="fig" rid="F3">3A</xref>
) seem to be characteristic for all vertebrates rather than a special property of human promoters. However, more significant differences were especially observed in
<italic>Gallus gallus</italic>
, where the average Kappa IC values exceed those of other vertebrates (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
: Figure S15B online). On a visual inspection, promoter patterns from
<italic>Arabidopsis thaliana</italic>
and
<italic>Drosophila melanogaster</italic>
have a narrower shape than those from
<italic>Oryza sativa</italic>
and
<italic>Homo sapiens</italic>
, which suggests a different distribution of point mutations between these species, resulting perhaps from a difference in nucleosome organization. Furthermore, in our experiments we have found that an even distribution of mutations across different promoter sequences fails to change the shape of their patterns, which strengthened our hypothesis (Additional file
<xref ref-type="supplementary-material" rid="S8">8</xref>
: Figure S16A-D). We also noticed that even for shorter promoter sequences (i.e., Arabidopsis - PlantProm DB - 251b promoter sequences), promoter patterns retain their properties. Curiously, sliding windows situated at greater distances from the TSS do not seem to make a crucial difference in the pattern shape. The majority of defining characteristics seem to be close to the TSS. We further made a distribution across promoters of known orthologous genes (Figure
<xref ref-type="fig" rid="F9">9A</xref>
-D). We used HomoloGene [
<xref ref-type="bibr" rid="B47">47</xref>
] to extract 500 bp genomic regions upstream of INS orthologous genes from 7 species, HIS1 orthologous genes from 9 species and CNOT7 orthologous genes from 12 species (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). We compared these genomic regions with EPD promoters in order to ensure their accuracy. As expected, their distribution (Figure
<xref ref-type="fig" rid="F9">9A</xref>
) retained the same species-specific boundaries (Figure
<xref ref-type="fig" rid="F3">3A</xref>
-D) and their promoter patterns comply with existing phylogenetic relationships (Figure
<xref ref-type="fig" rid="F9">9B</xref>
-D). For tests performed on human tissues we used a list of genes from TiGER (Tissue-specific Gene Expression). For each gene in this list we searched for the corresponding promoter in the Eukaryotic Promoter Database. We found that some classes of promoters are preferentially present in certain tissues while other classes are present in all tissues (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S11 online). Only six out of ten classes of promoters are present in all 30 tissues (Figure
<xref ref-type="fig" rid="F7">7</xref>
). Moreover, it was noted that in certain tissues some classes of promoters can occur in a complementary manner, whereas other classes of promoters can appear in competition (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Figure S12A-AS online). On comparisons made between three promoter classes, other types of promoter relations can unfold. For instance, in tissues from brain, eye or lung, the values for AT-less and AT-spike promoters appear to exhibit a relative complementarity to those from muscle, whereas the number of CG-spike promoters remains proportional to their relative values (Figure
<xref ref-type="fig" rid="F7">7B</xref>
). These parallel behaviors and the tissue-preferential distribution of these promoters suggest that certain promoter classes are preferred for specific biological functions. Therefore, these promoter patterns seem to explain the relationship between their genes in certain biological pathways rather than their gene-specific function. This observation implies that promoters located in transitional states may reflect signatures of some of the latest evolutionary changes of a species. Biological tissues are complex structures, containing different cell types. Accordingly, ‘tissue specific’ stands as a relative term and does not imply that a particular gene is expressed only in a specific tissue or cell type. To determine whether a gene is predominantly expressed in a certain tissue, TiGER defined the Expression Enrichment (EE) as the ratio of the observed expression level in that tissue to the average expression level across 30 tissues. They further defined a gene as ‘tissue specific’ if it had an EE in a particular tissue larger than 5 and a P-value &lt;10
<sup>-3.5</sup>
[
<xref ref-type="bibr" rid="B30">30</xref>
]. Although “tissue specific” is a relative term and refers to genes predominantly expressed in different tissues, the fundamental tissue-tissue interactions are reflected in our promoter pattern analysis (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
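<p>The Expression Enrichment criterion described above can be illustrated with a minimal sketch. It assumes that expression levels are supplied as one number per tissue and that the P-value has already been computed elsewhere, since the statistical test is not described here. The function and parameter names are hypothetical and are not taken from TiGER.</p>
<preformat preformat-type="code"><![CDATA[
```python
from typing import Dict


def expression_enrichment(levels: Dict[str, float]) -> Dict[str, float]:
    """EE per tissue: observed level divided by the mean level across tissues."""
    # Assumption: `levels` holds one expression value for each tissue considered.
    mean_level = sum(levels.values()) / len(levels)
    return {tissue: level / mean_level for tissue, level in levels.items()}


def is_tissue_specific(
    ee: float,
    p_value: float,
    ee_cutoff: float = 5.0,          # "EE larger than 5"
    p_cutoff: float = 10 ** -3.5,    # "P-value below 10^-3.5"
) -> bool:
    """Apply both thresholds quoted in the text to one gene in one tissue."""
    return ee > ee_cutoff and p_value < p_cutoff
```
]]></preformat>
<p>In this sketch a gene with all-zero expression would cause a division by zero, so such genes would need to be filtered out beforehand.</p>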
<fig id="F9" position="float">
<label>Figure 9</label>
<caption>
<p>
<bold>Distribution across promoters of </bold>
<bold>orthologous genes.</bold>
(
<bold>A</bold>
) overlapping distribution of orthologous promoters from INS (yellow circles), HIS1 (red circles) and CNOT7 gene (green circles), (
<bold>B</bold>
) distribution of orthologous promoters from INS genes, (
<bold>C</bold>
) distribution of orthologous promoters from HIS1 genes, (
<bold>D</bold>
) distribution of orthologous promoters from CNOT7 genes. Each circle represents the center of weight from a promoter pattern and the circle color is associated with a corresponding species.</p>
</caption>
<graphic xlink:href="1471-2164-13-512-9"></graphic>
</fig>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>A comparative analysis was undertaken for 20,586 promoters from
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
(Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
), and an analysis based on tissue-specific gene expression profiles in humans (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
). Following the analysis, 10 general classes of promoters have emerged. We used promoter sequences from two databases - the Eukaryotic Promoter Database and PlantProm Database. We showed that existing methods used in cryptography, such as Kappa Index of Coincidence, can be adapted for many types of analysis in molecular genetics, perhaps to highlight certain new features of DNA sequences. Our supplemental data files allow re-analysis of our data. We also provide an animation that displays several hundred promoter patterns in succession and ordered according to their class (Additional file
<xref ref-type="supplementary-material" rid="S9">9</xref>
). We consider a possible subdivision of these promoter patterns into subclasses, with between 2 and 4 subclasses for each major class. Furthermore, our observations suggest the existence of a network between these promoter classes. In the near future we wish to merge the information related to these classes of promoters with other available data in gene regulatory networks, in order to form a better understanding of the relationship between some genetic factors and their pathological implications.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Promoter datasets</title>
<p>The Eukaryotic Promoter Database and PlantProm Database provide a collection of eukaryotic promoters for which the transcription start site (TSS) has been determined experimentally (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). We downloaded and tested 20,586 gene promoters from The Eukaryotic Promoter Database (6,649 gene promoters -
<italic>Oryza sativa</italic>
, 1,922 gene promoters -
<italic>Drosophila melanogaster</italic>
and 8,512 gene promoters -
<italic>Homo sapiens</italic>
) and PlantProm Database (3,503 gene promoters -
<italic>Arabidopsis thaliana</italic>
). We were mainly interested in the regions flanking the putative TSS. From the Eukaryotic Promoter Database we extracted promoter segments ranging from -499 bp to +100 bp relative to the TSS. From PlantProm DB we used promoter segments spanning 200 bp upstream to 51 bp downstream of the TSS.</p>
</sec>
<sec>
<title>Tissue-specific datasets</title>
<p>We used a publicly available list of 6,534 tissue-specific gene names (under Tissue-Specific Genes based on Expressed Sequence Tags (ESTs)) from the TiGER database (gene names were sorted and redundancy was removed - Additional file
<xref ref-type="supplementary-material" rid="S10">10</xref>
) and we searched for their promoters in the Eukaryotic Promoter Database in which we found 2,369 promoters. We generated 2,369 promoter patterns and we sorted them in order to highlight their proportion in each tissue (Additional file
<xref ref-type="supplementary-material" rid="S11">11</xref>
).</p>
</sec>
<sec>
<title>Promoter patterns</title>
<p>We used Visual Basic to develop a software program for promoter analysis, called PromKappa (Promoter analysis by Kappa), and a software program for sorting promoter patterns, called PromNN (Promoter analysis by Neural Network). The source code of these programs is provided in Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
. Promoter patterns were generated by the PromKappa program. We used a sliding-window approach to extract two types of values: Kappa IC and (C + G)%. A sliding window with a step of 1 and a window size of 30 nt allowed us to detail the structure of known promoters. Kappa Index of Coincidence values were plotted on a graph against (C + G)% values, which form a recognizable pattern composed of clusters of various sizes on the Y-axis (Figure
<xref ref-type="fig" rid="F1">1A</xref>
-J). The X-coordinate of each point was represented by a (C + G)% value and the Y-coordinate was represented by a corresponding Kappa IC value. As can be expected, by using a large window size we obtained smooth promoter patterns, whereas a small window size generated sharp and distinguishable characteristics of promoters which have been easily categorized.</p>
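<p>The sliding-window extraction can be sketched as follows. This is an illustrative outline under stated assumptions and not the PromKappa source: the function names are hypothetical, the per-window measures are passed in by the caller, and the helper <italic>plain_cg_percent</italic> computes an unscaled (C + G)% that serves only as a stand-in X value.</p>
<preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Iterator, List, Tuple


def sliding_windows(seq: str, size: int = 30, step: int = 1) -> Iterator[str]:
    """Yield every full window of `size` letters, advancing by `step`."""
    # Sequences shorter than `size` produce no windows at all.
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]


def pattern_points(
    seq: str,
    x_value: Callable[[str], float],
    y_value: Callable[[str], float],
    size: int = 30,
    step: int = 1,
) -> List[Tuple[float, float]]:
    """Return one (x, y) point per window; both measures come from the caller."""
    return [(x_value(w), y_value(w)) for w in sliding_windows(seq, size, step)]


def plain_cg_percent(window: str) -> float:
    """Percentage of C and G letters in a window (no whole-promoter scaling)."""
    return 100.0 * sum(window.count(base) for base in "CG") / len(window)
```
]]></preformat>
<p>With a window of 30 and a step of 1, a sequence of 600 letters yields 571 windows, and therefore 571 points per pattern.</p>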
</sec>
<sec>
<title>Promoter analysis</title>
<p>We conducted three types of analysis. Initially, for each promoter sequence we generated a graph, representing a promoter pattern. In total, we generated 20,586 graphs (Additional file
<xref ref-type="supplementary-material" rid="S12">12</xref>
). These graphs were saved in BMP (Bitmap Image File) format and were sorted by their shape and density using a neural network. In the second analysis, the center of each pattern was plotted on a graph designed to show the distribution of promoters for each species. We used a color scheme to highlight the denser surfaces. Red areas represent clusters of similar promoters while blue areas represent unique or rare promoters (Figure
<xref ref-type="fig" rid="F3">3</xref>
A-D). For the third analysis, we measured the specificity of each promoter class among thirty tissues by using 2,369 promoters (Figure
<xref ref-type="fig" rid="F7">7A</xref>
,B).</p>
</sec>
<sec>
<title>Pattern recognition and sorting</title>
<p>We have been able to demarcate promoter sequences into ten classes by using the maximum number (≥100) of appearances of similar promoter patterns. To determine the biological characteristics of promoter sequences, we have resorted to machine learning methods. All patterns were analyzed and sorted by PromNN, a pattern recognizer program using 93,264 artificial neurons and a single layer perceptron. It has the ability to learn patterns and classify them into specified classes. We used supervised learning to train the neural network by using 200 input patterns (20 of each class of promoters, 5 from each species - Additional file
<xref ref-type="supplementary-material" rid="S13">13</xref>
). PromNN recognized ten promoter classes and provided information about the match score and match percentage for each promoter pattern.</p>
</sec>
<sec>
<title>Cytosine and guanine content</title>
<p>We extracted C + G values from each sliding window considering the nucleotide frequencies from the entire promoter sequence. In the first stage, to determine the (C + G)% content for the entire promoter sequence we used the formula:</p>
<p>
<disp-formula>
<mml:math id="M1" name="1471-2164-13-512-i1" overflow="scroll">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi mathvariant="italic">TOT</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mfrac>
<mml:mn>100</mml:mn>
<mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="italic">TOT</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mfenced>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="italic">TOT</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Where “TOT” (total) designates the promoter sequence.
<italic>CG</italic>
<sub>
<italic>TOT</italic>
</sub>
represents the percentage of cytosine and guanine of the entire promoter,
<italic>(A + T +</italic>
<italic>C + G)</italic>
<sub>
<italic>TOT</italic>
</sub>
represents the sum of occurrences of A, T, C and G, and
<italic>(C + G)</italic>
<sub>
<italic>TOT</italic>
</sub>
represents the sum of occurrences of C and G. In the next stage we used the value of
<italic>CG</italic>
<sub>
<italic>TOT</italic>
</sub>
to calculate the (C + G)% content from the sliding window (SW):</p>
<p>
<disp-formula>
<mml:math id="M2" name="1471-2164-13-512-i2" overflow="scroll">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi mathvariant="italic">SW</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mfrac>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi mathvariant="italic">TOT</mml:mi>
</mml:msub>
</mml:mrow>
<mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="italic">SW</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mfenced>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="italic">SW</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Where
<italic>CG</italic>
<sub>
<italic>SW</italic>
</sub>
represents the percentage of cytosine and guanine from the sliding window. In this stage,
<italic>CG</italic>
<sub>
<italic>SW</italic>
</sub>
value is relative to
<italic>CG</italic>
<sub>
<italic>TOT</italic>
</sub>
. The expression
<italic>(A + T +</italic>
<italic>C + G)</italic>
<sub>
<italic>SW</italic>
</sub>
represents the sum of occurrences of A, T, C and G from the sliding window sequence.
<italic>(C + G)</italic>
<sub>
<italic>SW</italic>
</sub>
represents the sum of C and G occurrences in the sliding window sequence. Nevertheless, in our implementation we also included the option to extract
<italic>CG</italic>
<sub>
<italic>SW</italic>
</sub>
values without considering
<italic>CG</italic>
<sub>
<italic>TOT</italic>
</sub>
.</p>
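<p>A literal transcription of the two formulas above might look like the sketch below. The function names are hypothetical. Two points are assumptions on our part: only the letters A, T, C and G are counted, and the option of extracting values without the whole-promoter value is read as the plain (C + G)% of the window.</p>
<preformat preformat-type="code"><![CDATA[
```python
def count_bases(seq: str, bases: str) -> int:
    """Count how many letters of `seq` belong to `bases` (case-insensitive)."""
    upper = seq.upper()
    return sum(upper.count(base) for base in bases)


def cg_total(promoter: str) -> float:
    """CG_TOT: (C + G)% over the entire promoter sequence."""
    return 100.0 / count_bases(promoter, "ATCG") * count_bases(promoter, "CG")


def cg_window(window: str, cg_tot: float) -> float:
    """CG_SW as written in the text: window C + G content scaled by CG_TOT."""
    return cg_tot / count_bases(window, "ATCG") * count_bases(window, "CG")


def cg_window_plain(window: str) -> float:
    """Assumed variant that ignores CG_TOT: plain (C + G)% of the window."""
    return 100.0 / count_bases(window, "ATCG") * count_bases(window, "CG")
```
]]></preformat>
<p>Note that under the first variant a window made only of C and G returns exactly the whole-promoter value, so the scaled values never exceed it. A window containing none of the four letters would cause a division by zero in this sketch.</p>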
</sec>
<sec>
<title>Kappa Index of Coincidence</title>
<p>The Index of coincidence principle is based on letter frequency distributions and has been used for the analysis of natural-language plaintext in cryptanalysis. Kappa Index of Coincidence is a form of Index of Coincidence used for matching two text strings. Nevertheless, we managed to adapt Kappa IC for the analysis of a single DNA sequence. Here, Kappa IC is used for calculating the level of “randomization” of a DNA sequence. By extracting Kappa IC and C + G content from a sliding window we have been able to measure the localized values along each promoter sequence. Kappa IC is sensitive to various degrees of sequence organization such as simple sequence repeats (SSRs) or short tandem repeats (STRs). The formula for Kappa IC is shown below, where sequences
<italic>A</italic>
and
<italic>B</italic>
have the same length
<italic>N</italic>
. Only if an
<italic>A[i]</italic>
nucleotide from sequence A matches the
<italic>B[i]</italic>
correspondent from sequence
<italic>B</italic>
, then ∑ is incremented by 1.</p>
<p>
<disp-formula>
<mml:math id="M3" name="1471-2164-13-512-i3" overflow="scroll">
<mml:mi>K</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi mathvariant="italic">IC</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mtext></mml:mtext>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>With small changes, the same method for measuring the Index of Coincidence has been applied for only one sequence, in which the sequence was actually compared with itself, as shown below in the algorithm implementation.</p>
<p>function KIC(A)</p>
<p>  T = 0</p>
<p>  C = 0</p>
<p>  N = length(A)</p>
<p>  for u = 1 to N - 1</p>
<p>    B = A[u + 1] … A[N]</p>
<p>    for i = 1 to length(B)</p>
<p>      if A[i] = B[i] then C = C + 1</p>
<p>    next i</p>
<p>    T = T + (C / length(B) × 100)</p>
<p>    C = 0</p>
<p>  next u</p>
<p>  return Round(T / (N - 1), 2)</p>
<p>end function</p>
<p>Where
<italic>N</italic>
is the length of the sliding window,
<italic>A</italic>
represents the sliding window content,
<italic>B</italic>
contains all variants of sequences generated from
<italic>A</italic>
(from
<italic>u + 1</italic>
to
<italic>N</italic>
),
<italic>C</italic>
counts the number of coincidences occurring between sequence
<italic>B</italic>
and sequence
<italic>A</italic>
, and
<italic>T</italic>
variable counts the total number of coincidences found between sequences of
<italic>B</italic>
and the sequence
<italic>A</italic>
.</p>
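<p>For readers who prefer an executable form, the self-comparison can be sketched in Python as shown below. This is our reading of the pseudocode and not the PromKappa source. It assumes that each <italic>B</italic> is the suffix of the window that starts at position <italic>u + 1</italic> and runs to the end of the window, and that the average is taken over the number of shifts. The function name is hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def kappa_ic(window: str) -> float:
    """Self-comparison Kappa IC of one window, as a percentage (0 to 100)."""
    shifts = len(window) - 1
    if shifts < 1:
        raise ValueError("window must contain at least two letters")
    total = 0.0
    for u in range(1, shifts + 1):
        suffix = window[u:]  # assumed meaning of B: the window shifted left by u
        # zip stops at the end of the shorter string, i.e. the suffix
        matches = sum(1 for a, b in zip(window, suffix) if a == b)
        total += matches / len(suffix) * 100
    return round(total / shifts, 2)
```
]]></preformat>
<p>Under this reading a homopolymer such as AAAA scores 100, a window with no repeated letter at any shift such as ACGT scores 0, and the dinucleotide repeat ATAT scores 33.33, which illustrates the sensitivity to repeats mentioned above.</p>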
</sec>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>PG conceived of the study and participated in its design and coordination. PG created the algorithms and the software used in the analysis. CIT carried out the assembly of promoter files and manually tested the correctness of each promoter sequence. PG and CIT participated in the promoter sequence analysis and drafted the manuscript. Both authors have verified the accuracy of the data and repeated the experiment independently. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>Promoter sequences.</bold>
A complete set of 20,586 gene promoters from The Eukaryotic Promoter Database (6,649 gene promoters -
<italic>Oryza sativa</italic>
, 1,922 gene promoters -
<italic>Drosophila melanogaster</italic>
and 8,512 gene promoters -
<italic>Homo sapiens</italic>
) and PlantProm Database (3,503 gene promoters -
<italic>Arabidopsis thaliana</italic>
).</p>
</caption>
<media xlink:href="1471-2164-13-512-S1.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2</title>
<p>
<bold>Organism-specific data.</bold>
Comparative analysis undertaken for 20,586 promoters from
<italic>Arabidopsis thaliana</italic>
,
<italic>Drosophila melanogaster</italic>
,
<italic>Homo sapiens</italic>
and
<italic>Oryza sativa</italic>
.</p>
</caption>
<media xlink:href="1471-2164-13-512-S2.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>PromKappa and PromNN.</bold>
PromKappa (Promoter analysis by Kappa) software program used for promoter pattern generation and promoter analysis and PromNN (Promoter analysis by Neural Network) software program used for sorting promoter patterns.</p>
</caption>
<media xlink:href="1471-2164-13-512-S3.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4</title>
<p>
<bold>Tissue-specific data.</bold>
Promoter analysis in
<italic>Homo sapiens</italic>
, based on tissue-specific gene expression profiles.</p>
</caption>
<media xlink:href="1471-2164-13-512-S4.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5</title>
<p>
<bold>Observations for HOX genes.</bold>
Comparative analysis of HOX gene promoter patterns.</p>
</caption>
<media xlink:href="1471-2164-13-512-S5.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6</title>
<p>
<bold>Observed and expected frequencies.</bold>
Analysis of expected and observed promoter frequencies based on the organism-specific and tissue-specific data.</p>
</caption>
<media xlink:href="1471-2164-13-512-S6.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S7">
<caption>
<title>Additional file 7</title>
<p>
<bold>Distribution in other species.</bold>
A secondary distribution of gene promoters in
<italic>Bos taurus</italic>
,
<italic>Gallus gallus</italic>
,
<italic>Mus musculus</italic>
,
<italic>Rattus norvegicus</italic>
,
<italic>Xenopus laevis</italic>
and
<italic>Zea mays</italic>
.</p>
</caption>
<media xlink:href="1471-2164-13-512-S7.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S8">
<caption>
<title>Additional file 8</title>
<p>
<bold>DET1 promoter pattern.</bold>
Simulation highlighting the resistance of the promoter sequence to random mutations.</p>
</caption>
<media xlink:href="1471-2164-13-512-S8.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S9">
<caption>
<title>Additional file 9</title>
<p>
<bold>Promoter pattern animation.</bold>
Animation showing several hundred promoter patterns in succession, ordered by class.</p>
</caption>
<media xlink:href="1471-2164-13-512-S9.wmv" mimetype="video" mime-subtype="x-ms-wmv">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S10">
<caption>
<title>Additional file 10</title>
<p>
<bold>List of tissue-specific genes.</bold>
List of 6,534 tissue-specific gene names from the TiGER database (gene names were sorted and redundant entries were removed).</p>
</caption>
<media xlink:href="1471-2164-13-512-S10.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S11">
<caption>
<title>Additional file 11</title>
<p>
<bold>Tissue-specific promoter patterns.</bold>
The complete set of 2,369 image-based promoter patterns used for tissue-specific analysis.</p>
</caption>
<media xlink:href="1471-2164-13-512-S11.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S12">
<caption>
<title>Additional file 12</title>
<p>
<bold>Organism-specific promoter patterns.</bold>
The complete set of 20,597 image-based promoter patterns used for a comparative analysis of the four species taken into consideration.</p>
</caption>
<media xlink:href="1471-2164-13-512-S12.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S13">
<caption>
<title>Additional file 13</title>
<p>
<bold>Gene promoters used for neural network training.</bold>
List of 918 image-based promoter patterns used for PromNN training.</p>
</caption>
<media xlink:href="1471-2164-13-512-S13.rar" mimetype="audio" mime-subtype="x-realaudio">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgments</title>
<p>This work was supported by a grant from the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-ID-PCE-2011-3-0429.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Levine</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tjian</surname>
<given-names>R</given-names>
</name>
<article-title>Transcription regulation and animal diversity</article-title>
<source>Nature</source>
<year>2003</year>
<volume>424</volume>
<fpage>147</fpage>
<lpage>151</lpage>
<pub-id pub-id-type="doi">10.1038/nature01763</pub-id>
<pub-id pub-id-type="pmid">12853946</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Smale</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Kadonaga</surname>
<given-names>JT</given-names>
</name>
<article-title>The RNA polymerase II core promoter</article-title>
<source>Annu Rev Biochem</source>
<year>2003</year>
<volume>72</volume>
<fpage>449</fpage>
<lpage>479</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.biochem.72.121801.161520</pub-id>
<pub-id pub-id-type="pmid">12651739</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Hahn</surname>
<given-names>S</given-names>
</name>
<article-title>Structure and mechanism of the RNA polymerase II transcription machinery</article-title>
<source>Nat Struct Mol Biol</source>
<year>2004</year>
<volume>11</volume>
<fpage>394</fpage>
<lpage>403</lpage>
<pub-id pub-id-type="doi">10.1038/nsmb763</pub-id>
<pub-id pub-id-type="pmid">15114340</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
<article-title>Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>212</volume>
<fpage>563</fpage>
<lpage>578</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(90)90223-9</pub-id>
<pub-id pub-id-type="pmid">2329577</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Mantovani</surname>
<given-names>R</given-names>
</name>
<article-title>The molecular biology of the CCAAT-binding factor NF-Y</article-title>
<source>Gene</source>
<year>1999</year>
<volume>239</volume>
<fpage>15</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="doi">10.1016/S0378-1119(99)00368-6</pub-id>
<pub-id pub-id-type="pmid">10571030</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Fujimori</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Washio</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tomita</surname>
<given-names>M</given-names>
</name>
<article-title>GC-compositional strand bias around transcription start sites in plants and fungi</article-title>
<source>BMC Genomics</source>
<year>2005</year>
<volume>6</volume>
<fpage>26</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-6-26</pub-id>
<pub-id pub-id-type="pmid">15733327</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Tatarinova</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Brover</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Troukhan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Alexandrov</surname>
<given-names>N</given-names>
</name>
<article-title>Skew in CG content near the transcription start site in Arabidopsis thaliana</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<issue>Suppl. 1</issue>
<fpage>i313</fpage>
<lpage>i314</lpage>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Molina</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Grotewold</surname>
<given-names>E</given-names>
</name>
<article-title>Genome wide analysis of Arabidopsis core promoters</article-title>
<source>BMC Genomics</source>
<year>2005</year>
<volume>6</volume>
<fpage>25</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-6-25</pub-id>
<pub-id pub-id-type="pmid">15733318</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Juo</surname>
<given-names>ZS</given-names>
</name>
<name>
<surname>Chiu</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Leiberman</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Baikalov</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Berk</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Dickerson</surname>
<given-names>RE</given-names>
</name>
<article-title>How proteins recognize the TATA box</article-title>
<source>J Mol Biol</source>
<year>1996</year>
<volume>261</volume>
<fpage>239</fpage>
<lpage>254</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1996.0456</pub-id>
<pub-id pub-id-type="pmid">8757291</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Kiran</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ansari</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Srivastava</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lodhi</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chaturvedi</surname>
<given-names>CP</given-names>
</name>
<name>
<surname>Sawant</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Tuli</surname>
<given-names>R</given-names>
</name>
<article-title>The TATA-box sequence in the basal promoter contributes to determining light-dependent gene expression in plants</article-title>
<source>Plant Physiol</source>
<year>2006</year>
<volume>142</volume>
<fpage>364</fpage>
<lpage>376</lpage>
<pub-id pub-id-type="doi">10.1104/pp.106.084319</pub-id>
<pub-id pub-id-type="pmid">16844831</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Yamamoto</surname>
<given-names>YY</given-names>
</name>
<name>
<surname>Ichida</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Matsui</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Obokata</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sakurai</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Satou</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Seki</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shinozaki</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Abe</surname>
<given-names>T</given-names>
</name>
<article-title>Identification of plant promoter constituents by analysis of local distribution of short sequences</article-title>
<source>BMC Genomics</source>
<year>2007</year>
<volume>8</volume>
<fpage>67</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-8-67</pub-id>
<pub-id pub-id-type="pmid">17346352</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Ioshikhes</surname>
<given-names>IP</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>MQ</given-names>
</name>
<article-title>Large-scale human promoter mapping using CpG islands</article-title>
<source>Nat Genet</source>
<year>2000</year>
<volume>26</volume>
<fpage>61</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="doi">10.1038/79189</pub-id>
<pub-id pub-id-type="pmid">10973249</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Ludwig</surname>
<given-names>MZ</given-names>
</name>
<article-title>Functional evolution of noncoding DNA</article-title>
<source>Curr Opin Genet Dev</source>
<year>2002</year>
<volume>12</volume>
<fpage>634</fpage>
<lpage>639</lpage>
<pub-id pub-id-type="doi">10.1016/S0959-437X(02)00355-6</pub-id>
<pub-id pub-id-type="pmid">12433575</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Yamamoto</surname>
<given-names>YY</given-names>
</name>
<name>
<surname>Yoshioka</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Hyakumachi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Obokata</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yoshiharu</surname>
<given-names>Y</given-names>
</name>
<article-title>Characteristics of core promoter types with respect to gene structure and expression in Arabidopsis thaliana</article-title>
<source>DNA Res</source>
<year>2011</year>
<volume>18</volume>
<fpage>333</fpage>
<lpage>342</lpage>
<pub-id pub-id-type="doi">10.1093/dnares/dsr020</pub-id>
<pub-id pub-id-type="pmid">21745829</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Fukue</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Sumida</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Nishikawa</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ohyama</surname>
<given-names>T</given-names>
</name>
<article-title>Core promoter elements of eukaryotic genes have a highly distinctive mechanical property</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>5834</fpage>
<lpage>5840</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh905</pub-id>
<pub-id pub-id-type="pmid">15520466</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Florquin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Saeys</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Degroeve</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rouzé</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
<article-title>Large-scale structural analysis of the core promoter in mammalian and plant genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>4255</fpage>
<lpage>4264</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gki737</pub-id>
<pub-id pub-id-type="pmid">16049029</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Kanhere</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bansal</surname>
<given-names>M</given-names>
</name>
<article-title>Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>3165</fpage>
<lpage>3175</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gki627</pub-id>
<pub-id pub-id-type="pmid">15939933</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Yamamoto</surname>
<given-names>YY</given-names>
</name>
<name>
<surname>Ichida</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Abe</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Sugano</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Obokata</surname>
<given-names>J</given-names>
</name>
<article-title>Differentiation of core promoter architecture between plants and mammals revealed by LDSS analysis</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>6219</fpage>
<lpage>6226</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkm685</pub-id>
<pub-id pub-id-type="pmid">17855401</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Dineen</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Wilm</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cunningham</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>DG</given-names>
</name>
<article-title>High DNA melting temperature predicts transcription start site location in human and mouse</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<fpage>7360</fpage>
<lpage>7367</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkp821</pub-id>
<pub-id pub-id-type="pmid">19820114</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Carninci</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sandelin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lenhard</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Katayama</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shimokawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ponjavic</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Semple</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Engström</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Forrest</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Alkema</surname>
<given-names>WB</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Plessy</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kodzius</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ravasi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kasukawa</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Fukuda</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kanamori-Katayama</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kitazume</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kawaji</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kai</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Nakamura</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Konno</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nakano</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mottagui-Tabar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Arner</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Chesi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gustincich</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Persichetti</surname>
<given-names>F</given-names>
</name>
<etal></etal>
<article-title>Genome-wide analysis of mammalian promoter architecture and evolution</article-title>
<source>Nat Genet</source>
<year>2006</year>
<volume>38</volume>
<fpage>626</fpage>
<lpage>635</lpage>
<pub-id pub-id-type="doi">10.1038/ng1789</pub-id>
<pub-id pub-id-type="pmid">16645617</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="book">
<name>
<surname>Friedman</surname>
<given-names>WF</given-names>
</name>
<source>The index of coincidence and its applications in cryptology</source>
<series>Department of Ciphers</series>
<year>1922</year>
<publisher-name>Geneva: Riverbank Laboratories</publisher-name>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="book">
<name>
<surname>Mountjoy</surname>
<given-names>M</given-names>
</name>
<source>The Bar Statistics</source>
<year>1963</year>
<publisher-name>USA: NSA Technical Journal VII (2,4)</publisher-name>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="book">
<name>
<surname>Friedman</surname>
<given-names>WF</given-names>
</name>
<name>
<surname>Callimahos</surname>
<given-names>LD</given-names>
</name>
<source>Military Cryptanalytics. Part I, 2</source>
<year>1985</year>
<publisher-name>USA: Reprinted by Aegean Park Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="book">
<name>
<surname>Kahn</surname>
<given-names>D</given-names>
</name>
<source>[1967] The Codebreakers - The Story of Secret Writing</source>
<year>1996</year>
<publisher-name>Macmillan: New York</publisher-name>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Schmid</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Perier</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Praz</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
<article-title>EPD in its twentieth year: towards complete promoter coverage of selected model organisms</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<issue>Database issue</issue>
<fpage>D82</fpage>
<lpage>D85</lpage>
<pub-id pub-id-type="pmid">16381980</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Périer</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Praz</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Junier</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bonnard</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
<article-title>The eukaryotic promoter database (EPD)</article-title>
<source>Nucleic Acids Res</source>
<year>2000</year>
<volume>28</volume>
<issue>1</issue>
<fpage>302</fpage>
<lpage>303</lpage>
<pub-id pub-id-type="doi">10.1093/nar/28.1.302</pub-id>
<pub-id pub-id-type="pmid">10592254</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Shahmuradov</surname>
<given-names>IA</given-names>
</name>
<name>
<surname>Gammerman</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Hancock</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Bramley</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Solovyev</surname>
<given-names>VV</given-names>
</name>
<article-title>PlantProm: a database of plant promoter sequences</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>114</fpage>
<lpage>117</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkg041</pub-id>
<pub-id pub-id-type="pmid">12519961</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zack</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
<article-title>TiGER: a database for tissue-specific gene expression and regulation</article-title>
<source>BMC Bioinforma</source>
<year>2008</year>
<volume>9</volume>
<fpage>271</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-271</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zack</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
<article-title>Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors</article-title>
<source>BMC Bioinforma</source>
<year>2007</year>
<volume>8</volume>
<fpage>437</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-437</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zack</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
<article-title>Computational analysis of tissue-specific combinatorial gene regulation: predicting interaction between transcription factors in human tissues</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>4925</fpage>
<lpage>4936</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkl595</pub-id>
<pub-id pub-id-type="pmid">16982645</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Nelson</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Finch</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Luisi</surname>
<given-names>BF</given-names>
</name>
<name>
<surname>Klug</surname>
<given-names>A</given-names>
</name>
<article-title>The structure of an oligo(dA).oligo(dT) tract and its biological implications</article-title>
<source>Nature</source>
<year>1987</year>
<volume>330</volume>
<fpage>221</fpage>
<lpage>226</lpage>
<pub-id pub-id-type="doi">10.1038/330221a0</pub-id>
<pub-id pub-id-type="pmid">3670410</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Zhou</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bizzaro</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Marx</surname>
<given-names>KA</given-names>
</name>
<article-title>Homopolymer tract length dependent enrichments in functional regions of 27 eukaryotes and their novel dependence on the organism DNA (G + C)% composition</article-title>
<source>BMC Genomics</source>
<year>2004</year>
<volume>5</volume>
<fpage>95</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-5-95</pub-id>
<pub-id pub-id-type="pmid">15598342</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Gershenzon</surname>
<given-names>NI</given-names>
</name>
<name>
<surname>Ioshikhes</surname>
<given-names>IP</given-names>
</name>
<article-title>Synergy of human Pol II core promoter elements revealed by statistical sequence analysis</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>1295</fpage>
<lpage>1300</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti172</pub-id>
<pub-id pub-id-type="pmid">15572469</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Suzuki</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tsunoda</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sese</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Taira</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mizushima-Sugano</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hata</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ota</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Isogai</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Nakamura</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Suyama</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sakaki</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Morishita</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Okubo</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sugano</surname>
<given-names>S</given-names>
</name>
<article-title>Identification and characterization of the potential promoter regions of 1031 kinds of human genes</article-title>
<source>Genome Res</source>
<year>2001</year>
<volume>11</volume>
<fpage>677</fpage>
<lpage>684</lpage>
<pub-id pub-id-type="doi">10.1101/gr.GR-1640R</pub-id>
<pub-id pub-id-type="pmid">11337467</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bolotin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sladek</surname>
<given-names>FM</given-names>
</name>
<name>
<surname>Martinez</surname>
<given-names>E</given-names>
</name>
<article-title>Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters</article-title>
<source>Gene</source>
<year>2007</year>
<volume>389</volume>
<fpage>52</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1016/j.gene.2006.09.029</pub-id>
<pub-id pub-id-type="pmid">17123746</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<name>
<surname>Cairns</surname>
<given-names>BR</given-names>
</name>
<article-title>The logic of chromatin architecture and remodelling at promoters</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>193</fpage>
<lpage>198</lpage>
<pub-id pub-id-type="doi">10.1038/nature08450</pub-id>
<pub-id pub-id-type="pmid">19741699</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Ioshikhes</surname>
<given-names>IP</given-names>
</name>
<name>
<surname>Albert</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zanton</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Pugh</surname>
<given-names>BF</given-names>
</name>
<article-title>Nucleosome positions predicted through comparative genomics</article-title>
<source>Nat Genet</source>
<year>2006</year>
<volume>38</volume>
<fpage>1210</fpage>
<lpage>1215</lpage>
<pub-id pub-id-type="doi">10.1038/ng1878</pub-id>
<pub-id pub-id-type="pmid">16964265</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Albert</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Mavrich</surname>
<given-names>TN</given-names>
</name>
<name>
<surname>Tomsho</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zanton</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Pugh</surname>
<given-names>BF</given-names>
</name>
<article-title>Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome</article-title>
<source>Nature</source>
<year>2007</year>
<volume>446</volume>
<fpage>572</fpage>
<lpage>576</lpage>
<pub-id pub-id-type="doi">10.1038/nature05632</pub-id>
<pub-id pub-id-type="pmid">17392789</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Tirosh</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Berman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Barkai</surname>
<given-names>N</given-names>
</name>
<article-title>The pattern and evolution of yeast promoter bendability</article-title>
<source>Trends Genet</source>
<year>2007</year>
<volume>23</volume>
<fpage>318</fpage>
<lpage>321</lpage>
<pub-id pub-id-type="doi">10.1016/j.tig.2007.03.015</pub-id>
<pub-id pub-id-type="pmid">17418911</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Tirosh</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Barkai</surname>
<given-names>N</given-names>
</name>
<article-title>Two strategies for gene regulation by promoter nucleosomes</article-title>
<source>Genome Res</source>
<year>2008</year>
<volume>18</volume>
<fpage>1084</fpage>
<lpage>1091</lpage>
<pub-id pub-id-type="doi">10.1101/gr.076059.108</pub-id>
<pub-id pub-id-type="pmid">18448704</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Cai</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Kohwi-Shigematsu</surname>
<given-names>T</given-names>
</name>
<article-title>Tissue-specific nuclear architecture and gene expression regulated by SATB1</article-title>
<source>Nat Genet</source>
<year>2003</year>
<volume>34</volume>
<fpage>42</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="doi">10.1038/ng1146</pub-id>
<pub-id pub-id-type="pmid">12692553</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Iyer</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Struhl</surname>
<given-names>K</given-names>
</name>
<article-title>Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure</article-title>
<source>EMBO J</source>
<year>1995</year>
<volume>14</volume>
<fpage>2570</fpage>
<lpage>2579</lpage>
<pub-id pub-id-type="pmid">7781610</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Suter</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Schnappauf</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Thoma</surname>
<given-names>F</given-names>
</name>
<article-title>Poly(dA:dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo</article-title>
<source>Nucleic Acids Res</source>
<year>2000</year>
<volume>28</volume>
<fpage>4083</fpage>
<lpage>4089</lpage>
<pub-id pub-id-type="doi">10.1093/nar/28.21.4083</pub-id>
<pub-id pub-id-type="pmid">11058103</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Filetici</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Aranda</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gonzàlez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ballario</surname>
<given-names>P</given-names>
</name>
<article-title>GCN5, a yeast transcriptional coactivator, induced chromatin reconfiguration of HIS3 promoter in vivo</article-title>
<source>Biochem Biophys Res Commun</source>
<year>1998</year>
<volume>242</volume>
<fpage>84</fpage>
<lpage>87</lpage>
<pub-id pub-id-type="doi">10.1006/bbrc.1997.7918</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<name>
<surname>Koch</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Thiele</surname>
<given-names>DJ</given-names>
</name>
<article-title>Functional analysis of a homopolymeric (dA-dT) element that provides nucleosome access to yeast and mammalian transcription factors</article-title>
<source>J Biol Chem</source>
<year>1999</year>
<volume>274</volume>
<fpage>23752</fpage>
<lpage>23760</lpage>
<pub-id pub-id-type="doi">10.1074/jbc.274.34.23752</pub-id>
<pub-id pub-id-type="pmid">10446135</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<name>
<surname>Fashena</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Reeves</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ruddle</surname>
<given-names>NH</given-names>
</name>
<article-title>A poly(dA:dT) upstream activating sequence binds high-mobility group I protein and contributes to lymphotoxin (tumor necrosis factor-β) gene regulation</article-title>
<source>Mol Cell Biol</source>
<year>1992</year>
<volume>12</volume>
<fpage>894</fpage>
<lpage>903</lpage>
<pub-id pub-id-type="pmid">1732752</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<name>
<surname>Sayers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Benson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Bolton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Canese</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chetvernin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Dicuccio</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Federhen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Feolo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fingerman</surname>
<given-names>IM</given-names>
</name>
<name>
<surname>Geer</surname>
<given-names>LY</given-names>
</name>
<name>
<surname>Helmberg</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Kapustin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Krasnov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Landsman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Madej</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Maglott</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Marchler-Bauer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Karsch-Mizrachi</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Ostell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Panchenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Phan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Pruitt</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Schuler</surname>
<given-names>GD</given-names>
</name>
<etal></etal>
<article-title>Database resources of the National Center for Biotechnology Information</article-title>
<source>Nucleic Acids Res</source>
<year>2012</year>
<volume>40</volume>
<fpage>D13</fpage>
<lpage>D25</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr1184</pub-id>
<pub-id pub-id-type="pmid">22140104</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list>
<country>
<li>Roumanie</li>
</country>
</list>
<tree>
<country name="Roumanie">
<noRegion>
<name sortKey="Gagniuc, Paul" sort="Gagniuc, Paul" uniqKey="Gagniuc P" first="Paul" last="Gagniuc">Paul Gagniuc</name>
</noRegion>
<name sortKey="Ionescu Tirgoviste, Constantin" sort="Ionescu Tirgoviste, Constantin" uniqKey="Ionescu Tirgoviste C" first="Constantin" last="Ionescu-Tirgoviste">Constantin Ionescu-Tirgoviste</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000108 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000108 | SxmlIndent | more
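
Where the Dilib tools are not installed, the reference entries of a record like this one can also be read with a general-purpose XML parser. The following is a minimal sketch, not part of Dilib: it assumes a fragment shaped like the `<ref>` / `<mixed-citation>` entries of this record is available as a string. The helper name `list_references` and the `SAMPLE` fragment are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a fragment shaped like the <ref> entries in this record.
SAMPLE = """<ref-list>
<ref id="B40">
<mixed-citation publication-type="journal">
<name><surname>Tirosh</surname><given-names>I</given-names></name>
<name><surname>Barkai</surname><given-names>N</given-names></name>
<article-title>Two strategies for gene regulation by promoter nucleosomes</article-title>
<source>Genome Res</source>
<year>2008</year>
<pub-id pub-id-type="pmid">18448704</pub-id>
</mixed-citation>
</ref>
</ref-list>"""


def list_references(xml_text):
    """Return one dict per <ref> element found in xml_text."""
    root = ET.fromstring(xml_text)
    refs = []
    for ref in root.iter("ref"):
        cit = ref.find("mixed-citation")
        if cit is None:
            continue  # skip entries that do not use <mixed-citation>
        # Author names are stored as <name><surname/><given-names/></name>.
        authors = [
            f"{n.findtext('surname', '')} {n.findtext('given-names', '')}".strip()
            for n in cit.findall("name")
        ]
        # Identifiers are distinguished by the pub-id-type attribute.
        ids = {p.get("pub-id-type"): (p.text or "") for p in cit.findall("pub-id")}
        refs.append({
            "id": ref.get("id"),
            "authors": authors,
            "title": cit.findtext("article-title", ""),
            "source": cit.findtext("source", ""),
            "year": cit.findtext("year", ""),
            "pmid": ids.get("pmid"),  # None when the entry has no PMID
            "doi": ids.get("doi"),    # None when the entry has no DOI
        })
    return refs


if __name__ == "__main__":
    for r in list_references(SAMPLE):
        print(r["id"], "; ".join(r["authors"]), r["year"], r["title"])
```

Entries without a DOI or PMID simply yield `None` for that key, so incomplete citations can be spotted without the parser failing.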

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:3549790
   |texte=   Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:23020586" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024