
<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Comparative genomics of
<italic>Drosophila </italic>
and human core promoters</title>
<author>
<name sortKey="Fitzgerald, Peter C" sort="Fitzgerald, Peter C" uniqKey="Fitzgerald P" first="Peter C" last="Fitzgerald">Peter C. Fitzgerald</name>
<affiliation>
<nlm:aff id="I1">Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sturgill, David" sort="Sturgill, David" uniqKey="Sturgill D" first="David" last="Sturgill">David Sturgill</name>
<affiliation>
<nlm:aff id="I2">Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Shyakhtenko, Andrey" sort="Shyakhtenko, Andrey" uniqKey="Shyakhtenko A" first="Andrey" last="Shyakhtenko">Andrey Shyakhtenko</name>
<affiliation>
<nlm:aff id="I3">Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliver, Brian" sort="Oliver, Brian" uniqKey="Oliver B" first="Brian" last="Oliver">Brian Oliver</name>
<affiliation>
<nlm:aff id="I2">Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vinson, Charles" sort="Vinson, Charles" uniqKey="Vinson C" first="Charles" last="Vinson">Charles Vinson</name>
<affiliation>
<nlm:aff id="I3">Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">16827941</idno>
<idno type="pmc">1779564</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779564</idno>
<idno type="RBID">PMC:1779564</idno>
<idno type="doi">10.1186/gb-2006-7-7-r53</idno>
<date when="2006">2006</date>
<idno type="wicri:Area/Pmc/Corpus">000560</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000560</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Comparative genomics of
<italic>Drosophila </italic>
and human core promoters</title>
<author>
<name sortKey="Fitzgerald, Peter C" sort="Fitzgerald, Peter C" uniqKey="Fitzgerald P" first="Peter C" last="Fitzgerald">Peter C. Fitzgerald</name>
<affiliation>
<nlm:aff id="I1">Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sturgill, David" sort="Sturgill, David" uniqKey="Sturgill D" first="David" last="Sturgill">David Sturgill</name>
<affiliation>
<nlm:aff id="I2">Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Shyakhtenko, Andrey" sort="Shyakhtenko, Andrey" uniqKey="Shyakhtenko A" first="Andrey" last="Shyakhtenko">Andrey Shyakhtenko</name>
<affiliation>
<nlm:aff id="I3">Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliver, Brian" sort="Oliver, Brian" uniqKey="Oliver B" first="Brian" last="Oliver">Brian Oliver</name>
<affiliation>
<nlm:aff id="I2">Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vinson, Charles" sort="Vinson, Charles" uniqKey="Vinson C" first="Charles" last="Vinson">Charles Vinson</name>
<affiliation>
<nlm:aff id="I3">Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genome Biology</title>
<idno type="ISSN">1465-6906</idno>
<idno type="eISSN">1465-6914</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Comparison of DNA sequence distributions in
<italic>Drosophila </italic>
and human promoters suggests that different motifs have distinct functional roles.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genome Biol</journal-id>
<journal-title>Genome Biology</journal-title>
<issn pub-type="ppub">1465-6906</issn>
<issn pub-type="epub">1465-6914</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">16827941</article-id>
<article-id pub-id-type="pmc">1779564</article-id>
<article-id pub-id-type="publisher-id">gb-2006-7-7-r53</article-id>
<article-id pub-id-type="doi">10.1186/gb-2006-7-7-r53</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Comparative genomics of
<italic>Drosophila </italic>
and human core promoters</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>FitzGerald</surname>
<given-names>Peter C</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>pcf@helix.nih.gov</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Sturgill</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>sturgill@helix.nih.gov</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Shyakhtenko</surname>
<given-names>Andrey</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>shlyakha@mail.nih.gov</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Oliver</surname>
<given-names>Brian</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>oliver@helix.nih.gov</email>
</contrib>
<contrib id="A5" corresp="yes" contrib-type="author">
<name>
<surname>Vinson</surname>
<given-names>Charles</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>vinsonc@dc37a.nci.nih.gov</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</aff>
<aff id="I2">
<label>2</label>
Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA</aff>
<aff id="I3">
<label>3</label>
Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA</aff>
<pub-date pub-type="ppub">
<year>2006</year>
</pub-date>
<pub-date pub-type="epub">
<day>7</day>
<month>7</month>
<year>2006</year>
</pub-date>
<volume>7</volume>
<issue>7</issue>
<fpage>R53</fpage>
<lpage>R53</lpage>
<ext-link ext-link-type="uri" xlink:href="http://genomebiology.com/2006/7/7/R53"></ext-link>
<history>
<date date-type="received">
<day>22</day>
<month>3</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>8</day>
<month>5</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>6</day>
<month>6</month>
<year>2006</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2006 FitzGerald et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2006</copyright-year>
<copyright-holder>FitzGerald et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an open access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
</permissions>
<abstract abstract-type="short">
<p>Comparison of DNA sequence distributions in
<italic>Drosophila </italic>
and human promoters suggests that different motifs have distinct functional roles.</p>
</abstract>
<abstract>
<sec>
<title>Background</title>
<p>The core promoter region plays a critical role in the regulation of eukaryotic gene expression. We have determined the non-random distribution of DNA sequences relative to the transcriptional start site in
<italic>Drosophila melanogaster </italic>
promoters to identify sequences that may be biologically significant. We compare these results with those obtained for human promoters.</p>
</sec>
<sec>
<title>Results</title>
<p>We determined the distribution of all 65,536 octamer (8-mer) DNA sequences in 10,914
<italic>Drosophila </italic>
promoters and two sets of human promoters aligned relative to the transcriptional start site. In
<italic>Drosophila</italic>
, 298 8-mers have highly significant (
<italic>p </italic>
≤ 1 × 10
<sup>-16</sup>
) non-random distributions peaking within 100 base-pairs of the transcriptional start site. These sequences were grouped into 15 DNA motifs. Ten motifs, termed directional motifs, occur only on the positive strand while the remaining five motifs, termed non-directional motifs, occur on both strands. The only directional motifs to localize in human promoters are TATA, INR, and DPE. The directional motifs were further subdivided into those precisely positioned relative to the transcriptional start site and those that are positioned more loosely relative to the transcriptional start site. Similar numbers of non-directional motifs were identified in both species and most are different. The genes associated with all 15 DNA motifs, when they occur in the peak, are enriched in specific Gene Ontology categories and show a distinct mRNA expression pattern, suggesting that there is a core promoter code in
<italic>Drosophila</italic>
.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>
<italic>Drosophila </italic>
and human promoters use different DNA sequences to regulate gene expression, supporting the idea that evolution occurs by the modulation of gene regulation.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The regulation of eukaryotic gene expression is a complex process involving many different control mechanisms, including chromatin structure and DNA sequences that bind specific proteins [
<xref ref-type="bibr" rid="B1">1</xref>
]. For convenience, we divide DNA sequence motifs that are bound by proteins into three distinct classes: the core promoter region where the basal transcription machinery binds; motifs within the core promoter region that bind to transcription factors; and classic enhancer or silencer motifs that function at large distances from the transcriptional start site (TSS). Two extremes of regulated gene expression may be envisioned. At one extreme, the general transcriptional machinery is identical for all promoters, and the binding of different transcription factors to the core promoter and more distant motifs recruits and regulates RNA polymerase activity to control gene expression. At the other extreme, different motifs within the core promoter direct the assembly of transcriptional machinery with different components. The latter arrangement is used in prokaryotes, where different sigma factors, each a component of the polymerase complex, bind different motifs in the core promoter to regulate functionally related genes [
<xref ref-type="bibr" rid="B2">2</xref>
]. This type of system also operates in sex-specific tissues of
<italic>Drosophila </italic>
where the germ cells express variant isoforms of the general transcriptional complex [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
] termed core promoter selectivity factors [
<xref ref-type="bibr" rid="B5">5</xref>
]. Furthermore, genetic studies in
<italic>Drosophila </italic>
indicate that the core promoter contains information that directs tissue-specific mRNA expression [
<xref ref-type="bibr" rid="B6">6</xref>
-
<xref ref-type="bibr" rid="B9">9</xref>
].</p>
<p>A variety of computational methods have been used to identify DNA binding sites for transcription factors and core promoter elements in both
<italic>Drosophila </italic>
and human [
<xref ref-type="bibr" rid="B10">10</xref>
-
<xref ref-type="bibr" rid="B12">12</xref>
]. Previous full-genome analysis of
<italic>Drosophila </italic>
core promoters has examined the abundance, but not the precise positioning, of motifs near the TSS. Here, we examine non-random distributions relative to the TSS in
<italic>Drosophila melanogaster </italic>
promoter sequences to identify DNA motifs that are biologically significant. This study adds to our understanding of
<italic>Drosophila </italic>
core promoters by identifying new motifs and showing that motifs correlate with different biological functions. Comparing these results with those obtained for human promoters indicates that the DNA motifs that localize are different, except for the strand-specific core promoter elements TATA, initiator element (INR), and downstream promoter element (DPE).</p>
</sec>
<sec>
<title>Results</title>
<p>Genomic DNA sequences and gene annotation data for
<italic>Drosophila </italic>
and human were downloaded from the UCSC Genome Browser site [
<xref ref-type="bibr" rid="B13">13</xref>
]. Human gene annotation data were also obtained from the DBTSS [
<xref ref-type="bibr" rid="B14">14</xref>
]. For each organism, we created a dataset corresponding to the region -1,001 to +499 base-pairs (bp) relative to the annotated TSS of each RefSeq gene that had an annotated 5' untranslated region (UTR) of 10 or more bp. We created two human datasets, one using the UCSC annotations and one using the DBTSS annotations.</p>
<sec>
<title>Distribution of mono-nucleotides is different between
<italic>Drosophila </italic>
and human promoters</title>
<p>To determine the gross structure of
<italic>Drosophila </italic>
and human promoters, we determined the abundance of the four mononucleotides (1-mer; Figure
<xref ref-type="fig" rid="F1">1a</xref>
) across the 1,500 bp from -1,000 bp to +499 bp for 10,914
<italic>Drosophila </italic>
promoters and compared these to distributions in 15,011 (UCSC) and 12,926 (DBTSS) human promoters (Figure
<xref ref-type="fig" rid="F1">1b,c</xref>
).
<italic>Drosophila </italic>
promoters are more A and T rich (56%) than human promoters (44%). In addition,
<italic>Drosophila </italic>
promoters had a peak for both A and T between -200 bp and the TSS, while the human promoters had a broad peak for both G and C centered at the TSS, suggesting a fundamental difference in global promoter architecture. The two human datasets show the same general distribution patterns, but the DBTSS set has more pronounced peaks and valleys at the TSS.</p>
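<p>As a rough illustration of the binned composition profile described above, the following Python sketch computes the fraction of each mononucleotide in consecutive 20 bp bins. It is not the code used in this study: the function name base_fractions_per_bin, and the assumption that the input is a list of equal-length, TSS-aligned promoter strings, are introduced here for the example only.</p>

```python
from collections import Counter

BIN_SIZE = 20  # bin width in bp, matching the 20 bp bins mentioned in the text


def base_fractions_per_bin(promoters, bin_size=BIN_SIZE):
    """Return one dict per bin giving the fraction of A, C, G and T in that bin.

    Assumes (for this sketch) that every promoter string has the same length
    and that all strings are aligned on the TSS.
    """
    n_bins = len(promoters[0]) // bin_size
    counts = [Counter() for _ in range(n_bins)]
    for seq in promoters:
        s = seq.upper()
        for b in range(n_bins):
            counts[b].update(s[b * bin_size:(b + 1) * bin_size])
    fractions = []
    for c in counts:
        total = sum(c[base] for base in "ACGT")  # symbols such as N are ignored
        fractions.append(
            {base: (c[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions


if __name__ == "__main__":
    # Toy sequences for demonstration only, not real promoters.
    toy = ["A" * 20 + "G" * 20, "T" * 20 + "C" * 20]
    for i, frac in enumerate(base_fractions_per_bin(toy)):
        print(i, frac["A"] + frac["T"])  # A+T fraction per bin
```

<p>Summing the A and T fractions per bin gives an A+T profile of the kind compared between the two species; with 1,500 bp windows the function would return 75 bins.</p>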
<p>The CA dinucleotide is often associated with the TSS [
<xref ref-type="bibr" rid="B15">15</xref>
] and, in many cases, with a unique TSS [
<xref ref-type="bibr" rid="B16">16</xref>
]. RNA polymerase is known to prefer an adenine in the +1 position [
<xref ref-type="bibr" rid="B17">17</xref>
]. This provides an important quality control metric. A tight cluster of CA sites at the TSS would indicate that enough TSSs have been accurately assigned to permit analysis of other motifs. Figure
<xref ref-type="fig" rid="F1">1d</xref>
presents the CA dinucleotide distribution plotted at single-nucleotide resolution, rather than in the 20 bp bins used in Figure
<xref ref-type="fig" rid="F1">1a-c</xref>
. The CA distribution in both
<italic>Drosophila </italic>
and human promoters showed a spike exactly at the TSS (the A of the CA dinucleotide is at position +1 in the peak). The
<italic>Drosophila </italic>
CA spike at the TSS occurs in approximately 20% of all promoters while the spike is less pronounced in the human (UCSC) dataset (approximately 10%) and more pronounced in the human (DBTSS) dataset (approximately 40%). This CA peak is part of the initiator (INR) motif (TCAGTY) that is positioned at the TSS (see below). That CA is often present at the TSS suggests that the TSS has been appropriately assigned in many of the transcripts in both the
<italic>Drosophila </italic>
and human promoter datasets. If the CA peak is taken as a relative measure of the quality, or precise alignment, of the datasets, then the two human sets bracket the
<italic>Drosophila </italic>
set with respect to the accuracy of the positioning of the TSS.</p>
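<p>The CA quality-control check can be sketched in Python as a per-position count at single-nucleotide resolution. This is an illustrative sketch under stated assumptions, not the procedure used in this study: the function name ca_fraction_by_position is hypothetical, and the index of the +1 base within each string depends on how the promoter windows were extracted, so it is left to the caller.</p>

```python
def ca_fraction_by_position(promoters):
    """For each index i, return the fraction of promoters with C at i and A at i + 1.

    Assumes (for this sketch) equal-length, TSS-aligned promoter strings.
    """
    length = len(promoters[0])
    counts = [0] * (length - 1)
    for seq in promoters:
        s = seq.upper()
        for i in range(length - 1):
            if s[i:i + 2] == "CA":
                counts[i] += 1
    return [c / len(promoters) for c in counts]


if __name__ == "__main__":
    # Toy sequences for demonstration only; here the +1 base is taken to be index 3.
    toy = ["GGCAGG", "TTCATT", "AAAAAA", "CACACA"]
    profile = ca_fraction_by_position(toy)
    tss_index = 3  # hypothetical index of the +1 base in these toy strings
    print(profile[tss_index - 1])  # fraction with C at -1 and A at +1
```

<p>If the TSSs are accurately assigned, the value at the index just before the +1 base should stand out sharply from its neighbours, as it does for the spike described above.</p>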
</sec>
<sec>
<title>Distribution of all 8-mer DNA sequences in promoters</title>
<p>Having validated the quality of the TSS assignments, we determined the distribution of all 8-mers in the set of
<italic>Drosophila </italic>
and human putative promoters to identify potential DNA binding sites for transcription factors that are localized relative to the TSS. A clustering factor (CF), describing the presence of a peak in the distribution of each 8-mer, was calculated three ways, by examining the distribution on both strands (CF), on the positive strand (CF
<sup>+</sup>
), and on the negative strand (CF
<sup>-</sup>
). For these calculations we divided the 1,500 bp of genomic DNA, from -1,000 bp to +499 bp relative to the TSS, into 75 bins of 20 bp each (see Materials and methods).</p>
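<p>The strand-specific binning described above can be sketched in Python as follows. The clustering factor itself is defined in Materials and methods and is not reproduced here; the function peak_to_background below is a simple stand-in (peak bin count divided by the mean count of the remaining bins) chosen only for illustration, and all function names are hypothetical. Minus-strand occurrences are found by searching the plus strand for the reverse complement of the 8-mer.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

N_OCTAMERS = 4 ** 8  # 65,536 possible 8-mers over the alphabet A, C, G, T


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def binned_counts(promoters, kmer, bin_size=20):
    """Count k-mer occurrences per bin on the plus strand and on the minus strand.

    Assumes equal-length, TSS-aligned promoter strings. An occurrence is
    assigned to the bin containing its plus-strand start coordinate.
    """
    n_bins = len(promoters[0]) // bin_size
    plus = [0] * n_bins
    minus = [0] * n_bins
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    k = len(kmer)
    for seq in promoters:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            b = min(i // bin_size, n_bins - 1)
            window = s[i:i + k]
            if window == kmer:
                plus[b] += 1
            if window == rc:
                minus[b] += 1
    return plus, minus


def peak_to_background(counts):
    """Illustrative stand-in for a clustering score (NOT the paper's CF).

    Returns (ratio, peak_bin), where ratio is the peak bin count divided by
    the mean count of all other bins.
    """
    peak = max(counts)
    peak_bin = counts.index(peak)
    mean_other = (sum(counts) - peak) / (len(counts) - 1)
    ratio = peak / mean_other if mean_other > 0 else float("inf")
    return ratio, peak_bin
```

<p>Applying peak_to_background to the plus-strand counts, the minus-strand counts, and their element-wise sum gives three scores analogous in spirit to the three clustering factors described above, one per strand and one for both strands combined.</p>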
<p>When CF values were plotted against the bin with the maximum number of members for the
<italic>Drosophila </italic>
and human promoters, respectively (Figure
<xref ref-type="fig" rid="F2">2a-c</xref>
), all distributions showed similar patterns, with a grouping of DNA sequences that peak within 100 bp of the TSS. The highest CF values in all plots are 20 to 30, indicating that these 8-mers are approximately 20 to 30 times more abundant at one position relative to the TSS than elsewhere in promoters. In contrast to the similarity in CF values, when the data were plotted for CF
<sup>+</sup>
(Figure
<xref ref-type="fig" rid="F2">2d-f</xref>
), a profound difference between
<italic>Drosophila </italic>
and both human datasets was revealed.
<italic>Drosophila </italic>
8-mers have a maximum CF
<sup>+ </sup>
value of approximately 50 while the maximum CF
<sup>+ </sup>
for human sequences is approximately 20. This suggests that
<italic>Drosophila </italic>
has more 8-mers that occur preferentially on one strand of DNA, and that the
<italic>Drosophila </italic>
strand-dependent 8-mers have a higher degree of localization than their human counterparts. Control data, using 7th-order Markov random datasets, show a complete lack of clustering for any 8-mers for either human or
<italic>Drosophila </italic>
(data not shown).</p>
<p>To determine if an 8-mer has a peak in its distribution on only one strand of DNA, we compared the CF
<sup>+ </sup>
with the CF on the opposite strand (CF
<sup>-</sup>
). In
<italic>Drosophila</italic>
, we identified two types of peaking 8-mers: those that peak on both strands and thus have similar CF
<sup>+ </sup>
and CF
<sup>- </sup>
values (termed non-directional motifs (NDMs)), and 8-mers that peak preferentially on one strand (termed directional motifs (DMs)) and thus have significantly different CF
<sup>+ </sup>
and CF
<sup>- </sup>
values (Figure
<xref ref-type="fig" rid="F3">3a</xref>
). Indeed, many motifs are randomly positioned on one strand and >20-fold enriched at a given position of the opposite strand. These two distinct types of motifs are potentially bound by proteins that have different roles in transcription regulation. The 8-mers with a high CF
<sup>+ </sup>
but a low CF
<sup>- </sup>
contain directional information and could be binding sites for core promoter selectivity factors. In contrast, in both human promoter sets, we observed a significant number of 8-mers that peak on both strands (Figure
<xref ref-type="fig" rid="F3">3b,c</xref>
), and few that preferentially peak on one strand (as shown below, these are predominantly TATA and INR-like sequences). While the human DBTSS dataset contains a greater number of DMs than does the UCSC dataset, both sets are clearly more biased toward NDMs than is the
<italic>Drosophila </italic>
dataset. These data suggest that there is a significant difference in the sequence organization of promoters between these human and
<italic>Drosophila </italic>
datasets.</p>
</sec>
<sec>
<title>
<italic>Drosophila </italic>
and human 8-mers that peak are different</title>
<p>Are the motifs that peak in humans similar to the motifs that peak in
<italic>Drosophila</italic>
? To answer this, we directly compared the CF values for all 8-mers between human and
<italic>Drosophila </italic>
(Figure
<xref ref-type="fig" rid="F3">3d,e</xref>
). The majority of 8-mers with high CF values are different between the two species. In contrast, 8-mers with the largest CF values are common between the two human datasets (Figure
<xref ref-type="fig" rid="F3">3f</xref>
), lending confidence to the idea that the differences between the two species are real.</p>
</sec>
<sec>
<title>Fifteen DNA motifs that cluster in
<italic>Drosophila</italic>
</title>
<p>To determine the statistical significance of the CF
<sup>+ </sup>
values, we converted the CF
<sup>+ </sup>
into a probability term using the 8-mer frequencies observed in the 10,914
<italic>Drosophila </italic>
promoter dataset. The probability term,
<italic>P</italic>
, represents -log
<sub>10</sub>
(1 -
<italic>p</italic>
), where
<italic>p </italic>
is the area under the normalized curve of the distribution of CF
<sub>expt</sub>
. A high
<italic>P </italic>
value indicates that it is very unlikely that the peak for the 8-mer occurs by chance. A plot of the
<italic>P </italic>
values versus the most populated bin number (Figure
<xref ref-type="fig" rid="F4">4a</xref>
) shows a group of 8-mers near the TSS whose distributions are very unlikely to occur by chance. We analyzed the 298 8-mers that have a
<italic>P </italic>
value ≥ 16. All these 8-mers had peaks centered between -100 bp and +40 bp. As illustrated in Figure
<xref ref-type="fig" rid="F4">4a</xref>
,
<italic>P </italic>
≥ 16 is a conservative cutoff. We plotted CF
<sup>+ </sup>
versus CF
<sup>- </sup>
for these 298 sequences to examine their strand-specific localization (Figure
<xref ref-type="fig" rid="F4">4b</xref>
). DMs (black circles) predominate, but NDMs (red circles) were also identified.</p>
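<p>The conversion from a probability to the P term can be sketched in Python. Because p is very close to 1 for the sequences of interest, computing 1 - p in floating point loses precision, so this sketch takes the upper-tail probability (1 - p) as its input directly. The function names and this choice of input are assumptions made for illustration; how the tail probability is obtained from the distribution of CF values is described in Materials and methods and is not reproduced here.</p>

```python
import math


def p_score(upper_tail):
    """Return P = -log10(1 - p), given the upper-tail probability 1 - p."""
    if not upper_tail > 0.0:
        raise ValueError("upper-tail probability must be positive")
    return -math.log10(upper_tail)


def passes_cutoff(upper_tail, cutoff=16.0):
    """True if the P term reaches the cutoff (16 in the text)."""
    return p_score(upper_tail) >= cutoff


if __name__ == "__main__":
    for tail in (1e-3, 1e-20):  # illustrative values only
        print(tail, p_score(tail), passes_cutoff(tail))
```

<p>Under this definition, the cutoff of P ≥ 16 corresponds to a tail probability of at most 1 × 10<sup>-16</sup>.</p>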
<p>The 298 8-mer sequences were manually grouped into 15 families and a consensus motif was determined for each family (Figure
<xref ref-type="fig" rid="F5">5</xref>
). The placement of an 8-mer into a particular motif was guided by the similarity among DNA sequences, the shape of the distribution histogram, the peak position relative to the TSS, and whether the 8-mer was directional or non-directional. The number of 8-mers in each of the 15 motifs varied dramatically: over one-third of the 298 8-mers represented variations of the INR motif (TCAGTY), whereas 8 motifs were each represented by 5 or fewer 8-mers. We determined the abundance of the 15 motifs by counting unique promoters that contained a motif in the peak (Figure
<xref ref-type="fig" rid="F4">4c</xref>
). A total of 6,067 promoters contain one or more of the 15 motifs. The most abundant motif is the non-directional DRE, found in 15% (1,593) of
<italic>Drosophila </italic>
promoters, followed by directional INR, found in 14% (1,501) of promoters. The least abundant motif identified, DMp5, is found in 0.7% (80) of all promoters.</p>
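<p>Counting unique promoters that contain a consensus motif within a peak window can be sketched in Python by translating IUPAC degenerate codes (for example, Y for C or T) into a regular expression. This is an illustrative sketch only: the function names, the plus-strand-only search, and the half-open window given as string indices are assumptions made for this example, and the actual peak boundaries used for each motif are not specified here.</p>

```python
import re

# Standard IUPAC nucleotide codes used in consensus motifs such as TCAGTY.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in consensus.upper()))


def promoters_with_motif(promoters, consensus, start, end):
    """Count promoters with a plus-strand match starting at an index in [start, end).

    Each promoter is counted once, however many matches it contains.
    """
    pattern = consensus_to_regex(consensus)
    n = 0
    for seq in promoters:
        s = seq.upper()
        if any(pattern.match(s, i) for i in range(start, min(end, len(s)))):
            n += 1
    return n


if __name__ == "__main__":
    # Toy sequences for demonstration only, not real promoters.
    toy = ["GGTCAGTCGG", "GGTCAGTTGG", "GGTCAGTAGG", "TCAGTCGGGG"]
    print(promoters_with_motif(toy, "TCAGTY", 2, 3))
```

<p>For a non-directional motif, the same count could be repeated with the reverse complement of the consensus to cover the opposite strand.</p>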
<p>Figure
<xref ref-type="fig" rid="F6">6</xref>
presents the distribution of each of the 15 consensus motifs, showing the number of occurrences on each DNA strand. To gain more insight into how constrained motif position is relative to the TSS, we examined the distribution of the 15 DNA motifs at a single base-pair resolution. The insets in Figure
<xref ref-type="fig" rid="F6">6</xref>
show the single base-pair distribution plots for the motifs in the region -100 to +100 relative to the TSS. Five of the DMs (Figure
<xref ref-type="fig" rid="F6">6a-e</xref>
) are positioned at a single base-pair resolution relative to the TSS while the other five DMs (Figure
<xref ref-type="fig" rid="F6">6f-j</xref>
) and the five NDMs (Figure
<xref ref-type="fig" rid="F6">6k-o</xref>
) are spread across a broad region of up to 50 bp, though they all clustered near the TSS. We thus classified the DMs as either precise or variably positioned. The DMs are named DMp1 to 5 (for directional motif precise) and DMv1 to 5 (for directional motif variable). The NDMs are named NDM1 to 5. Where a motif has a previous common name we use that name, for example, DMp1 is TATA, DMp2 is INR, DMp4 and DMp5 are DPE-like, NDM1 is GAGA and NDM4 is downstream responsive element (DRE). The single base-pair resolution plots not only reveal the precise versus variable positioning of the motifs, they also reveal the power of the initial analysis based on 20 bp bins. Many of the motifs (DMvs and NDMs) would not have been identified at a single base-pair resolution. Also, the number of promoters identified that contain a specific motif is much greater at a 20 bp resolution than a 1 bp resolution (for example, for INR there are approximately 1,500 versus approximately 400).</p>
<p>To further examine the localization of DNA sequences at a single base-pair resolution, we examined the CF
<sup>+ </sup>
values of all 6-mers for both
<italic>Drosophila </italic>
and human promoters (Figure
<xref ref-type="fig" rid="F7">7</xref>
). We chose 6-mers to produce enough occurrences at each base pair position to be able to determine peaks reliably. The
<italic>Drosophila </italic>
data (Figure
<xref ref-type="fig" rid="F7">7a</xref>
) showed three distinct regions in which individual 6-mers were preferentially localized. Examination of the DNA sequences that cluster around each of these three positions indicated they can be grouped into a single motif that is localized at a specific base-pair position relative to the TSS. The three motifs are TATA, INR and DPE. Where promoters have two of these motifs, they are precisely positioned relative to each other (Figure
<xref ref-type="fig" rid="F7">7d</xref>
).</p>
<p>The clustering of 6-mers at a single base-pair resolution in the UCSC human promoters showed generally lower CF
<sup>+ </sup>
values and only two peaks corresponding to the TATA and INR positions (Figure
<xref ref-type="fig" rid="F7">7b</xref>
). While the DBTSS dataset (Figure
<xref ref-type="fig" rid="F7">7c</xref>
) showed more pronounced peaks than the UCSC dataset, it still failed to show a clear DPE peak. Examination of the sequences localized under the main human (DBTSS) peaks produced a result similar to that seen for
<italic>Drosophila</italic>
. The sequences lying under the TATA peak were exclusively TATA-like sequences. The sequences under the INR peak represented INR variants localized exactly at the TSS and other NDMs, predominantly erythroblast transformation specific (ETS), localized close to the TSS. However, the variety of INR sequences that localized in the human dataset was greater than that seen for the
<italic>Drosophila </italic>
data. Attempts to identify distinct human INR motifs of six nucleotides or longer were unsuccessful because of the wide degeneracy of the sequences surrounding the prominent central CA core.</p>
</sec>
<sec>
<title>Comparison of
<italic>Drosophila </italic>
and human motifs that peak</title>
<p>We examined whether motifs that peak in
<italic>Drosophila </italic>
also peak in human and vice-versa. Of the 15
<italic>Drosophila </italic>
motifs that peaked, four also localized in human promoters (TATA, INR, DPE1 and NDM2; Figure
<xref ref-type="fig" rid="F8">8a,b,d,l</xref>
) with INR, DPE1 and NDM2 occurring at much lower frequency in human promoters. While both the human and
<italic>Drosophila </italic>
promoters showed a clear overabundance of the CA dimer at the TSS (Figure
<xref ref-type="fig" rid="F1">1d</xref>
), we were previously [
<xref ref-type="bibr" rid="B11">11</xref>
] unable to detect an INR signal in human promoters using the degenerate human consensus sequence (YYANWYY). However, mapping the
<italic>Drosophila </italic>
INR motif (TCAGTY) to human promoters does produce a weak peak at the TSS in the UCSC dataset and a more pronounced peak in the DBTSS dataset (Figure
<xref ref-type="fig" rid="F8">8b</xref>
). Analysis of this peak at a 1 bp resolution (Figure
<xref ref-type="fig" rid="F8">8x</xref>
) revealed that both human datasets contain significantly fewer of these precisely positioned elements than does the
<italic>Drosophila </italic>
dataset. This result suggests that this TCAGTY motif plays a less significant role in human gene transcription than it does in
<italic>Drosophila</italic>
, and agrees with previous findings that the human INR is more degenerate than its
<italic>Drosophila </italic>
counterpart. In all cases, motifs that showed a peak in one human dataset also showed a peak in the other, although the peaks were more pronounced in the DBTSS dataset. This confirms both the qualitative similarity of the two datasets and the suggestion that the DBTSS dataset contains a greater number of accurately positioned TSSs. Of the eight motifs previously identified as peaking abundantly in humans [
<xref ref-type="bibr" rid="B11">11</xref>
], only TATA also peaked in
<italic>Drosophila </italic>
promoters (Figure
<xref ref-type="fig" rid="F9">9</xref>
).</p>
<p>In comparing the distributions of the
<italic>Drosophila </italic>
and human motifs, it is apparent that some sequences, even when they occur outside of the peak, display different abundances for the two organisms. This is true for DRE (Figure
<xref ref-type="fig" rid="F8">8n</xref>
), which peaks in
<italic>Drosophila </italic>
but is also a highly abundant motif outside of the peak (total of 7,058 across 1,500 bp of 10,914 promoters). In humans, there is no indication of any clustering, and this element is also very rare (total of 1,015 across 1,500 bp of 15,011 promoters). The reciprocal observation is made for human promoters, where SP1 (Figure
<xref ref-type="fig" rid="F9">9h</xref>
) is characterized by a very large peak and is also abundant outside of the peak but is virtually absent from
<italic>Drosophila </italic>
core promoters. In contrast, the INR (Figure
<xref ref-type="fig" rid="F8">8b</xref>
), which peaks in both organisms, albeit on different scales, shows very similar total abundance in both organisms (a total of 17,377 and 20,320 occurrences across 1,500 bp, in 10,914 and 15,011 promoters, for
<italic>Drosophila </italic>
and human, respectively).</p>
</sec>
<sec>
<title>E-box motifs that peak in both
<italic>Drosophila </italic>
and humans</title>
<p>NDM5 (CAGCTSWW) is a derivative of the general DNA sequence termed an E-box (CANNTG) that is bound by B-HLH-ZIP transcription factors, including the oncogene Myc/Max. A recent paper [
<xref ref-type="bibr" rid="B18">18</xref>
] has shown that an E-box sequence is located near the TSS of
<italic>Drosophila </italic>
genes. The sequence CACGTG is the core of the upstream stimulatory factor (USF) sequence previously identified in humans to peak near the TSS [
<xref ref-type="bibr" rid="B11">11</xref>
]. We compared the distribution of these related sequences in
<italic>Drosophila </italic>
and human. The USF consensus sequence (TCACGTGR) does not show any clustering in
<italic>Drosophila </italic>
(Figure
<xref ref-type="fig" rid="F9">9b</xref>
). However, the 6-mer E-box variants CACGTG and CAGCTG have peaks in both human and
<italic>Drosophila </italic>
promoters (Figure
<xref ref-type="fig" rid="F10">10a,b</xref>
). In
<italic>Drosophila</italic>
, the sequence CACGTG peaks downstream of the TSS while in human it peaks upstream of the TSS. The E-box variant CAGCTG peaks in both human and
<italic>Drosophila </italic>
just upstream of the TSS. Figures
<xref ref-type="fig" rid="F9">9c,d</xref>
highlight two E-box 8-mer variants with dramatically different peaking properties, where sequences outside a conserved 6-mer define the peaking properties of the 8-mer. The sequence RCACGTGY peaks only in
<italic>Drosophila </italic>
while YCACGTGR peaks only in human, suggesting that distinct B-HLH proteins bind these related sequences.</p>
</sec>
<sec>
<title>Correlation of different DNA motifs in the same promoter</title>
<p>We examined correlations in the occurrence of the 15 peaking motifs in
<italic>Drosophila </italic>
to gain insight into their potential combinatorial or redundant function. Table
<xref ref-type="table" rid="T1">1</xref>
presents a matrix showing: the number of promoters that contain one motif in a peak that also contain a second motif in a peak (a); the frequency of this co-occurrence (b); and the probability (c). There is a complex pattern of positive and negative correlation for individual motifs, suggesting that combinations of motifs act to regulate core promoter function.</p>
<p>For the precisely positioned directional motifs (DMp1 to 5: TATA, INR, INR1, DPE, and DPE1), promoters that contain INR also preferentially contain either the TATA or DPE sequence. However, TATA and DPE motifs negatively correlate. All five members of the DMp class negatively correlate with some or all of the DMv class. DMp1 to 5 positively correlate with three of the NDMs (NDM1 to 3) but negatively correlate with NDM4 and NDM5.</p>
<p>The five variably positioned directional motifs (DMv1 to 5) have both positive and negative correlations amongst themselves and with the NDMs. The DMv class members positively correlate with NDM4 and NDM5 and negatively correlate with NDM1 to 3, correlations that are exactly the opposite of those observed for the DMp class (see above). On average, members of the NDM class positively correlate with each other. Positive correlations between motifs suggest the possibility of physical interactions between the proteins that bind the co-occurring DNA motifs. Negative correlations, as are observed between the precisely positioned DMs (DMp) and the variably positioned DMs (DMv), suggest that the proteins that bind them have distinct functions.</p>
</sec>
<sec>
<title>Consensus DNA motifs correlate with biological function</title>
<p>The non-random distribution of individual motifs and motif combinations at core promoters strongly suggests that the identified motifs are biologically significant and promoters that share the same motif in a peak may also share similar biological functions. To evaluate this possibility, we calculated statistical over- and under-representation of 5,200 Gene Ontology (GO) annotation terms [
<xref ref-type="bibr" rid="B19">19</xref>
] for
<italic>Drosophila </italic>
genes whose promoters contained any of the 15 motifs, either within the peak or elsewhere in the promoter region. We found highly significant correlations (
<italic>p </italic>
< 10
<sup>-4</sup>
) for each motif only when they occurred in the peak (Figure
<xref ref-type="fig" rid="F11">11a</xref>
). With one exception, the simple presence elsewhere within the 1,500 bp promoter region does not correlate with GO terms, demonstrating that the position of a motif in the promoter is critical for predicting biological function, as was observed in human promoters [
<xref ref-type="bibr" rid="B11">11</xref>
]. The directional motifs, DMp and DMv, not only co-occur in promoters with either NDM1 to 3 or NDM4 and NDM5, respectively, but also correlate with similar GO terms. This indicates a combinatorial code of motifs at core promoters directing batteries of genes.</p>
<p>Additional insight can be inferred by examining individual GO terms that correlate. For example,
<italic>Drosophila </italic>
mitochondrial ribosomal genes contain the E-box (
<italic>p </italic>
< 10
<sup>-8</sup>
). In contrast, promoters of human mitochondrial ribosomal genes contain the ETS motif, a motif that peaks in human but not in
<italic>Drosophila</italic>
. Thus, even though the mitochondrial ribosomal genes are highly conserved, their regulation is evolving.</p>
<p>If core promoter motifs are used to drive the expression of gene batteries participating in a common biological process, this should be evident in global gene expression profiles. We turned to
<italic>Drosophila </italic>
mRNA expression patterns determined by microarray experiments [
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
] to evaluate whether genes that are co-expressed have the same motif in their promoters. Figure
<xref ref-type="fig" rid="F11">11a</xref>
shows correlations between all 15 motifs, either in the peak or elsewhere in the promoter region, and gene expression in testis (male germline), ovary (female germline), and soma. The presence of TATA in the peak in the promoter positively correlates with gene expression in somatic tissue but negatively correlates with expression in germline tissue. The presence of positioned DMv3 to 5, and DRE in promoters positively correlates with female germline expression and negatively correlates with male germline expression. If the motif occurs outside the peak, few correlations are observed, supporting the conclusion that motif position is functionally important.</p>
<p>We see more striking correlations between promoter motifs and mRNA expression in the embryonic and adult stages of
<italic>Drosophila </italic>
development that express different sets of genes. Figure
<xref ref-type="fig" rid="F11">11b</xref>
presents a hierarchical clustering of mRNA expression for 89 samples from a survey of gene expression in embryos and adults for promoters containing any of the 15 motifs (either in or outside the peak). Genes with motifs in the peak show strong mRNA expression differences between embryo and adult samples, suggesting that these motifs help direct the differential utilization of the genome between embryos and adults. Genes with promoters containing DMv1 to 5 and co-occurring NDM4 and NDM5 are preferentially active in the embryo. In contrast, genes with promoters containing the three abundant precisely positioned directional motifs (TATA, INR, and DPE) and the co-occurring NDM1 to 3 are preferentially active in the adult.</p>
</sec>
<sec>
<title>INR derivatives</title>
<p>Both
<italic>Drosophila </italic>
and human promoters have a CA peak exactly at the TSS in a significant number of promoters. About 2,100
<italic>Drosophila </italic>
promoters contain the CA sequence at the TSS but only 400 of these are part of the consensus INR sequence (TCAGTY). We examined the remaining promoter sequences for related INR sequences and identified 4 more motifs, resulting in 1,080 promoters with INR related sequences exactly positioned at the TSS. To evaluate if these INR related sequences correlate with distinct functions or are variants of a single motif, we investigated the correlation of the INR variants with different biological properties by examining GO terms and mRNA expression properties. Figure
<xref ref-type="fig" rid="F12">12a</xref>
shows that the variant INR motifs have distinct patterns of enrichment with categories of GO terms. Similarly, the developmental mRNA expression analysis (Figure
<xref ref-type="fig" rid="F12">12b</xref>
) indicates that one of the INR motif variants (BCACWS) is preferentially associated with genes with embryonic expression while the other variants are preferentially associated with adult expression genes. While some of the GO categories enriched for specific INR variants (for example, mesoderm development) appear at odds with the adult/embryo expression patterns, the overall impression suggests that these variant INR sequences are functionally distinct and may be recognized by distinct proteins. The discrepancies between the GO term enrichment and adult/embryo expression patterns can be explained if one assumes that the preferential use of INR signals is not absolute. Thus, even though there is a general trend toward preferential use of different elements at different stages in development, certain genes may use the 'adult INRs' during embryogenesis.</p>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>We have determined the localization of all 8-mers in 10,914
<italic>Drosophila </italic>
and two sets of human promoters (UCSC, 15,011 promoters; DBTSS, 12,926 promoters) aligned relative to the TSS and have identified DNA motifs that are non-randomly distributed in each dataset. Though we examined the region between -1,000 bp and +499 bp, all peaks are within 100 bp of the TSS. Two dramatic differences are observed between
<italic>Drosophila </italic>
and human promoters. First, there is little overlap in the DNA motifs that localize in the promoters of these two species. Second, of the 15 motifs identified in
<italic>Drosophila </italic>
promoters, 10 are directional DNA motifs (DNA sequences that occur on the positive but not the negative strand of DNA), while in human promoters, TATA, INR and DPE1 are the only DMs. We suggest that these DMs may be binding sites for core promoter selectivity factors [
<xref ref-type="bibr" rid="B5">5</xref>
]. While there is little overlap between motifs identified in
<italic>Drosophila </italic>
and human, both organisms contain identifiable TATA and INR core promoter elements, with humans having only a barely discernable DPE element. The identification of common elements in both species indicates a fundamental similarity in core promoter organization, as would be expected because the proteins that bind these sequences are conserved in both species.</p>
<p>A comparison of the promoter structures of two organisms depends on the quality of the data being analyzed. In an attempt to ensure that our results were not biased by differences in the quality of annotation of the TSS of the
<italic>Drosophila </italic>
and human genomes, we have analyzed three datasets. We used the annotation from the UCSC Genome Browser for both
<italic>Drosophila </italic>
and human to construct a dataset of promoters that represents the standard view of these genomes. Additionally, we have constructed a set of promoters based on annotations from the human DBTSS [
<xref ref-type="bibr" rid="B22">22</xref>
], a database specifically aimed at correctly identifying the TSS through the use of full-length cDNA cloning methods. As shown in Figure
<xref ref-type="fig" rid="F1">1d</xref>
, all three datasets show distinct CA peaks at the TSS, with the
<italic>Drosophila </italic>
peak being intermediate in amplitude between the two human datasets. The qualitative similarity of the findings of the two human datasets suggests that the differences we observe between the
<italic>Drosophila </italic>
and human promoters are not due to differences in the quality of the underlying datasets. Additionally, the fact that both
<italic>Drosophila </italic>
and human datasets are sufficiently aligned with respect to the TSS is exemplified by our ability to readily identify over-represented, localized 8-mers in all datasets. We note that our technique is aimed at finding abundant over-represented, localized motifs that have a low degree of degeneracy. Thus, our inability to find a given motif in an organism could indicate one of four possibilities: the motif is absent; the motif is present in low abundance; the motif is present but is highly degenerate; or the motif is present but not significantly constrained with respect to its position relative to the TSS.</p>
<p>Previous work has addressed the DNA sequence of
<italic>Drosophila </italic>
promoters. However, these studies have either examined a limited number of promoters or did not examine the position of motifs relative to the TSS. Kutach and Kadonaga [
<xref ref-type="bibr" rid="B23">23</xref>
] examined a set of 200
<italic>Drosophila </italic>
promoters and identified four types of promoters characterized by containing TATA only (29%), DPE only (26%), TATA + DPE (14%), or neither DNA motif (31%). Our global analysis looks at a much larger set of
<italic>Drosophila </italic>
promoters and finds a much lower proportion of genes with these sequences. Instead of 43% of promoters containing a TATA motif (29% TATA only plus 14% TATA + DPE), we find only 4.7%, and instead of 40% of promoters containing a DPE motif (26% DPE only plus 14% TATA + DPE), we find only 2.1%. Kutach and Kadonaga [
<xref ref-type="bibr" rid="B23">23</xref>
] used a less stringent criterion to define the motifs and it is also possible that the 200 promoters examined were biased towards TATA and DPE. They observed a conserved distance between the INR and DPE motifs and experimentally demonstrated that the conserved distance is critical for optimal function. This conserved distance is confirmed in our global analysis.</p>
<p>Another analysis of 2,000
<italic>Drosophila </italic>
promoters identified 10 motifs that are conserved near the TSS [
<xref ref-type="bibr" rid="B10">10</xref>
]; we identified 15 motifs, including 9 of the 10 identified by Ohler
<italic>et al</italic>
. The motif that did not peak in our analysis is motif ten element (MTE), a downstream element important for initiation [
<xref ref-type="bibr" rid="B24">24</xref>
]. Our global analysis extends this analysis of 2,000 promoters. We show that many of the identified DNA motifs occur on only one strand of DNA and are uniquely positioned relative to the TSS. Furthermore, the DNA sequences that peak in
<italic>Drosophila </italic>
are different from the DNA sequences that peak in human promoters.</p>
<sec>
<title>Variably positioned directional motifs may be bound by core promoter selectivity factors</title>
<p>There has been little systematic analysis of
<italic>Drosophila </italic>
promoter function as it relates to regulation versus basal activity. One potential mechanism of regulated gene expression is for the RNA polymerase II complex to use different components in different promoters. This system is used in prokaryotic cells where sigma factors bind different DNA sequences that are part of the polymerase binding site and consequently regulate different sets of genes. Such factors in eukaryotic systems are termed core promoter selectivity factors [
<xref ref-type="bibr" rid="B5">5</xref>
]. Several properties might be expected for DNA motifs bound by core promoter selectivity factors: they occur on one strand of DNA, thus providing directional information to polymerase; they are precisely positioned relative to the TSS; binding sites for different core promoter selectivity factors negatively correlate with each other in the same promoter; and the motifs should positively correlate with genes with a similar function. The precisely positioned DMp1 to 5 display all four characteristics while the variably positioned DMv1 to 5 match all criteria except that they are not uniquely positioned. Biochemical studies have already identified the DMp1 to 5 motifs as core promoter motifs (TATA, INR, DPE). We suggest that DMv1 to 5 may also be core promoter motifs that function independently of the DMp1 to 5 motifs. The DMv motifs are preferentially used in the embryo while the DMp motifs are used in the adult, consistent with an earlier suggestion that the mechanism of gene expression is different in the embryo than in the adult [
<xref ref-type="bibr" rid="B21">21</xref>
]. The DMv class of motif is not observed in humans and has not been studied biochemically.</p>
<p>When examining all aligned promoters, the most distinct feature is the TSS, which is observed even when we examine the distribution of the four mono-nucleotides at a single base-pair resolution. The CA dinucleotide sequence has a peak exactly at the TSS containing approximately 2,100 members of which approximately 1,400 members are above background. Of these, only 29% have the INR consensus TCAGTY. We defined four additional variant INR motifs that represent another 35% of the CA dinucleotides, indicating that two-thirds of the CA dinucleotides at the TSS are INR or variant motifs. In theory, the INR variants might all have the same general function. However, these variant INR motifs have distinct and nearly non-overlapping enrichments with specific GO terms. Furthermore, genes with one INR variant (BCACWS) are preferentially expressed in the embryo, instead of the adult. These associations with GO terms and different expression patterns demonstrate that variant INR motifs are biologically distinct and suggest they may be bound by different proteins or modified proteins in addition to the proteins known to bind the consensus INR (for example, RNA polymerase, TFIID, TAF250, and TFII-I [
<xref ref-type="bibr" rid="B1">1</xref>
]). It will be interesting to experimentally determine whether known INR binding proteins have different affinities for the five INR variants.</p>
</sec>
<sec>
<title>Gene regulation in
<italic>Drosophila </italic>
and humans</title>
<p>Two observations suggest that
<italic>Drosophila </italic>
and human promoters use different mechanisms to regulate gene expression. First, they have a different frequency and distribution of mononucleotides in promoters. This distribution correlates with nucleosome positioning. Second,
<italic>Drosophila </italic>
promoters have a large number of DMs near the TSS while they are nearly absent from human promoters.</p>
<p>
<italic>Drosophila </italic>
promoters are A and T rich with a peak of A and T dinucleotides between -200 bp and the TSS (Figure
<xref ref-type="fig" rid="F1">1</xref>
), a region that experimentally is known to be nucleosome free, particularly for active genes [
<xref ref-type="bibr" rid="B25">25</xref>
]. A similar correlation is observed in the yeast genome where the promoter regions between -200 and the TSS are A and T rich and devoid of nucleosomes [
<xref ref-type="bibr" rid="B26">26</xref>
]. In
<italic>Drosophila</italic>
, the transcription factors that bind NDM1 to 5 bind in this nucleosome free region and could interact with the pre-initiation complex composed of RNA polymerase and proteins that bind the DMs (DMp or DMv) that are critical for defining the TSS. This model of promoter organization has an appealing simplicity. The promoter region is accessible and is regulated by complex interactions between proteins that bind different DNA sequences; NDMs in the core promoter, DMs that act as core promoter selectivity elements, and distant enhancers.</p>
<p>In humans, the core promoter is different so the above model does not apply. There is no nucleosome free region observed in promoters [
<xref ref-type="bibr" rid="B27">27</xref>
] and this is consistent with a valley in A and T distribution at the TSS. Upstream of the TSS are NDMs, binding sites for transcription factors that recruit cofactors involved in chromatin remodeling. A simple image is that chromatin remodeling displaces the nucleosome over the TSS, leaving naked DNA that is the signal for polymerase initiation. This model would explain the absence of DMs in human promoters. The core promoter elements are more degenerate in human, suggesting that the energy for binding of the general transcriptional machinery comes from more global architectural features of the promoter.</p>
<p>Perhaps the differences in
<italic>Drosophila </italic>
and human promoter architecture reflect a solution to the over 10-fold larger size of the human genome (2.9 × 10
<sup>9 </sup>
bp) compared to the
<italic>Drosophila </italic>
genome (1.8 × 10
<sup>8 </sup>
bp). It has been suggested that repression of inappropriate gene expression is more important as a genome becomes larger [
<xref ref-type="bibr" rid="B28">28</xref>
]. Thus, it may be that the critical step in human gene regulation is relieving repression by displacing the nucleosome over the TSS while in
<italic>Drosophila </italic>
it is the assembly of the components that bind specifically to the DNA motifs in the promoter. This may also help explain the evolution in vertebrates of a G and C rich region over the TSS that contains CpG islands that can be repressed by methylation [
<xref ref-type="bibr" rid="B29">29</xref>
]. Such methylation is greatly reduced in
<italic>Drosophila</italic>
.</p>
</sec>
<sec>
<title>Core promoter structure evolves rapidly</title>
<p>The only DNA motifs that peak in both
<italic>Drosophila </italic>
and human promoters are TATA, INR, DPE, NDM2, and the E-box. Conservation of motifs might be expected to occur in highly conserved genes, thus we examined whether the evolutionarily conserved mitochondrial ribosomal genes that function in a large multi-protein complex had similar DNA motifs in
<italic>Drosophila </italic>
and human promoters. The ETS motif is found in the promoters of human [
<xref ref-type="bibr" rid="B11">11</xref>
] and other mammalian mitochondrial ribosomal genes [
<xref ref-type="bibr" rid="B30">30</xref>
]. In
<italic>Drosophila</italic>
, the ETS motif does not occur in these promoters, even though the ETS protein is encoded in the
<italic>Drosophila </italic>
genome. In contrast, the E-box sequence clusters in
<italic>Drosophila </italic>
mitochondrial ribosomal genes. This highlights the observation that even for genes that are conserved over a long evolutionary time, the DNA motifs that regulate them are not always conserved. Similarly, there is a fast turnover of DNA sequences controlling the expression of ribosomal protein genes in different species of yeast [
<xref ref-type="bibr" rid="B31">31</xref>
] and the recent genome wide comparison of human and chimpanzee showed that regulatory sequences were the most rapidly evolving part of the genome [
<xref ref-type="bibr" rid="B32">32</xref>
].</p>
<p>The failure to find similar positioned motifs in human and
<italic>Drosophila </italic>
would be trivial if the DNA binding proteins were absent in one of the species. This does not appear to be the case. In many cases where DNA motifs peak in human promoters but not in
<italic>Drosophila </italic>
promoters, the proteins that bind them are present in
<italic>Drosophila</italic>
. For example, the CRE motif peaks in human but not in
<italic>Drosophila </italic>
promoters. However, CREB and other B-ZIP proteins that bind the CRE sequence (5'-TGACGTCA-3') are conserved between the two species [
<xref ref-type="bibr" rid="B33">33</xref>
] and genetic mutations of these loci produce dramatic phenotypes, demonstrating their functional importance. This suggests that the signaling and transcriptional pathways are operating but are not regulating enough genes to produce a peak in the distribution, that the transcription factors can function at a variable distance from the TSS, or that the motifs are so highly degenerate that they do not produce an identifiable signature. As more genomes are sequenced and DNA motifs identified that peak in promoters, it will become more obvious how transcription factors are used in evolution to express coordinately regulated genes. Our data support the emerging notion that evolution of gene regulation underpins many of the differences between species. These changes in gene expression are mediated in part by sequences located very close to the TSS.</p>
</sec>
</sec>
<sec>
<title>Conclusion</title>
<p>We used the technique of determining the non-random distribution of DNA sequences to identify 298 8-mers with highly significant (
<italic>p </italic>
≤ 1 × 10
<sup>-16</sup>
) distribution patterns in a set of 10,914
<italic>D. melanogaster </italic>
promoters. These sequences were grouped into 15 unique motifs that were further classified into three families: precisely positioned DMs (DMp1 to 5); variably positioned DMs (DMv1 to 5); and NDM1 to 5. Correlations between GO annotation and mRNA expression patterns suggest that these different motifs play different functional roles. Additionally, we suggest that the DMs may be binding sites for core promoter selectivity factors in
<italic>Drosophila</italic>
. A comparison of the promoter regions of
<italic>Drosophila </italic>
and human revealed two characteristics that suggest that they use different mechanisms to regulate gene expression. First, the frequency and distribution of mononucleotides in
<italic>Drosophila </italic>
and human promoters are markedly different. Second, we have identified a large number of DMs near the TSS of
<italic>Drosophila </italic>
while the only identifiable DMs in human promoters are TATA, INR, and DPE. Thus, these data support the emerging notion that evolution of gene regulation underpins many of the differences between species.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title>Dataset generation</title>
<p>Genomic DNA sequence and gene annotation data for
<italic>Drosophila </italic>
(Jan 2003, dm1) and human (May 2004, hg17) were downloaded from the UCSC Genome Browser site [
<xref ref-type="bibr" rid="B13">13</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
]. For each organism a dataset was generated that contained only those RefSeq genes that had a unique transcription start site and at least 10 bp separating the TSS and the translation start site (ATG). When multiple RefSeq entries were identified as being identical by
<italic>blastclust </italic>
[
<xref ref-type="bibr" rid="B35">35</xref>
], a single entry was used to represent that region. Although repetitive sequences are frequently ignored (masked) in promoter analyses, we have not excluded them in this study. For each entry the 1,500 bp corresponding to the region -1,000 to +499, relative to the TSS, was extracted from the genomic sequence data and subjected to the analyses described in this manuscript. The total number of promoters represented in each dataset was 10,914 for
<italic>Drosophila </italic>
and 15,011 for human (UCSC). In addition, a second human dataset was prepared using the DBTSS annotations [
<xref ref-type="bibr" rid="B14">14</xref>
], and hg17 sequence data. A 1,500 bp promoter dataset was generated for the 5'-most TSS of each DBTSS annotated gene cluster. Entries that had an annotated ATG, translation start site, within 30 bp of the TSS were rejected. The resulting human (DBTSS) dataset contains 12,926 promoters.</p>
</sec>
<sec>
<title>Analysis</title>
<p>The datasets were queried with the programs
<italic>fuzznuc </italic>
from the EMBOSS suite of software [
<xref ref-type="bibr" rid="B36">36</xref>
] or
<italic>tacg </italic>
[
<xref ref-type="bibr" rid="B37">37</xref>
] to locate the occurrence and position of different DNA sequence motifs.</p>
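<p>As an illustration of this kind of search, the following is a minimal Python sketch of matching a degenerate IUPAC consensus (for example, TCAGTY or CANNTG) against a promoter sequence. It is not the implementation used by fuzznuc or tacg; the function names (motif_to_regex, find_motif, reverse_complement) are hypothetical and chosen only for this example.</p>

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC consensus such as TCAGTY into a compiled regex."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead lets finditer report overlapping occurrences.
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif):
    """Return the 0-based start offsets of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [match.start() for match in pattern.finditer(sequence.upper())]


def reverse_complement(sequence):
    """Reverse-complement a plain A/C/G/T sequence (negative-strand search)."""
    table = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(table)[::-1]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any dataset in this study.
    print(find_motif("GGTCAGTCAA", "TCAGTY"))
```

<p>Searching the reverse complement separately is what allows a motif to be classified as directional (present on one strand only) or non-directional.</p>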
</sec>
<sec>
<title>8-mer/6-mer analysis</title>
<p>The raw data generated by
<italic>tacg </italic>
was processed by a combination of scripts and programs to generate the final binned distribution for each 8-mer/6-mer. To analyze the data, we divided the 1,500 bp into 75 bins, with each bin containing 20 bp. For the dataset -1,000 bp to +499 bp the numbering for bin 1 is -1,000 bp to -981; thus, bin 51 is from +1 bp to +20 bp. We determined the number of times a particular DNA sequence occurred in each 20 bp bin. The
<italic>Drosophila </italic>
distribution pattern for each 8-mer along with the identity of the promoters containing each 8-mer is available [
<xref ref-type="bibr" rid="B38">38</xref>
].</p>
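<p>The binning described above can be sketched as follows. This is a hypothetical illustration, not the scripts used in the study; it assumes that positions are numbered without a position 0 (so that -1 is followed by +1), which is the convention under which bin 1 starts at -1,000 and bin 51 starts at +1. The names bin_index and binned_counts are introduced only for this example.</p>

```python
BIN_WIDTH = 20    # bp per bin, as stated in the text
N_BINS = 75       # 1,500 bp divided into 20 bp bins
UPSTREAM = 1000   # the first position of the region is -1,000


def bin_index(position):
    """Map a TSS-relative position to a 1-based bin number.

    Assumption: there is no position 0, so -1 is followed directly by +1.
    """
    if position == 0:
        raise ValueError("position 0 is not used in this numbering")
    if 0 > position:
        offset = position + UPSTREAM        # -1000 maps to 0, -1 maps to 999
    else:
        offset = position + UPSTREAM - 1    # +1 maps to 1000
    if offset not in range(BIN_WIDTH * N_BINS):
        raise ValueError("position lies outside the 1,500 bp region")
    return offset // BIN_WIDTH + 1


def binned_counts(positions):
    """Count motif occurrences per bin from a list of TSS-relative positions."""
    counts = [0] * N_BINS
    for position in positions:
        counts[bin_index(position) - 1] += 1
    return counts
```

<p>Under these assumptions, positions -1,000 to -981 fall in bin 1 and positions +1 to +20 fall in bin 51, matching the numbering given in the text.</p>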
</sec>
<sec>
<title>Clustering factor calculation</title>
<p>To determine if a DNA sequence forms a peak in its distribution (that is, clustered), we used an automated method of detecting and quantifying peak height. For the 75 bins in each frequency distribution a mean (
<inline-graphic xlink:href="gb-2006-7-7-r53-i1.gif"></inline-graphic>
) and standard deviation (
<italic>σ</italic>
) were determined. Those points that were ≥2 standard deviations above the mean were considered to be part of the peak and a new mean (
<inline-graphic xlink:href="gb-2006-7-7-r53-i2.gif"></inline-graphic>
) and standard deviation (
<italic>σ</italic>
') were calculated excluding these points. The CF was then calculated based on the maximum bin value (
<italic>x</italic>
<sub>
<italic>max</italic>
</sub>
) and the corrected mean and standard deviation:</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i3.gif"></inline-graphic>
</p>
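<p>A minimal Python sketch of this two-pass procedure is given below. The exact formula is contained in the equation graphic above; the sketch assumes that the CF is the maximum bin value minus the corrected mean, divided by the corrected standard deviation, and it assumes the sample (n - 1) standard deviation. Both choices, and the function name clustering_factor, are assumptions made for illustration.</p>

```python
from statistics import mean, stdev


def clustering_factor(counts, threshold=2.0):
    """Two-pass peak score for a binned frequency distribution.

    Assumed form: CF = (x_max - corrected mean) / corrected std deviation.
    """
    first_mean = mean(counts)
    first_sd = stdev(counts)
    if first_sd == 0:
        return 0.0  # perfectly flat distribution, no peak
    cutoff = first_mean + threshold * first_sd
    # Points at or above the cutoff belong to the peak and are excluded.
    background = [c for c in counts if cutoff > c]
    if 2 > len(background):
        raise ValueError("too few background bins to estimate a deviation")
    corrected_mean = mean(background)
    corrected_sd = stdev(background)
    if corrected_sd == 0:
        # Flat background: any excess above it is an unbounded score.
        return float("inf") if max(counts) > corrected_mean else 0.0
    return (max(counts) - corrected_mean) / corrected_sd
```

<p>Excluding the peak bins before recomputing the mean and standard deviation prevents a strong peak from inflating its own background estimate.</p>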
</sec>
<sec>
<title>Calculation of
<italic>P </italic>
value for distribution</title>
<p>To evaluate the probability that the clustering results were obtained by chance, we converted the CF values into probability terms based on the analysis of the occurrence of each 8-mer in 1,000 random datasets as described previously [
<xref ref-type="bibr" rid="B11">11</xref>
]. We generated 1,000 random datasets, each containing 10,914 sequences 1,500 bp long, using the 8-mer frequencies observed in the original Drosophila dataset. Finally, we calculated the probability term,
<italic>P</italic>
, that represents -log
<sub>10</sub>
(1 -
<italic>p</italic>
), where
<italic>p </italic>
is the area that lies under the normalized curve of the distribution of CF
<sub>expt</sub>
. Thus, the greater the P value the more unlikely it is that the result could occur by chance.</p>
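<p>The conversion from a CF value to the probability term can be sketched as follows. This is an illustration under a stated assumption: the "normalized curve" is read here as a normal distribution fitted to the CF values obtained for the same 8-mer in the random datasets. The function name log_p_term and its arguments are hypothetical.</p>

```python
import math
from statistics import mean, stdev


def log_p_term(cf_observed, cf_random):
    """Return P = -log10(1 - p) for an observed clustering factor.

    Assumption: CF values from the random datasets are modelled by a normal
    distribution, and p is its cumulative area up to the observed CF.
    """
    mu = mean(cf_random)
    sigma = stdev(cf_random)
    if sigma == 0:
        raise ValueError("random CF values show no variation")
    z = (cf_observed - mu) / sigma
    # The upper tail equals 1 - p; erfc avoids the cancellation that
    # computing 1 - p directly would suffer when p is very close to 1.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if upper_tail == 0.0:
        return math.inf
    return -math.log10(upper_tail)
```

<p>With this definition, an observed CF equal to the mean of the random CF values gives P = log10(2), approximately 0.3, and P grows as the observed CF moves further above the random expectation.</p>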
<p>The clustering and graphing of the data were performed using the programs Microsoft Excel and/or Grace [
<xref ref-type="bibr" rid="B39">39</xref>
].</p>
</sec>
<sec>
<title>Sequence logos</title>
<p>Graphical representations of the 15
<italic>Drosophila </italic>
motifs, in the form of sequence logos [
<xref ref-type="bibr" rid="B40">40</xref>
], were generated using the WebLogo software [
<xref ref-type="bibr" rid="B41">41</xref>
].</p>
</sec>
<sec>
<title>Calculation of
<italic>P </italic>
value for subsets in a set</title>
<p>To determine the significance of the numbers presented in Table
<xref ref-type="table" rid="T1">1</xref>
, we calculated two-tailed normalized cumulative probability (
<italic>P </italic>
value) that the numbers were greater (or less) than expected by random chance. The number of possible associations of
<italic>s</italic>
<sub>
<italic>2 </italic>
</sub>
elements out of
<italic>S </italic>
elements is:</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i4.gif"></inline-graphic>
</p>
<p>the number of combinations when subsets
<italic>s</italic>
<sub>
<italic>1 </italic>
</sub>
and
<italic>s</italic>
<sub>
<italic>2</italic>
</sub>
have
<italic>m </italic>
members in common:</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i5.gif"></inline-graphic>
</p>
<p>and the probability of having
<italic>m </italic>
members in the intersection</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i6.gif"></inline-graphic>
</p>
<p>where
<inline-graphic xlink:href="gb-2006-7-7-r53-i7.gif"></inline-graphic>
denotes the number of combinations (the binomial coefficient).</p>
<p>The cumulative probability that the value
<italic>m</italic>
* is greater or less than expected is, respectively,</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i8.gif"></inline-graphic>
</p>
<p>or</p>
<p>
<inline-graphic xlink:href="gb-2006-7-7-r53-i9.gif"></inline-graphic>
</p>
<p>where
<italic>m</italic>
<sub>
<italic>max </italic>
</sub>
= min (
<italic>s</italic>
<sub>1</sub>
,
<italic>s</italic>
<sub>2</sub>
). We doubled the result so that the cumulative probability ranges from
<italic>0 </italic>
to
<italic>1</italic>
, and took the negative logarithm:</p>
<p>
<italic>P</italic>
(
<italic>m</italic>
*) = -
<italic>Log</italic>
<sub>10</sub>
(
<italic>I</italic>
)</p>
<p>The value of
<italic>P </italic>
is derived from the probability that the observed numbers occurred by chance: the greater the value, the more statistically significant the result.</p>
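<p>The calculation in this section can be sketched as follows. Because the equations are available here only as images, the sketch assumes the standard hypergeometric form: C(S, s2) possible choices of the second subset, C(s1, m) × C(S - s1, s2 - m) choices that share exactly m members with the first subset, and their ratio as the probability of m. Taking the smaller of the two tails before doubling, and capping the result at 1, are also assumptions; the function names are hypothetical.</p>

```python
from math import comb, inf, log10


def intersection_probability(m, s1, s2, S):
    """Probability that subsets of sizes s1 and s2, drawn from S
    elements, share exactly m members (hypergeometric form)."""
    return comb(s1, m) * comb(S - s1, s2 - m) / comb(S, s2)


def subset_p_term(m_obs, s1, s2, S):
    """Return -log10 of the doubled one-tailed cumulative probability."""
    m_max = min(s1, s2)
    # Tail for an overlap at least as large as the observed one.
    upper = sum(intersection_probability(m, s1, s2, S)
                for m in range(m_obs, m_max + 1))
    # Tail for an overlap at most as large as the observed one.
    lower = sum(intersection_probability(m, s1, s2, S)
                for m in range(0, m_obs + 1))
    # Assumption: double the smaller tail and cap the result at 1.
    two_tailed = min(1.0, 2.0 * min(upper, lower))
    return inf if two_tailed == 0.0 else -log10(two_tailed)


if __name__ == "__main__":
    # Two subsets of 5 elements out of 10 that coincide completely.
    print(subset_p_term(5, 5, 5, 10))
```

<p>In the example, only 1 of the 252 possible choices of the second subset coincides with the first, so the doubled probability is 2/252 and the resulting term is log10(126), about 2.10.</p>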
</sec>
<sec>
<title>GO term analysis</title>
<p>Patterns in gene product functions for each promoter group were investigated using their assignments to GO terms. Each of the 4,192 GO terms in the
<italic>Drosophila </italic>
GO annotation, and their 1,008 parent GO terms (5,200 total), was analyzed. Gene group NM identifiers were matched to FlyBase identifiers by retrieving CG numbers from a batch GenBank search and then matching these to FBgn identifiers through FlyBase. GO assignments were retrieved from a flat file downloaded from the 'current annotations' page at the Gene Ontology website [
<xref ref-type="bibr" rid="B42">42</xref>
]. The file update is dated 18 June 2005. Because our promoter analysis was based on an earlier annotation, the complete set of GO annotations was reduced to create a normalized reference: GO annotations for FlyBase identifiers not included in the original set of promoters were removed. Dependencies for each GO term were retrieved using the R package GOstats. For each gene list, the number of genes assigned to a GO term of interest or its children was counted and taken as the observed value. The expected value was calculated as: (number of genes assigned to the GO term and its children/the number of genes in the entire normalized reference) × the number of genes in the group with the GO annotation. The expected value was used to calculate the observed/expected (O/E) ratio. The O/E ratios were used to convert
<italic>P </italic>
values to a positive or negative value to indicate correlation direction.
<italic>P </italic>
values were generated with a 2 × 2 matrix for each promoter group/GO term pair [
<xref ref-type="bibr" rid="B43">43</xref>
] with the fisher.test function in R [
<xref ref-type="bibr" rid="B44">44</xref>
].</p>
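<p>The observed/expected calculation and the 2 × 2 test can be sketched as below. The analysis itself used the fisher.test function in R; this standard-library sketch is only an illustration. It uses the usual two-sided rule of summing all tables that are no more probable than the observed one, and the sign convention (positive when O/E is at least 1, negative otherwise) and all function names are assumptions.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table no more probable than the observed one
    # (small relative tolerance for floating-point ties).
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p_obs * (1 + 1e-7) >= p)
    return min(1.0, total)


def go_enrichment(group_in_term, group_size, term_in_reference, reference_size):
    """Return (O/E ratio, signed P value) for one group/GO term pair."""
    expected = term_in_reference / reference_size * group_size
    if expected == 0:
        raise ValueError("GO term has no genes in the reference")
    ratio = group_in_term / expected

    a = group_in_term                      # in group, in term
    b = group_size - group_in_term         # in group, not in term
    c = term_in_reference - group_in_term  # not in group, in term
    d = reference_size - group_size - c    # not in group, not in term
    p = fisher_exact_two_sided(a, b, c, d)

    # Assumed sign convention: the sign records the direction only.
    return ratio, (p if ratio >= 1 else -p)


if __name__ == "__main__":
    print(go_enrichment(3, 4, 4, 8))
```

<p>In the example, 3 of 4 group genes carry a term held by 4 of 8 reference genes, so the expected count is 2 and the O/E ratio is 1.5.</p>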
</sec>
<sec>
<title>mRNA expression correlation with motifs that peak</title>
<p>Promoter lists were correlated with mRNA expression patterns that vary by sex, developmental stage, and tissue by examining microarray results from previous publications [
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
]. Testis-, ovary-, and soma-biased expression was categorized by performing hierarchical clustering and generating lists of genes that occur in the same self-organizing map (SOM) cluster [
<xref ref-type="bibr" rid="B20">20</xref>
]. Observed and expected representation of each promoter class and
<italic>P </italic>
values were calculated with a 2 × 2 Fisher's exact test, as in the GO term analysis described above. Adult and embryo expression patterns [
<xref ref-type="bibr" rid="B21">21</xref>
] were examined by calculating the median rank of all expression values in each sample, and performing hierarchical clustering. In each case, a standardized reference was created that corrected for differences in annotation and microarray platform.</p>
</sec>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>This study utilized the high-performance computational capabilities of the Beowulf PC/Linux cluster at the National Institutes of Health, Bethesda, MD. This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research, and the National Institute of Diabetes and Digestive and Kidney Diseases.</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smale</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Kadonaga</surname>
<given-names>JT</given-names>
</name>
</person-group>
<article-title>The RNA polymerase II core promoter.</article-title>
<source>Annu Rev Biochem</source>
<year>2003</year>
<volume>72</volume>
<fpage>449</fpage>
<lpage>479</lpage>
<pub-id pub-id-type="pmid">12651739</pub-id>
<pub-id pub-id-type="doi">10.1146/annurev.biochem.72.121801.161520</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Margolis</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Driks</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Losick</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Differentiation and the establishment of cell type during sporulation in Bacillus subtilis.</article-title>
<source>Curr Opin Genet Dev</source>
<year>1991</year>
<volume>1</volume>
<fpage>330</fpage>
<lpage>335</lpage>
<pub-id pub-id-type="pmid">1840889</pub-id>
<pub-id pub-id-type="doi">10.1016/S0959-437X(05)80296-5</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hiller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Pringle</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Suchorolski</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sancak</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Viswanathan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bolival</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>TY</given-names>
</name>
<name>
<surname>Marino</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fuller</surname>
<given-names>MT</given-names>
</name>
</person-group>
<article-title>Testis-specific TAF homologs collaborate to control a tissue-specific transcription program.</article-title>
<source>Development</source>
<year>2004</year>
<volume>131</volume>
<fpage>5297</fpage>
<lpage>5308</lpage>
<pub-id pub-id-type="pmid">15456720</pub-id>
<pub-id pub-id-type="doi">10.1242/dev.01314</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kai</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Spradling</surname>
<given-names>AC</given-names>
</name>
</person-group>
<article-title>The expression profile of purified Drosophila germline stem cells.</article-title>
<source>Dev Biol</source>
<year>2005</year>
<volume>283</volume>
<fpage>486</fpage>
<lpage>502</lpage>
<pub-id pub-id-type="pmid">15927177</pub-id>
<pub-id pub-id-type="doi">10.1016/j.ydbio.2005.04.018</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hochheimer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tjian</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression.</article-title>
<source>Genes Dev</source>
<year>2003</year>
<volume>17</volume>
<fpage>1309</fpage>
<lpage>1320</lpage>
<pub-id pub-id-type="pmid">12782648</pub-id>
<pub-id pub-id-type="doi">10.1101/gad.1099903</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bielinska</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sturgill</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Oliver</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Core promoter sequences contribute to ovo-B regulation in the Drosophila melanogaster germline.</article-title>
<source>Genetics</source>
<year>2005</year>
<volume>169</volume>
<fpage>161</fpage>
<lpage>172</lpage>
<pub-id pub-id-type="pmid">15371353</pub-id>
<pub-id pub-id-type="doi">10.1534/genetics.104.033118</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Oliver</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Drosophila OVO regulates ovarian tumor transcription by binding unusually near the transcription start site.</article-title>
<source>Development</source>
<year>2001</year>
<volume>128</volume>
<fpage>1671</fpage>
<lpage>1686</lpage>
<pub-id pub-id-type="pmid">11290304</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruez</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Payre</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Vincent</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Transcriptional control of Drosophila bicoid by Serendipity delta: cooperative binding sites, promoter context, and co-evolution.</article-title>
<source>Mech Dev</source>
<year>1998</year>
<volume>78</volume>
<fpage>125</fpage>
<lpage>134</lpage>
<pub-id pub-id-type="pmid">9858707</pub-id>
<pub-id pub-id-type="doi">10.1016/S0925-4773(98)00159-2</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kaufmann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hyland</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Renkawitz-Pohl</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>The initiator element of the Drosophila beta2 tubulin gene core promoter contributes to gene expression in vivo but is not required for male germ-cell specific expression.</article-title>
<source>Nucleic Acids Res</source>
<year>2000</year>
<volume>28</volume>
<fpage>1439</fpage>
<lpage>1446</lpage>
<pub-id pub-id-type="pmid">10684940</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/28.6.1439</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ohler</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Niemann</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Computational analysis of core promoters in the Drosophila genome.</article-title>
<source>Genome Biol</source>
<year>2002</year>
<volume>3</volume>
<fpage>RESEARCH0087</fpage>
<pub-id pub-id-type="pmid">12537576</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2002-3-12-research0087</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>FitzGerald</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Shlyakhtenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mir</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Vinson</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Clustering of DNA sequences in human promoters.</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>1562</fpage>
<lpage>1574</lpage>
<pub-id pub-id-type="pmid">15256515</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.1953904</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kulbokas</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Golub</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Mootha</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Kellis</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals.</article-title>
<source>Nature</source>
<year>2005</year>
<volume>434</volume>
<fpage>338</fpage>
<lpage>345</lpage>
<pub-id pub-id-type="pmid">15735639</pub-id>
<pub-id pub-id-type="doi">10.1038/nature03441</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="other">
<article-title>UCSC Genome Browser Downloads</article-title>
<ext-link ext-link-type="uri" xlink:href="http://hgdownload.cse.ucsc.edu/downloads.html"></ext-link>
</citation>
</ref>
<ref id="B14">
<citation citation-type="other">
<article-title>DBTSS Downloads</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.hgc.jp/pub/hgc/db/dbtss/Yamashita_NAR/"></ext-link>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corden</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wasylyk</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Buchwalder</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sassone-Corsi</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Kedinger</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chambon</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Promoter sequences of eukaryotic protein-coding genes.</article-title>
<source>Science</source>
<year>1980</year>
<volume>209</volume>
<fpage>1406</fpage>
<lpage>1414</lpage>
<pub-id pub-id-type="pmid">6251548</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grosschedl</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Birnstiel</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>Identification of regulatory sequences in the prelude sequences of an H2A histone gene by the study of specific deletion mutants in vivo.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1980</year>
<volume>77</volume>
<fpage>1432</fpage>
<lpage>1436</lpage>
<pub-id pub-id-type="pmid">6929494</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.77.3.1432</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Adhya</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Axiom of determining transcription start points by RNA polymerase in Escherichia coli.</article-title>
<source>Mol Microbiol</source>
<year>2004</year>
<volume>54</volume>
<fpage>692</fpage>
<lpage>701</lpage>
<pub-id pub-id-type="pmid">15491360</pub-id>
<pub-id pub-id-type="doi">10.1111/j.1365-2958.2004.04318.x</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hulf</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bellosta</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Furrer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Steiger</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Svensson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Barbour</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gallant</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Whole-genome analysis reveals a strong positional bias of conserved dMyc-dependent E-boxes.</article-title>
<source>Mol Cell Biol</source>
<year>2005</year>
<volume>25</volume>
<fpage>3401</fpage>
<lpage>3410</lpage>
<pub-id pub-id-type="pmid">15831447</pub-id>
<pub-id pub-id-type="doi">10.1128/MCB.25.9.3401-3410.2005</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>On ontologies for biologists: the Gene Ontology--untangling the web.</article-title>
<source>Novartis Found Symp</source>
<year>2002</year>
<volume>247</volume>
<fpage>66</fpage>
<lpage>80</lpage>
<comment>discussion 80-83, 84-90, 244-252</comment>
<pub-id pub-id-type="pmid">12539950</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parisi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nuttall</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Minor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Naiman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Doctolero</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vainer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Malley</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A survey of ovary-, testis-, and soma-biased gene expression in Drosophila melanogaster adults.</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<fpage>R40</fpage>
<pub-id pub-id-type="pmid">15186491</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2004-5-6-r40</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spellman</surname>
<given-names>PT</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Evidence for large domains of similarly expressed genes in the Drosophila genome.</article-title>
<source>J Biol</source>
<year>2002</year>
<volume>1</volume>
<fpage>5</fpage>
<pub-id pub-id-type="pmid">12144710</pub-id>
<pub-id pub-id-type="doi">10.1186/1475-4924-1-5</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yamashita</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wakaguri</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Tsuritani</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nakai</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sugano</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>DBTSS: DataBase of Human Transcription Start Sites, progress report 2006.</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<issue>Database issue</issue>
<fpage>D86</fpage>
<lpage>D89</lpage>
<pub-id pub-id-type="pmid">16381981</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkj129</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kutach</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Kadonaga</surname>
<given-names>JT</given-names>
</name>
</person-group>
<article-title>The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters.</article-title>
<source>Mol Cell Biol</source>
<year>2000</year>
<volume>20</volume>
<fpage>4754</fpage>
<lpage>4764</lpage>
<pub-id pub-id-type="pmid">10848601</pub-id>
<pub-id pub-id-type="doi">10.1128/MCB.20.13.4754-4764.2000</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lim</surname>
<given-names>CY</given-names>
</name>
<name>
<surname>Santoso</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Boulay</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ohler</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Kadonaga</surname>
<given-names>JT</given-names>
</name>
</person-group>
<article-title>The MTE, a new core promoter element for transcription by RNA polymerase II.</article-title>
<source>Genes Dev</source>
<year>2004</year>
<volume>18</volume>
<fpage>1606</fpage>
<lpage>1617</lpage>
<pub-id pub-id-type="pmid">15231738</pub-id>
<pub-id pub-id-type="doi">10.1101/gad.1193404</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mito</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Genome-scale profiling of histone H3.3 replacement patterns.</article-title>
<source>Nat Genet</source>
<year>2005</year>
<volume>37</volume>
<fpage>1090</fpage>
<lpage>1097</lpage>
<pub-id pub-id-type="pmid">16155569</pub-id>
<pub-id pub-id-type="doi">10.1038/ng1637</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>YJ</given-names>
</name>
<name>
<surname>Dion</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Slack</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>Altschuler</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Rando</surname>
<given-names>OJ</given-names>
</name>
</person-group>
<article-title>Genome-scale identification of nucleosome positions in S. cerevisiae.</article-title>
<source>Science</source>
<year>2005</year>
<volume>309</volume>
<fpage>626</fpage>
<lpage>630</lpage>
<pub-id pub-id-type="pmid">15961632</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1112178</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Kamal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Bekiranov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>DK</given-names>
</name>
<name>
<surname>Huebert</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>McMahon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Karlsson</surname>
<given-names>EK</given-names>
</name>
<name>
<surname>Kulbokas</surname>
<given-names>EJ</given-names>
<suffix>3rd</suffix>
</name>
<name>
<surname>Gingeras</surname>
<given-names>TR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genomic maps and comparative analysis of histone modifications in human and mouse.</article-title>
<source>Cell</source>
<year>2005</year>
<volume>120</volume>
<fpage>169</fpage>
<lpage>181</lpage>
<pub-id pub-id-type="pmid">15680324</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2005.01.001</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bird</surname>
<given-names>AP</given-names>
</name>
</person-group>
<article-title>Functions for DNA methylation in vertebrates.</article-title>
<source>Cold Spring Harb Symp Quant Biol</source>
<year>1993</year>
<volume>58</volume>
<fpage>281</fpage>
<lpage>285</lpage>
<pub-id pub-id-type="pmid">7956040</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Caiafa</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zampieri</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>DNA methylation and chromatin structure: the puzzling CpG islands.</article-title>
<source>J Cell Biochem</source>
<year>2005</year>
<volume>94</volume>
<fpage>257</fpage>
<lpage>265</lpage>
<pub-id pub-id-type="pmid">15546139</pub-id>
<pub-id pub-id-type="doi">10.1002/jcb.20325</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perry</surname>
<given-names>RP</given-names>
</name>
</person-group>
<article-title>The architecture of mammalian ribosomal protein promoters.</article-title>
<source>BMC Evol Biol</source>
<year>2005</year>
<volume>5</volume>
<fpage>15</fpage>
<pub-id pub-id-type="pmid">15707503</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2148-5-15</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tanay</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Regev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Shamir</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Conservation and evolvability in regulatory networks: the evolution of ribosomal regulation in yeast.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>7203</fpage>
<lpage>7208</lpage>
<pub-id pub-id-type="pmid">15883364</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0502521102</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khaitovich</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hellmann</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Enard</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Nowick</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Leinweber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Franz</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Weiss</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lachmann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Paabo</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees.</article-title>
<source>Science</source>
<year>2005</year>
<volume>309</volume>
<fpage>1850</fpage>
<lpage>1854</lpage>
<pub-id pub-id-type="pmid">16141373</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1108296</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fassler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Landsman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Acharya</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Moll</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Bonovich</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vinson</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>B-ZIP proteins encoded by the Drosophila genome: evaluation of potential dimerization partners.</article-title>
<source>Genome Res</source>
<year>2002</year>
<volume>12</volume>
<fpage>1190</fpage>
<lpage>1200</lpage>
<pub-id pub-id-type="pmid">12176927</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.67902</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karolchik</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Diekhans</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>YT</given-names>
</name>
<name>
<surname>Roskin</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sugnet</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>DJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The UCSC Genome Browser Database.</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>51</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="pmid">12519945</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkg129</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Schaffer</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="pmid">9254694</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rice</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Longden</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Bleasby</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>EMBOSS: the European Molecular Biology Open Software Suite.</article-title>
<source>Trends Genet</source>
<year>2000</year>
<volume>16</volume>
<fpage>276</fpage>
<lpage>277</lpage>
<pub-id pub-id-type="pmid">10827456</pub-id>
<pub-id pub-id-type="doi">10.1016/S0168-9525(00)02024-2</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mangalam</surname>
<given-names>HJ</given-names>
</name>
</person-group>
<article-title>tacg - a grep for DNA.</article-title>
<source>BMC Bioinformatics</source>
<year>2002</year>
<volume>3</volume>
<fpage>8</fpage>
<pub-id pub-id-type="pmid">11882250</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-3-8</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="other">
<article-title>Supplementary Data</article-title>
<ext-link ext-link-type="uri" xlink:href="http://genome.nci.nih.gov/publications/fly_promoters/"></ext-link>
</citation>
</ref>
<ref id="B39">
<citation citation-type="other">
<article-title>Grace Home Page</article-title>
<ext-link ext-link-type="uri" xlink:href="http://plasma-gate.weizmann.ac.il/Grace/"></ext-link>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneider</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>RM</given-names>
</name>
</person-group>
<article-title>Sequence logos: a new way to display consensus sequences.</article-title>
<source>Nucleic Acids Res</source>
<year>1990</year>
<volume>18</volume>
<fpage>6097</fpage>
<lpage>6100</lpage>
<pub-id pub-id-type="pmid">2172928</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crooks</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>WebLogo: a sequence logo generator.</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>1188</fpage>
<lpage>1190</lpage>
<pub-id pub-id-type="pmid">15173120</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.849004</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="other">
<article-title>The Gene Ontology</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org/GO.current.annotations.shtml"></ext-link>
</citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhong</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Storch</surname>
<given-names>KF</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>WH</given-names>
</name>
</person-group>
<article-title>Comparative analysis of gene sets in the gene ontology space under the multiple hypothesis testing framework.</article-title>
<source>IEEE Computational Systems Bioinformatics Conference (CSB 2004): Stanford, California</source>
<year>2004</year>
<publisher-name>Piscataway: IEEE Publishing</publisher-name>
<fpage>425</fpage>
<lpage>435</lpage>
<comment>16-19 August 2004</comment>
<pub-id pub-id-type="pmid">16448035</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gentleman</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Carey</surname>
<given-names>VJ</given-names>
</name>
<name>
<surname>Bates</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Bolstad</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Dettling</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dudoit</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ellis</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gautier</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Gentry</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Bioconductor: open software development for computational biology and bioinformatics.</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<fpage>R80</fpage>
<pub-id pub-id-type="pmid">15461798</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2004-5-10-r80</pub-id>
</citation>
</ref>
</ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>The distribution of nucleotides across
<italic>Drosophila </italic>
and human promoters. The distribution of mononucleotides across the 1,500 bp region of
<bold>(a) </bold>
10,914
<italic>Drosophila</italic>
,
<bold>(b) </bold>
15,011 human (UCSC), and
<bold>(c) </bold>
12,926 human (DBTSS) promoters; the frequency of each mononucleotide is plotted against position (in 20 bp bins). The TSS occurs in bin 51 and its location is indicated.
<bold>(d) </bold>
The frequency of occurrence of the CA dinucleotide, at a single base-pair resolution across the 1,500 bp promoter region for all three datasets.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-1"></graphic>
</fig>
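<p>The binning used in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 1,500 bp window runs from -1,000 bp to +499 bp relative to the TSS (the range quoted in the Figure 11 caption) and that the TSS sits at position 0; the names bin_number and mononucleotide_frequencies are hypothetical.</p>
<preformat>
```python
REGION_START = -1000   # assumed first position of the window, relative to the TSS
REGION_LENGTH = 1500   # window size stated in the caption
BIN_WIDTH = 20         # bin size stated in the caption


def bin_number(position):
    """Return the 1-based 20 bp bin that holds a TSS-relative position."""
    offset = position - REGION_START
    if offset not in range(REGION_LENGTH):
        raise ValueError("position lies outside the promoter window")
    return offset // BIN_WIDTH + 1


def mononucleotide_frequencies(promoters):
    """Per-bin frequency of A, C, G and T over aligned promoter strings."""
    n_bins = REGION_LENGTH // BIN_WIDTH
    counts = [dict.fromkeys("ACGT", 0) for _ in range(n_bins)]
    for seq in promoters:
        for offset, base in enumerate(seq[:REGION_LENGTH]):
            if base in "ACGT":  # skip ambiguous bases such as N
                counts[offset // BIN_WIDTH][base] += 1
    freqs = []
    for bin_counts in counts:
        total = sum(bin_counts.values())
        freqs.append({b: (bin_counts[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


print(bin_number(0))  # 51
```
</preformat>
<p>Under these assumptions the TSS falls in bin 51 of 75, which agrees with the caption.</p>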
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>The localization of all 65,536 8-mers in
<italic>Drosophila </italic>
and human promoters. The clustering factors (CF or CF
<sup>+</sup>
) calculated for 20 bp bins are plotted at the position of the most populated bin for all 65,536 8-mers.
<bold>(a) </bold>
CF for 10,914
<italic>Drosophila </italic>
promoters;
<bold>(b) </bold>
CF for 15,011 human (UCSC) promoters;
<bold>(c) </bold>
CF for 12,926 human (DBTSS) promoters;
<bold>(d) </bold>
CF
<sup>+ </sup>
for 10,914
<italic>Drosophila </italic>
promoters;
<bold>(e) </bold>
CF
<sup>+ </sup>
for 15,011 human (UCSC) promoters;
<bold>(f) </bold>
CF
<sup>+ </sup>
for 12,926 human (DBTSS) promoters.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-2"></graphic>
</fig>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>Scatter plots showing the strand dependence of 8-mer localization, and the comparison of localization between different organisms (
<italic>Drosophila </italic>
and human). The clustering factors for all 8-mers, calculated for 20 bp bins, are plotted on the positive (CF
<sup>+</sup>
) versus the negative (CF
<sup>-</sup>
) strand for
<bold>(a) </bold>
<italic>Drosophila</italic>
,
<bold>(b) </bold>
human (UCSC), and
<bold>(c) </bold>
human (DBTSS) promoters. The 256 palindromic sequences have equivalent CF
<sup>+</sup>
/CF
<sup>- </sup>
values but are plotted with a CF
<sup>- </sup>
value of -1. Comparison of CF values of 8-mers for
<bold>(d) </bold>
human (UCSC) versus
<italic>Drosophila</italic>
,
<bold>(e) </bold>
human (DBTSS) versus
<italic>Drosophila</italic>
, and
<bold>(f) </bold>
human (UCSC) versus human (DBTSS). Common elements should lie along the diagonal.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-3"></graphic>
</fig>
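<p>The figure of 256 palindromic 8-mers quoted in the Figure 3 caption can be checked by enumeration. The sketch below treats an 8-mer as palindromic when it equals its own reverse complement, so its occurrences on the positive and negative strands coincide; the function names are illustrative only.</p>
<preformat>
```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_kmers(k):
    """List every k-mer that is identical to its reverse complement."""
    kmers = ("".join(bases) for bases in product("ACGT", repeat=k))
    return [kmer for kmer in kmers if kmer == reverse_complement(kmer)]


print(len(palindromic_kmers(8)))  # 256
```
</preformat>
<p>The count follows from the first four bases fixing the last four, giving 4 to the power 4 = 256 sequences.</p>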
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>8-mer localization in
<italic>Drosophila </italic>
expressed as a probability term, and characteristics of the most statistically relevant 8-mers.
<bold>(a) </bold>
The probability term P = -log
<sub>10</sub>
(1 -
<italic>p</italic>
) for the 13,552 8-mers with a maximum bin containing ≥15 members. The 298 DNA sequences above the line at
<italic>P </italic>
= 16, a 1 in 1 × 10
<sup>16 </sup>
(single sampling) chance of being random, were analyzed in more detail.
<bold>(b) </bold>
Clustering factors for both the positive (CF
<sup>+</sup>
) and negative strand (CF
<sup>-</sup>
) were plotted for the 298 most significant peaking 8-mers. The distribution falls into two distinct groupings: those that display a symmetric distribution on both strands (red circles) and those that cluster on only one strand (black circles).
<bold>(c) </bold>
A histogram showing the number of promoters containing each of the 15 motifs, grouped into three classes: DMp1 to 5, DMv1 to 5, and NDM1 to 5. The common name and the consensus sequence of each motif are also shown.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-4"></graphic>
</fig>
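<p>The probability term in Figure 4a, P = -log<sub>10</sub>(1 - <italic>p</italic>), can be illustrated with a short sketch. It assumes that 1 - <italic>p</italic> is the chance of the distribution being random, as the caption implies, and it takes that small quantity directly as input, because forming 1 - <italic>p</italic> from a value of <italic>p</italic> very close to 1 loses all precision in double-precision arithmetic. The function name is hypothetical.</p>
<preformat>
```python
import math


def probability_term(chance_of_random):
    """Return P = -log10(1 - p), given q = 1 - p as the argument."""
    return -math.log10(chance_of_random)


# A chance of 1 in 1 x 10^16 corresponds to the threshold line at P = 16.
print(probability_term(1e-16))

# Round-off illustration: p = 1 - 1e-17 is stored as exactly 1.0,
# so computing 1 - p afterwards would give 0.0 and the logarithm would fail.
print(1.0 - (1.0 - 1e-17))
```
</preformat>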
<fig position="float" id="F5">
<label>Figure 5</label>
<caption>
<p>The 15 DNA motifs derived from grouping 298 octamers whose probability of having a random distribution was less than 1 × 10
<sup>-16</sup>
. The table is grouped into two panels.
<bold>(a) </bold>
presents the 10 directional motifs, while
<bold>(b) </bold>
shows the five non-directional motifs. We present: the sequence logo; the consensus sequence using IUPAC letters to represent degenerate bases - R (G, A), W (A, T), Y (T, C), K (G, T), M (A, C), S (G, C), N (A, T, G, C); the name assigned in this work; the common name if it exists; designations from previous work [10]; the number of peaking 8-mers that were placed in the family; peak location in base-pairs relative to the TSS; clustering factor (CF
<sup>+</sup>
) on the positive strand; clustering factor (CF
<sup>-</sup>
) on the negative strand; the bins that were pooled to define the peak; and the unique genes in the peak.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-5"></graphic>
</fig>
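<p>The IUPAC letters listed in the Figure 5 caption map naturally onto regular-expression character classes. The following sketch shows one way to locate a degenerate consensus on the positive strand of a sequence; the helper names and the example sequence are made up for illustration and are not taken from the paper.</p>
<preformat>
```python
import re

# Degenerate-base codes exactly as listed in the caption.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "W": "[AT]", "Y": "[TC]", "K": "[GT]",
    "M": "[AC]", "S": "[GC]", "N": "[ATGC]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus such as TCAGTY into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[base] for base in consensus))


def find_motif(consensus, sequence):
    """Return 0-based start positions of every match, overlaps included."""
    pattern = consensus_to_regex(consensus)
    last_start = len(sequence) - len(consensus)
    return [i for i in range(last_start + 1) if pattern.match(sequence, i)]


print(find_motif("TCAGTY", "AATCAGTCGGTCAGTT"))  # [2, 10]
```
</preformat>
<p>Scanning every start position, rather than calling finditer, keeps overlapping matches, which matters for short repetitive motifs.</p>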
<fig position="float" id="F6">
<label>Figure 6</label>
<caption>
<p>The distribution of the 15 identified motifs in
<italic>Drosophila </italic>
promoters.
<bold>(a-o) </bold>
The number of occurrences of each motif, in each 20 bp bin, for the positive strand (solid red) and the negative strand (dashed black). The insets show the same data plotted at a single nucleotide resolution from -100 bp to +100 bp relative to the TSS. Insets for the directional motifs (DMp1 to 5 and DMv1 to 5) show the distribution on the positive strand only, while those for the non-directional motifs (NDM1 to 5) show the distribution for both strands. (a-e) The directional motifs that have a precise localization (DMp); (f-j) the directional motifs with a variable localization (DMv); (k-o) the non-directional motifs that all have a variable localization (NDM).</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-6"></graphic>
</fig>
<fig position="float" id="F7">
<label>Figure 7</label>
<caption>
<p>The localization, on the positive strand, of all 4,096 6-mers in
<italic>Drosophila </italic>
and human promoters. Clustering factor (CF
<sup>+</sup>
) for the positive strand, plotted at a single base-pair resolution, at the position of the most populated bp, for all 4,096 6-mers.
<bold>(a) </bold>
CF
<sup>+ </sup>
from 10,914
<italic>Drosophila </italic>
promoters;
<bold>(b) </bold>
CF
<sup>+ </sup>
from 15,011 human (UCSC) promoters;
<bold>(c) </bold>
CF
<sup>+ </sup>
from 12,926 human (DBTSS) promoters;
<bold>(d) </bold>
the exact placement of
<italic>Drosophila </italic>
TATA, INR variants, and DPE variants relative to each other. The sequence is broken into 10 bp segments.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-7"></graphic>
</fig>
<fig position="float" id="F8">
<label>Figure 8</label>
<caption>
<p>The distribution of 15 '
<italic>Drosophila </italic>
specific' motifs in
<italic>Drosophila </italic>
and human promoters.
<bold>(a-o) </bold>
The number of occurrences of each of the 15 identified
<italic>Drosophila </italic>
motifs in each 20 bp bin for
<italic>Drosophila </italic>
(dotted black), human (UCSC; solid red) and human (DBTSS; dashed blue) promoters. For the ten directional motifs, only the occurrences on the positive strand are represented. For the five non-directional elements, the occurrences on both the positive and negative strand are represented.
<bold>(x) </bold>
The distributions of the INR motif (TCAGTY), from -100 bp to +100 bp relative to the TSS, for both
<italic>Drosophila </italic>
and human promoters at a single base-pair resolution. The number of occurrences of each element has been normalized to a dataset of 10,000 promoters to compensate for the different sizes of the datasets.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-8"></graphic>
</fig>
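<p>The normalization mentioned in the Figure 8 caption amounts to rescaling each per-bin count by 10,000 divided by the number of promoters in the dataset. A minimal sketch, using the dataset sizes quoted in the captions and a hypothetical function name, might look like this:</p>
<preformat>
```python
# Dataset sizes as quoted in the figure captions.
PROMOTER_COUNTS = {
    "Drosophila": 10914,
    "human (UCSC)": 15011,
    "human (DBTSS)": 12926,
}
REFERENCE_SIZE = 10000


def normalize_counts(bin_counts, n_promoters, reference_size=REFERENCE_SIZE):
    """Rescale raw per-bin motif counts to a dataset of reference_size promoters."""
    return [count * reference_size / n_promoters for count in bin_counts]


# Made-up raw counts for three bins, for illustration only.
raw = [0, 5457, 10914]
print(normalize_counts(raw, PROMOTER_COUNTS["Drosophila"]))  # [0.0, 5000.0, 10000.0]
```
</preformat>
<p>After rescaling, curves from datasets of different sizes can be compared on a common axis.</p>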
<fig position="float" id="F9">
<label>Figure 9</label>
<caption>
<p>The distribution of 8 'human specific' motifs in
<italic>Drosophila </italic>
and human promoters.
<bold>(a-h) </bold>
The number of occurrences of each previously identified [11] human specific motif in each 20 bp bin for
<italic>Drosophila </italic>
(dotted black), human (UCSC; solid red) and human (DBTSS; dashed blue) promoters. The number of occurrences of each element has been normalized to a dataset of 10,000 promoters to compensate for the different sizes of the datasets.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-9"></graphic>
</fig>
<fig position="float" id="F10">
<label>Figure 10</label>
<caption>
<p>E-box variants that peak in
<italic>Drosophila </italic>
and human promoters.
<bold>(a-d) </bold>
The number of occurrences of
<bold>(a) </bold>
CACGTG,
<bold>(b) </bold>
CAGCTG,
<bold>(c) </bold>
RCACGTGY and
<bold>(d) </bold>
YCACGTGR in each 20 bp bin for
<italic>Drosophila </italic>
(dotted black), human (UCSC; solid red), and human (DBTSS; dashed blue) promoters.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-10"></graphic>
</fig>
<fig position="float" id="F11">
<label>Figure 11</label>
<caption>
<p>Correlations between DNA motifs in promoters and function (GO terms and mRNA expression properties). In both sections of the figure, promoter lists in blue are DMp, green are DMv, and red are NDM. Control groups with the DNA motifs not in the peak but between -1,000 bp and +499 bp are in black with an asterisk.
<bold>(a) </bold>
False-color image of representation bias in GO terms and mRNA expression clusters for the 15 DNA motifs, either in the peak or elsewhere in the promoter region. Values plotted are -log
<sub>10</sub>
(
<italic>p </italic>
value) calculated by Fisher's exact test. Data for the 54 most strongly correlated GO terms are shown (some redundant GO terms are removed). On the far left are results for over/under representation in self-organizing map (SOM) clusters identified from previously published expression data [20]. Over-represented categories are colored in red and under-represented categories are in blue. N values displayed at the top are total numbers of genes in the reference set assigned to that group.
<bold>(b) </bold>
False-color image of hierarchically clustered median percentile ranks of mRNA expression ratios, for previously published data for embryo and adult samples [21]. Each ratio represents expression relative to a global mean across arrays. Columns represent each of 89 array experiments, clustered so that embryo samples are at left and adult samples are at right. 'All Promoters' represents all genes and shows no preferences (median percentile rank = 50).</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-11"></graphic>
</fig>
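<p>The values plotted in Figure 11a are -log<sub>10</sub> of a <italic>p</italic> value from Fisher's exact test. The sketch below is a simplified illustration rather than the authors' procedure: it computes only the one-sided tail for over-representation from the hypergeometric distribution, and the argument names are hypothetical.</p>
<preformat>
```python
from math import comb, log10


def enrichment_p_value(hits_in_set, set_size, hits_in_reference, reference_size):
    """One-sided Fisher's exact p value for over-representation.

    hits_in_set       genes in the motif set that carry the annotation
    set_size          genes in the motif set
    hits_in_reference genes in the whole reference set that carry the annotation
    reference_size    genes in the whole reference set (motif set included)
    """
    total = comb(reference_size, set_size)
    upper = min(set_size, hits_in_reference)
    tail = sum(
        comb(hits_in_reference, k)
        * comb(reference_size - hits_in_reference, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


# Classic 2 x 2 check: 3 of 4 selected items are hits, 4 hits among 8 items overall.
p = enrichment_p_value(3, 4, 4, 8)
print(p, -log10(p))
```
</preformat>
<p>The under-representation tail can be obtained in the same way by summing from zero up to the observed count.</p>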
<fig position="float" id="F12">
<label>Figure 12</label>
<caption>
<p>Correlations between five INR variants localized exactly at the TSS in promoters and function (GO terms and mRNA expression properties).
<bold>(a) </bold>
False-color image of representation bias in GO terms and mRNA expression clusters for the five variants of the INR motif in the peak. Values are calculated and displayed as in Figure 11a. The 42 most strongly correlated GO terms are shown. Note that each INR variant correlates with different GO terms.
<bold>(b) </bold>
False-color image of hierarchically clustered median percentile ranks of mRNA expression ratios, for previously published data for embryo and adult samples [21]. Data are calculated and displayed as in Figure 11b.</p>
</caption>
<graphic xlink:href="gb-2006-7-7-r53-12"></graphic>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>The co-occurrence in the same promoter of DNA motifs that cluster</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="left">
<bold>Motif</bold>
</td>
<td></td>
<td></td>
<td align="center">
<bold>DMp1</bold>
</td>
<td align="center">
<bold>DMp2</bold>
</td>
<td align="center">
<bold>DMp3</bold>
</td>
<td align="center">
<bold>DMp4</bold>
</td>
<td align="center">
<bold>DMp5</bold>
</td>
<td align="center">
<bold>DMv1</bold>
</td>
<td align="center">
<bold>DMv2</bold>
</td>
<td align="center">
<bold>DMv3</bold>
</td>
<td align="center">
<bold>DMv4</bold>
</td>
<td align="center">
<bold>DMv5</bold>
</td>
<td align="center">
<bold>NDM1</bold>
</td>
<td align="center">
<bold>NDM2</bold>
</td>
<td align="center">
<bold>NDM3</bold>
</td>
<td align="center">
<bold>NDM4</bold>
</td>
<td align="center">
<bold>NDM5</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Ohler no.</bold>
</td>
<td></td>
<td></td>
<td align="right">
<bold>3</bold>
</td>
<td align="right">
<bold>4</bold>
</td>
<td></td>
<td align="right">
<bold>9</bold>
</td>
<td></td>
<td></td>
<td align="right">
<bold>8</bold>
</td>
<td align="right">
<bold>7</bold>
</td>
<td align="right">
<bold>1</bold>
</td>
<td align="right">
<bold>6</bold>
</td>
<td></td>
<td></td>
<td></td>
<td align="right">
<bold>2</bold>
</td>
<td align="right">
<bold>5</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Name</bold>
</td>
<td></td>
<td></td>
<td align="center">
<bold>TATA</bold>
</td>
<td align="center">
<bold>INR</bold>
</td>
<td align="center">
<bold>INR1</bold>
</td>
<td align="center">
<bold>DPE1</bold>
</td>
<td align="center">
<bold>DPE2</bold>
</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td align="center">
<bold>GAGA</bold>
</td>
<td></td>
<td></td>
<td align="center">
<bold>DRE</bold>
</td>
<td align="center">
<bold>E-box</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Totals</bold>
</td>
<td></td>
<td align="center">
<bold>8289</bold>
</td>
<td align="center">
<bold>511</bold>
</td>
<td align="center">
<bold>1501</bold>
</td>
<td align="center">
<bold>113</bold>
</td>
<td align="center">
<bold>80</bold>
</td>
<td align="center">
<bold>147</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>604</bold>
</td>
<td align="center">
<bold>649</bold>
</td>
<td align="center">
<bold>287</bold>
</td>
<td align="center">
<bold>359</bold>
</td>
<td align="center">
<bold>424</bold>
</td>
<td align="center">
<bold>215</bold>
</td>
<td align="center">
<bold>1593</bold>
</td>
<td align="center">
<bold>1184</bold>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<bold>(a)</bold>
</td>
<td align="left">STATAAA</td>
<td align="left">DMp1</td>
<td align="center">511</td>
<td></td>
<td align="center">98</td>
<td align="center">9</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>8</underline>
</td>
<td align="center">
<underline>10</underline>
</td>
<td align="center">
<bold>
<underline>6</underline>
</bold>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">19</td>
<td align="center">28</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<bold>
<underline>21</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>26</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCAGTY</td>
<td align="left">DMp2</td>
<td align="center">1501</td>
<td align="center">98</td>
<td></td>
<td align="center">
<underline>12</underline>
</td>
<td align="center">25</td>
<td align="center">
<bold>43</bold>
</td>
<td align="center">
<bold>
<underline>15</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>18</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>34</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>17</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>12</underline>
</bold>
</td>
<td align="center">
<bold>100</bold>
</td>
<td align="center">
<bold>108</bold>
</td>
<td align="center">38</td>
<td align="center">
<bold>
<underline>67</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>112</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCATTCG</td>
<td align="left">DMp3</td>
<td align="center">113</td>
<td align="center">9</td>
<td align="center">
<underline>12</underline>
</td>
<td></td>
<td align="center">
<underline>0</underline>
</td>
<td align="center">5</td>
<td align="center">
<underline>3</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">10</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<underline>9</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGGACGT</td>
<td align="left">DMp4</td>
<td align="center">80</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">25</td>
<td align="center">
<underline>0</underline>
</td>
<td></td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">10</td>
<td align="center">6</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">9</td>
</tr>
<tr>
<td></td>
<td align="left">KCGGTTSK</td>
<td align="left">DMp5</td>
<td align="center">147</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<bold>43</bold>
</td>
<td align="center">5</td>
<td align="center">
<underline>1</underline>
</td>
<td></td>
<td align="center">
<underline>3</underline>
</td>
<td align="center">
<underline>0</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>3</underline>
</td>
<td align="center">14</td>
<td align="center">11</td>
<td align="center">7</td>
<td align="center">
<bold>
<underline>4</underline>
</bold>
</td>
<td align="center">18</td>
</tr>
<tr>
<td></td>
<td align="left">CARCCCT</td>
<td align="left">DMv1</td>
<td align="center">311</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<bold>
<underline>15</underline>
</bold>
</td>
<td align="center">
<underline>3</underline>
</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>3</underline>
</td>
<td></td>
<td align="center">16</td>
<td align="center">
<underline>13</underline>
</td>
<td align="center">
<underline>18</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>5</underline>
</td>
<td align="center">
<underline>7</underline>
</td>
<td align="center">
<underline>7</underline>
</td>
<td align="center">
<bold>79</bold>
</td>
<td align="center">46</td>
</tr>
<tr>
<td></td>
<td align="left">TGGYAACR</td>
<td align="left">DMv2</td>
<td align="center">311</td>
<td align="center">
<underline>8</underline>
</td>
<td align="center">
<bold>
<underline>18</underline>
</bold>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>0</underline>
</td>
<td align="center">16</td>
<td></td>
<td align="center">
<underline>8</underline>
</td>
<td align="center">
<underline>15</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">59</td>
<td align="center">
<bold>64</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAYCNCTA</td>
<td align="left">DMv3</td>
<td align="center">604</td>
<td align="center">
<underline>10</underline>
</td>
<td align="center">
<bold>
<underline>34</underline>
</bold>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>13</underline>
</td>
<td align="center">
<underline>8</underline>
</td>
<td></td>
<td align="center">
<underline>18</underline>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<bold>
<underline>1</underline>
</bold>
</td>
<td align="center">
<underline>16</underline>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<bold>282</bold>
</td>
<td align="center">
<underline>63</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GGYCACAC</td>
<td align="left">DMv4</td>
<td align="center">649</td>
<td align="center">
<bold>
<underline>6</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>17</underline>
</bold>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>18</underline>
</td>
<td align="center">
<underline>15</underline>
</td>
<td align="center">
<underline>18</underline>
</td>
<td></td>
<td align="center">
<bold>64</bold>
</td>
<td align="center">
<underline>8</underline>
</td>
<td align="center">
<underline>12</underline>
</td>
<td align="center">
<underline>12</underline>
</td>
<td align="center">95</td>
<td align="center">
<underline>59</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">TGGTATTT</td>
<td align="left">DMv5</td>
<td align="center">287</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<bold>
<underline>12</underline>
</bold>
</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>3</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<bold>64</bold>
</td>
<td></td>
<td align="center">
<underline>0</underline>
</td>
<td align="center">
<underline>5</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">
<underline>26</underline>
</td>
<td align="center">
<underline>38</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GAGAGCG</td>
<td align="left">NDM1</td>
<td align="center">359</td>
<td align="center">19</td>
<td align="center">
<bold>100</bold>
</td>
<td align="center">10</td>
<td align="center">10</td>
<td align="center">14</td>
<td align="center">
<underline>5</underline>
</td>
<td align="center">
<underline>4</underline>
</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">
<underline>8</underline>
</td>
<td align="center">
<underline>0</underline>
</td>
<td></td>
<td align="center">26</td>
<td align="center">18</td>
<td align="center">
<bold>
<underline>6</underline>
</bold>
</td>
<td align="center">
<underline>28</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGMYGYCR</td>
<td align="left">NDM2</td>
<td align="center">424</td>
<td align="center">28</td>
<td align="center">
<bold>108</bold>
</td>
<td align="center">5</td>
<td align="center">6</td>
<td align="center">11</td>
<td align="center">
<underline>7</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>16</underline>
</td>
<td align="center">
<underline>12</underline>
</td>
<td align="center">
<underline>5</underline>
</td>
<td align="center">26</td>
<td></td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>33</underline>
</td>
<td align="center">
<underline>34</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GAAAGCT</td>
<td align="left">NDM3</td>
<td align="center">215</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">38</td>
<td align="center">5</td>
<td align="center">
<underline>1</underline>
</td>
<td align="center">7</td>
<td align="center">7</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<underline>12</underline>
</td>
<td align="center">
<underline>2</underline>
</td>
<td align="center">18</td>
<td align="center">
<underline>6</underline>
</td>
<td></td>
<td align="center">
<underline>22</underline>
</td>
<td align="center">33</td>
</tr>
<tr>
<td></td>
<td align="left">ATCGATA</td>
<td align="left">NDM4</td>
<td align="center">1593</td>
<td align="center">
<bold>
<underline>21</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>67</underline>
</bold>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">
<underline>6</underline>
</td>
<td align="center">
<bold>
<underline>4</underline>
</bold>
</td>
<td align="center">
<bold>79</bold>
</td>
<td align="center">59</td>
<td align="center">
<bold>282</bold>
</td>
<td align="center">
<underline>95</underline>
</td>
<td align="center">
<underline>26</underline>
</td>
<td align="center">
<bold>
<underline>6</underline>
</bold>
</td>
<td align="center">
<underline>33</underline>
</td>
<td align="center">
<underline>22</underline>
</td>
<td></td>
<td align="center">
<bold>265</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAGCTSWW</td>
<td align="left">NDM5</td>
<td align="center">1184</td>
<td align="center">
<bold>
<underline>26</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>112</underline>
</bold>
</td>
<td align="center">
<underline>9</underline>
</td>
<td align="center">9</td>
<td align="center">18</td>
<td align="center">46</td>
<td align="center">
<bold>64</bold>
</td>
<td align="center">
<underline>63</underline>
</td>
<td align="center">
<underline>59</underline>
</td>
<td align="center">38</td>
<td align="center">
<underline>28</underline>
</td>
<td align="center">
<underline>34</underline>
</td>
<td align="center">33</td>
<td align="center">
<bold>265</bold>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Unique</bold>
</td>
<td></td>
<td align="center">
<bold>4156</bold>
</td>
<td align="center">
<bold>304</bold>
</td>
<td align="center">
<bold>932</bold>
</td>
<td align="center">
<bold>58</bold>
</td>
<td align="center">
<bold>30</bold>
</td>
<td align="center">
<bold>48</bold>
</td>
<td align="center">
<bold>146</bold>
</td>
<td align="center">
<bold>146</bold>
</td>
<td align="center">
<bold>220</bold>
</td>
<td align="center">
<bold>366</bold>
</td>
<td align="center">
<bold>141</bold>
</td>
<td align="center">
<bold>165</bold>
</td>
<td align="center">
<bold>195</bold>
</td>
<td align="center">
<bold>88</bold>
</td>
<td align="center">
<bold>783</bold>
</td>
<td align="center">
<bold>534</bold>
</td>
</tr>
<tr>
<td colspan="19">
<hr></hr>
</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Totals</bold>
</td>
<td></td>
<td align="center">
<bold>8289</bold>
</td>
<td align="center">
<bold>511</bold>
</td>
<td align="center">
<bold>1501</bold>
</td>
<td align="center">
<bold>113</bold>
</td>
<td align="center">
<bold>80</bold>
</td>
<td align="center">
<bold>147</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>604</bold>
</td>
<td align="center">
<bold>649</bold>
</td>
<td align="center">
<bold>287</bold>
</td>
<td align="center">
<bold>359</bold>
</td>
<td align="center">
<bold>424</bold>
</td>
<td align="center">
<bold>215</bold>
</td>
<td align="center">
<bold>1593</bold>
</td>
<td align="center">
<bold>1184</bold>
</td>
</tr>
<tr>
<td align="left">
<bold>(b)</bold>
</td>
<td align="left">STATAAA</td>
<td align="left">DMp1</td>
<td align="center">511</td>
<td align="center">4.7</td>
<td align="center">6.5</td>
<td align="center">8.0</td>
<td align="center">
<underline>2.5</underline>
</td>
<td align="center">
<underline>2.7</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>2.6</underline>
</td>
<td align="center">
<underline>1.7</underline>
</td>
<td align="center">
<bold>
<underline>0.9</underline>
</bold>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">5.3</td>
<td align="center">6.6</td>
<td align="center">
<underline>4.2</underline>
</td>
<td align="center">
<bold>
<underline>1.3</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>2.2</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCAGTY</td>
<td align="left">DMp2</td>
<td align="center">1501</td>
<td align="center">19.2</td>
<td align="center">13.8</td>
<td align="center">
<underline>10.6</underline>
</td>
<td align="center">31.3</td>
<td align="center">
<bold>29.3</bold>
</td>
<td align="center">
<bold>
<underline>4.8</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.8</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.6</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>2.6</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>4.2</underline>
</bold>
</td>
<td align="center">
<bold>27.9</bold>
</td>
<td align="center">
<bold>25.5</bold>
</td>
<td align="center">17.7</td>
<td align="center">
<bold>
<underline>4.2</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>9.5</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCATTCG</td>
<td align="left">DMp3</td>
<td align="center">113</td>
<td align="center">1.8</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">1.0</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">3.4</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">2.8</td>
<td align="center">1.2</td>
<td align="center">2.3</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.8</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGGACGT</td>
<td align="left">DMp4</td>
<td align="center">80</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">1.7</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">0.7</td>
<td align="center">
<underline>0.7</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.7</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.7</underline>
</td>
<td align="center">2.8</td>
<td align="center">1.4</td>
<td align="center">
<underline>0.5</underline>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">0.8</td>
</tr>
<tr>
<td></td>
<td align="left">KCGGTTSK</td>
<td align="left">DMp5</td>
<td align="center">147</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">
<bold>2.9</bold>
</td>
<td align="center">4.4</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">1.4</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">3.9</td>
<td align="center">2.6</td>
<td align="center">3.3</td>
<td align="center">
<bold>
<underline>0.3</underline>
</bold>
</td>
<td align="center">1.5</td>
</tr>
<tr>
<td></td>
<td align="left">CARCCCT</td>
<td align="left">DMv1</td>
<td align="center">311</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">
<bold>
<underline>1.0</underline>
</bold>
</td>
<td align="center">
<underline>2.7</underline>
</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">
<underline>2.0</underline>
</td>
<td align="center">2.9</td>
<td align="center">5.1</td>
<td align="center">
<underline>2.2</underline>
</td>
<td align="center">
<underline>2.8</underline>
</td>
<td align="center">
<underline>2.1</underline>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">
<underline>1.7</underline>
</td>
<td align="center">3.3</td>
<td align="center">
<bold>5.0</bold>
</td>
<td align="center">3.9</td>
</tr>
<tr>
<td></td>
<td align="left">TGGYAACR</td>
<td align="left">DMv2</td>
<td align="center">311</td>
<td align="center">
<underline>1.6</underline>
</td>
<td align="center">
<bold>
<underline>1.2</underline>
</bold>
</td>
<td align="center">
<underline>1.8</underline>
</td>
<td align="center">
<underline>2.5</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">5.1</td>
<td align="center">2.9</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">
<underline>2.3</underline>
</td>
<td align="center">
<underline>2.1</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">
<underline>2.8</underline>
</td>
<td align="center">3.7</td>
<td align="center">
<bold>5.4</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAYCNCTA</td>
<td align="left">DMv3</td>
<td align="center">604</td>
<td align="center">
<underline>2.0</underline>
</td>
<td align="center">
<bold>
<underline>2.3</underline>
</bold>
</td>
<td align="center">
<underline>1.8</underline>
</td>
<td align="center">
<underline>5.0</underline>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">
<underline>4.2</underline>
</td>
<td align="center">
<underline>2.6</underline>
</td>
<td align="center">5.5</td>
<td align="center">
<underline>2.8</underline>
</td>
<td align="center">
<underline>3.1</underline>
</td>
<td align="center">
<bold>
<underline>0.3</underline>
</bold>
</td>
<td align="center">
<underline>3.8</underline>
</td>
<td align="center">
<underline>4.2</underline>
</td>
<td align="center">
<bold>17.7</bold>
</td>
<td align="center">
<underline>5.3</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GGYCACAC</td>
<td align="left">DMv4</td>
<td align="center">649</td>
<td align="center">
<bold>
<underline>1.2</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>1.1</underline>
</bold>
</td>
<td align="center">
<underline>3.5</underline>
</td>
<td align="center">
<underline>2.5</underline>
</td>
<td align="center">
<underline>2.7</underline>
</td>
<td align="center">
<underline>5.8</underline>
</td>
<td align="center">
<underline>4.8</underline>
</td>
<td align="center">
<underline>3.0</underline>
</td>
<td align="center">6.0</td>
<td align="center">
<bold>22.3</bold>
</td>
<td align="center">
<underline>2.2</underline>
</td>
<td align="center">
<underline>2.8</underline>
</td>
<td align="center">
<underline>5.6</underline>
</td>
<td align="center">6.0</td>
<td align="center">
<underline>5.0</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">TGGTATTT</td>
<td align="left">DMv5</td>
<td align="center">287</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">
<bold>
<underline>0.8</underline>
</bold>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>2.5</underline>
</td>
<td align="center">
<underline>2.0</underline>
</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>1.5</underline>
</td>
<td align="center">
<bold>9.9</bold>
</td>
<td align="center">2.6</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>1.6</underline>
</td>
<td align="center">3.2</td>
</tr>
<tr>
<td></td>
<td align="left">GAGAGCG</td>
<td align="left">NDM1</td>
<td align="center">359</td>
<td align="center">3.7</td>
<td align="center">
<bold>6.7</bold>
</td>
<td align="center">8.9</td>
<td align="center">12.5</td>
<td align="center">9.5</td>
<td align="center">
<underline>1.6</underline>
</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">
<bold>
<underline>0.2</underline>
</bold>
</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">3.3</td>
<td align="center">6.1</td>
<td align="center">8.4</td>
<td align="center">
<bold>
<underline>0.4</underline>
</bold>
</td>
<td align="center">
<underline>2.4</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGMYGYCR</td>
<td align="left">NDM2</td>
<td align="center">424</td>
<td align="center">5.5</td>
<td align="center">
<bold>7.2</bold>
</td>
<td align="center">4.4</td>
<td align="center">7.5</td>
<td align="center">7.5</td>
<td align="center">
<underline>2.3</underline>
</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>2.7</underline>
</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>1.7</underline>
</td>
<td align="center">7.2</td>
<td align="center">3.9</td>
<td align="center">2.8</td>
<td align="center">
<underline>2.1</underline>
</td>
<td align="center">
<underline>2.9</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GAAAGCT</td>
<td align="left">NDM3</td>
<td align="center">215</td>
<td align="center">
<underline>1.8</underline>
</td>
<td align="center">2.5</td>
<td align="center">4.4</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">4.8</td>
<td align="center">2.3</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>1.5</underline>
</td>
<td align="center">
<underline>1.9</underline>
</td>
<td align="center">
<underline>0.7</underline>
</td>
<td align="center">5.0</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">2.0</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">2.8</td>
</tr>
<tr>
<td></td>
<td align="left">ATCGATA</td>
<td align="left">NDM4</td>
<td align="center">1593</td>
<td align="center">
<bold>
<underline>4.1</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>4.5</underline>
</bold>
</td>
<td align="center">
<underline>8.0</underline>
</td>
<td align="center">
<underline>7.5</underline>
</td>
<td align="center">
<bold>
<underline>2.7</underline>
</bold>
</td>
<td align="center">
<bold>25.4</bold>
</td>
<td align="center">19.0</td>
<td align="center">
<bold>46.7</bold>
</td>
<td align="center">14.6</td>
<td align="center">
<underline>9.1</underline>
</td>
<td align="center">
<bold>
<underline>1.7</underline>
</bold>
</td>
<td align="center">
<underline>7.8</underline>
</td>
<td align="center">
<underline>10.2</underline>
</td>
<td align="center">14.6</td>
<td align="center">
<bold>22.4</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAGCTSWW</td>
<td align="left">NDM5</td>
<td align="center">1184</td>
<td align="center">
<bold>
<underline>5.1</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>7.5</underline>
</bold>
</td>
<td align="center">
<underline>8.0</underline>
</td>
<td align="center">
<underline>11.3</underline>
</td>
<td align="center">12.2</td>
<td align="center">14.8</td>
<td align="center">
<bold>20.6</bold>
</td>
<td align="center">
<underline>10.4</underline>
</td>
<td align="center">
<underline>9.1</underline>
</td>
<td align="center">13.2</td>
<td align="center">
<underline>7.8</underline>
</td>
<td align="center">
<underline>8.0</underline>
</td>
<td align="center">15.4</td>
<td align="center">
<bold>16.6</bold>
</td>
<td align="center">10.9</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Unique</bold>
</td>
<td></td>
<td></td>
<td align="center">
<bold>59.5</bold>
</td>
<td align="center">
<bold>62.1</bold>
</td>
<td align="center">
<bold>51.3</bold>
</td>
<td align="center">
<bold>37.5</bold>
</td>
<td align="center">
<bold>32.7</bold>
</td>
<td align="center">
<bold>47.0</bold>
</td>
<td align="center">
<bold>47.0</bold>
</td>
<td align="center">
<bold>36.4</bold>
</td>
<td align="center">
<bold>56.4</bold>
</td>
<td align="center">
<bold>49.1</bold>
</td>
<td align="center">
<bold>46.0</bold>
</td>
<td align="center">
<bold>46.0</bold>
</td>
<td align="center">
<bold>40.9</bold>
</td>
<td align="center">
<bold>49.2</bold>
</td>
<td align="center">
<bold>45.1</bold>
</td>
</tr>
<tr>
<td colspan="19">
<hr></hr>
</td>
</tr>
<tr>
<td></td>
<td align="left">
<bold>Totals</bold>
</td>
<td></td>
<td align="center">
<bold>8289</bold>
</td>
<td align="center">
<bold>511</bold>
</td>
<td align="center">
<bold>1501</bold>
</td>
<td align="center">
<bold>113</bold>
</td>
<td align="center">
<bold>80</bold>
</td>
<td align="center">
<bold>147</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>311</bold>
</td>
<td align="center">
<bold>604</bold>
</td>
<td align="center">
<bold>649</bold>
</td>
<td align="center">
<bold>287</bold>
</td>
<td align="center">
<bold>359</bold>
</td>
<td align="center">
<bold>424</bold>
</td>
<td align="center">
<bold>215</bold>
</td>
<td align="center">
<bold>1593</bold>
</td>
<td align="center">
<bold>1184</bold>
</td>
</tr>
<tr>
<td align="left">
<bold>(c)</bold>
</td>
<td align="left">STATAAA</td>
<td align="left">DMp1</td>
<td align="center">511</td>
<td></td>
<td align="center">3.2</td>
<td align="center">0.8</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.5</underline>
</td>
<td align="center">
<underline>4.1</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<underline>4.1</underline>
</td>
<td align="center">
<bold>
<underline>7.3</underline>
</bold>
</td>
<td align="center">
<underline>2.4</underline>
</td>
<td align="center">0.2</td>
<td align="center">1.1</td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">
<bold>
<underline>14.2</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.4</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCAGTY</td>
<td align="left">DMp2</td>
<td align="center">1501</td>
<td align="center">3.2</td>
<td></td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">4.1</td>
<td align="center">
<bold>5.9</bold>
</td>
<td align="center">
<bold>
<underline>6.5</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.1</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>10.2</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>22.6</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>7.0</underline>
</bold>
</td>
<td align="center">
<bold>11.8</bold>
</td>
<td align="center">
<bold>10.1</bold>
</td>
<td align="center">0.9</td>
<td align="center">
<bold>
<underline>40.4</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.6</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">TCATTCG</td>
<td align="left">DMp3</td>
<td align="center">113</td>
<td align="center">0.8</td>
<td align="center">
<underline>0.4</underline>
</td>
<td></td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">1.4</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">2.1</td>
<td align="center">0.0</td>
<td align="center">0.8</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">
<underline>0.4</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGGACGT</td>
<td align="left">DMp4</td>
<td align="center">80</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">4.1</td>
<td align="center">
<underline>0.1</underline>
</td>
<td></td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">3.3</td>
<td align="center">0.7</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">0.0</td>
</tr>
<tr>
<td></td>
<td align="left">KCGGTTSK</td>
<td align="left">DMp5</td>
<td align="center">147</td>
<td align="center">
<underline>0.5</underline>
</td>
<td align="center">
<bold>5.9</bold>
</td>
<td align="center">1.4</td>
<td align="center">
<underline>0.0</underline>
</td>
<td></td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">
<underline>1.6</underline>
</td>
<td align="center">
<underline>1.7</underline>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">3.2</td>
<td align="center">1.3</td>
<td align="center">1.3</td>
<td align="center">
<bold>
<underline>5.5</underline>
</bold>
</td>
<td align="center">0.2</td>
</tr>
<tr>
<td></td>
<td align="left">CARCCCT</td>
<td align="left">DMv1</td>
<td align="center">311</td>
<td align="center">
<underline>4.1</underline>
</td>
<td align="center">
<bold>
<underline>6.5</underline>
</bold>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>0.1</underline>
</td>
<td></td>
<td align="center">1.5</td>
<td align="center">
<underline>0.5</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">0.1</td>
<td align="center">
<bold>6.3</bold>
</td>
<td align="center">1.5</td>
</tr>
<tr>
<td></td>
<td align="left">TGGYAACR</td>
<td align="left">DMv2</td>
<td align="center">311</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<bold>
<underline>5.1</underline>
</bold>
</td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>1.6</underline>
</td>
<td align="center">1.5</td>
<td></td>
<td align="center">
<underline>1.8</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">1.4</td>
<td align="center">
<bold>6.3</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAYCNCTA</td>
<td align="left">DMv3</td>
<td align="center">604</td>
<td align="center">
<underline>4.1</underline>
</td>
<td align="center">
<bold>
<underline>10.2</underline>
</bold>
</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>1.7</underline>
</td>
<td align="center">
<underline>0.5</underline>
</td>
<td align="center">
<underline>1.8</underline>
</td>
<td></td>
<td align="center">
<underline>3.1</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<bold>
<underline>7.4</underline>
</bold>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<bold>84.2</bold>
</td>
<td align="center">
<underline>0.1</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GGYCACAC</td>
<td align="left">DMv4</td>
<td align="center">649</td>
<td align="center">
<bold>
<underline>7.3</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>22.6</underline>
</bold>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">
<underline>0.6</underline>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>3.1</underline>
</td>
<td></td>
<td align="center">
<bold>19.9</bold>
</td>
<td align="center">
<underline>2.9</underline>
</td>
<td align="center">
<underline>2.4</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">0.0</td>
<td align="center">
<underline>0.8</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">TGGTATTT</td>
<td align="left">DMv5</td>
<td align="center">287</td>
<td align="center">
<underline>2.4</underline>
</td>
<td align="center">
<bold>
<underline>7.0</underline>
</bold>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>0.2</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<bold>19.9</bold>
</td>
<td></td>
<td align="center">
<underline>3.9</underline>
</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">
<underline>2.2</underline>
</td>
<td align="center">0.6</td>
</tr>
<tr>
<td></td>
<td align="left">GAGAGCG</td>
<td align="left">NDM1</td>
<td align="center">359</td>
<td align="center">0.2</td>
<td align="center">
<bold>11.8</bold>
</td>
<td align="center">2.1</td>
<td align="center">3.3</td>
<td align="center">3.2</td>
<td align="center">
<underline>1.0</underline>
</td>
<td align="center">
<underline>1.4</underline>
</td>
<td align="center">
<bold>
<underline>7.4</underline>
</bold>
</td>
<td align="center">
<underline>2.9</underline>
</td>
<td align="center">
<underline>3.9</underline>
</td>
<td></td>
<td align="center">2.5</td>
<td align="center">3.3</td>
<td align="center">
<bold>
<underline>16.8</underline>
</bold>
</td>
<td align="center">
<underline>1.2</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">CGMYGYCR</td>
<td align="left">NDM2</td>
<td align="center">424</td>
<td align="center">1.1</td>
<td align="center">
<bold>10.1</bold>
</td>
<td align="center">0.0</td>
<td align="center">0.7</td>
<td align="center">1.3</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<underline>0.9</underline>
</td>
<td align="center">
<underline>2.4</underline>
</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">2.5</td>
<td></td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>4.7</underline>
</td>
<td align="center">
<underline>1.2</underline>
</td>
</tr>
<tr>
<td></td>
<td align="left">GAAAGCT</td>
<td align="left">NDM3</td>
<td align="center">215</td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">0.9</td>
<td align="center">0.8</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">1.3</td>
<td align="center">0.1</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.3</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">3.3</td>
<td align="center">
<underline>0.3</underline>
</td>
<td></td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">1.3</td>
</tr>
<tr>
<td></td>
<td align="left">ATCGATA</td>
<td align="left">NDM4</td>
<td align="center">1593</td>
<td align="center">
<bold>
<underline>14.2</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>40.4</underline>
</bold>
</td>
<td align="center">
<underline>1.3</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td align="center">
<bold>
<underline>5.5</underline>
</bold>
</td>
<td align="center">
<bold>6.3</bold>
</td>
<td align="center">1.4</td>
<td align="center">
<bold>84.2</bold>
</td>
<td align="center">0.0</td>
<td align="center">
<underline>2.2</underline>
</td>
<td align="center">
<bold>
<underline>16.8</underline>
</bold>
</td>
<td align="center">
<underline>4.7</underline>
</td>
<td align="center">
<underline>1.1</underline>
</td>
<td></td>
<td align="center">
<bold>13.5</bold>
</td>
</tr>
<tr>
<td></td>
<td align="left">CAGCTSWW</td>
<td align="left">NDM5</td>
<td align="center">1184</td>
<td align="center">
<bold>
<underline>5.4</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>5.6</underline>
</bold>
</td>
<td align="center">
<underline>0.4</underline>
</td>
<td align="center">
<underline>0.0</underline>
</td>
<td align="center">0.2</td>
<td align="center">1.5</td>
<td align="center">
<bold>6.3</bold>
</td>
<td align="center">
<underline>0.1</underline>
</td>
<td align="center">
<underline>0.8</underline>
</td>
<td align="center">0.6</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">
<underline>1.2</underline>
</td>
<td align="center">1.3</td>
<td align="center">
<bold>13.5</bold>
</td>
<td></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The 15 motifs fall into three groups: DMp1 to DMp5, DMv1 to DMv5, and NDM1 to NDM5.
<bold>(a) </bold>
The number of promoters that contain both motifs of a pair, each occurring within its peak. The 15 motifs are listed on the left, followed by the number of occurrences of each within its peak.
<bold>(b) </bold>
The percentage of promoters containing one motif that also contain a second motif. DMp1 (TATA), for example, is found in 4.7% of all promoters but occurs in 6.5% of promoters that contain DMp2 (INR).
<bold>(c) </bold>
The probability that the observed co-occurrence arises by chance; the entries appear to be given as the negative base-10 logarithm of that probability (for example, 40.4 for INR and DRE). Throughout all three panels of the table, positive correlations are shown as plain numbers, negative correlations are underlined, and numbers are in bold if the probability is
<italic>p </italic>
≤ 10
<sup>-5</sup>
(one in 100,000). For example, INR is found in 1,501 promoters, which is 13.8% of all promoters. However, INR occurs in only 4.2% of the 1,593 DRE promoters. This observed under-representation, or negative correlation, has a one in 10
<sup>40 </sup>
probability of occurring by chance.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</back>
</pmc>
</record>
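
The footnote of the table above describes two quantities: the percentage of promoters with one motif that also carry a second motif (panel b), and the probability of the observed overlap arising by chance (panel c). The sketch below illustrates both with the INR and DRE figures quoted in the footnote. It rests on assumptions that are not stated in this excerpt: the total promoter count is back-calculated from "1,501 promoters, which is 13.8%", the overlap is back-calculated from "4.2% of the 1,593 DRE promoters", and the chance probability is modelled with a hypergeometric lower tail, which may differ from the test the authors actually used. The function names are hypothetical.

```python
from math import comb, log10


def conditional_percentage(both: int, with_second: int) -> float:
    """Percentage of promoters carrying the second motif that also carry the first."""
    return 100.0 * both / with_second


def lower_tail(k: int, total: int, with_first: int, with_second: int) -> float:
    """P(X <= k) under a hypergeometric model: the chance of seeing k or fewer
    co-occurrences if the two motifs were placed independently.
    ASSUMPTION: this test is illustrative, not confirmed by the source."""
    denom = comb(total, with_second)
    numer = sum(
        comb(with_first, i) * comb(total - with_first, with_second - i)
        for i in range(k + 1)
    )
    # Exact integer arithmetic first, one float division at the end.
    return numer / denom


# Counts quoted in the footnote.
inr, dre = 1501, 1593
# ASSUMPTION: total promoter count inferred from "1,501 promoters ... 13.8%".
total = round(inr / 0.138)
# ASSUMPTION: overlap inferred from "4.2% of the 1,593 DRE promoters".
both = round(0.042 * dre)

print(f"INR among DRE promoters: {conditional_percentage(both, dre):.1f}%")
p = lower_tail(both, total, inr, dre)
print(f"-log10(p) under the assumed test: {-log10(p):.1f}")
```

Because the inputs are rounded percentages and the test is assumed, the resulting value should only be compared with the table entry in order of magnitude.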

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000560  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000560  | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021