Clustering of DNA sequences in human promoters.
Internal identifier: 003075 (Main/Exploration); previous: 003074; next: 003076
Authors: Peter C. Fitzgerald [United States]; Andrey Shlyakhtenko; Alain A. Mir; Charles Vinson
Source:
- Genome research [1088-9051]; 2004.
Abstract
We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.
DOI: 10.1101/gr.1953904
PubMed: 15256515
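The positional counting described in the abstract (one distribution per 8-mer, measured relative to the TSS) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy sequences, and the "fraction within a window" summary are assumptions made for illustration, and the paper's actual clustering statistic is not reproduced here.

```python
from collections import defaultdict


def kmer_position_counts(promoters, tss_index, k=8):
    """Count, for every k-mer, how often it starts at each offset from the TSS.

    promoters: iterable of DNA strings aligned so that index `tss_index`
    is the putative TSS in every sequence (an assumption of this sketch).
    Returns {kmer: {offset: count}} with offset = start - tss_index.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in promoters:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer][start - tss_index] += 1
    return counts


def fraction_near_tss(offset_counts, window=100):
    """Share of a k-mer's occurrences starting within `window` bp of the TSS.

    A crude stand-in for a clustering measure, chosen for simplicity.
    """
    total = sum(offset_counts.values())
    near = sum(c for off, c in offset_counts.items() if abs(off) <= window)
    return near / total if total else 0.0


# Toy data invented for illustration: three short sequences, TSS at index 10.
toy = [
    "AAAAAAAAAAGGGGCGGGTTTT",
    "CCCCCCCCCCGGGGCGGGAAAA",
    "GGGGCGGGTTTTTTTTTTTTTT",
]
counts = kmer_position_counts(toy, tss_index=10)
print(dict(counts["GGGGCGGG"]))
print(fraction_near_tss(counts["GGGGCGGG"], window=5))
```

With k = 8 there are 4^8 = 65,536 possible k-mers, which matches the number of sequences the abstract reports. On real data the per-offset counts would be compared against a background expectation before calling a peak a cluster.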
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 002381
- to stream PubMed, to step Curation: 002381
- to stream PubMed, to step Checkpoint: 002280
- to stream Ncbi, to step Merge: 000290
- to stream Ncbi, to step Curation: 000290
- to stream Ncbi, to step Checkpoint: 000290
- to stream Main, to step Merge: 003107
- to stream Main, to step Curation: 003075
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Clustering of DNA sequences in human promoters.</title>
<author><name sortKey="Fitzgerald, Peter C" sort="Fitzgerald, Peter C" uniqKey="Fitzgerald P" first="Peter C" last="Fitzgerald">Peter C. Fitzgerald</name>
<affiliation wicri:level="1"><nlm:affiliation>Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892</wicri:regionArea>
<wicri:noRegion>Maryland 20892</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Shlyakhtenko, Andrey" sort="Shlyakhtenko, Andrey" uniqKey="Shlyakhtenko A" first="Andrey" last="Shlyakhtenko">Andrey Shlyakhtenko</name>
</author>
<author><name sortKey="Mir, Alain A" sort="Mir, Alain A" uniqKey="Mir A" first="Alain A" last="Mir">Alain A. Mir</name>
</author>
<author><name sortKey="Vinson, Charles" sort="Vinson, Charles" uniqKey="Vinson C" first="Charles" last="Vinson">Charles Vinson</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2004">2004</date>
<idno type="RBID">pubmed:15256515</idno>
<idno type="pmid">15256515</idno>
<idno type="doi">10.1101/gr.1953904</idno>
<idno type="wicri:Area/PubMed/Corpus">002381</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002381</idno>
<idno type="wicri:Area/PubMed/Curation">002381</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002381</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002280</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">002280</idno>
<idno type="wicri:Area/Ncbi/Merge">000290</idno>
<idno type="wicri:Area/Ncbi/Curation">000290</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000290</idno>
<idno type="wicri:doubleKey">1088-9051:2004:Fitzgerald P:clustering:of:dna</idno>
<idno type="wicri:Area/Main/Merge">003107</idno>
<idno type="wicri:Area/Main/Curation">003075</idno>
<idno type="wicri:Area/Main/Exploration">003075</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Clustering of DNA sequences in human promoters.</title>
<author><name sortKey="Fitzgerald, Peter C" sort="Fitzgerald, Peter C" uniqKey="Fitzgerald P" first="Peter C" last="Fitzgerald">Peter C. Fitzgerald</name>
<affiliation wicri:level="1"><nlm:affiliation>Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Genome Analysis Unit, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892</wicri:regionArea>
<wicri:noRegion>Maryland 20892</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Shlyakhtenko, Andrey" sort="Shlyakhtenko, Andrey" uniqKey="Shlyakhtenko A" first="Andrey" last="Shlyakhtenko">Andrey Shlyakhtenko</name>
</author>
<author><name sortKey="Mir, Alain A" sort="Mir, Alain A" uniqKey="Mir A" first="Alain A" last="Mir">Alain A. Mir</name>
</author>
<author><name sortKey="Vinson, Charles" sort="Vinson, Charles" uniqKey="Vinson C" first="Charles" last="Vinson">Charles Vinson</name>
</author>
</analytic>
<series><title level="j">Genome research</title>
<idno type="ISSN">1088-9051</idno>
<imprint><date when="2004" type="published">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>Computational Biology (methods)</term>
<term>Consensus Sequence</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Molecular Sequence Data</term>
<term>Promoter Regions, Genetic</term>
<term>Transcription Initiation Site</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Analyse de regroupements</term>
<term>Biologie informatique ()</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Modèles génétiques</term>
<term>Régions promotrices (génétique)</term>
<term>Site d'initiation de la transcription</term>
<term>Séquence consensus</term>
<term>Séquence nucléotidique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>Consensus Sequence</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Molecular Sequence Data</term>
<term>Promoter Regions, Genetic</term>
<term>Transcription Initiation Site</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Analyse de regroupements</term>
<term>Biologie informatique</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Modèles génétiques</term>
<term>Régions promotrices (génétique)</term>
<term>Site d'initiation de la transcription</term>
<term>Séquence consensus</term>
<term>Séquence nucléotidique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We have determined the distribution of each of the 65,536 DNA sequences that are eight bases long (8-mer) in a set of 13,010 human genomic promoter sequences aligned relative to the putative transcription start site (TSS). A limited number of 8-mers have peaks in their distribution (cluster), and most cluster within 100 bp of the TSS. The 156 DNA sequences exhibiting the greatest statistically significant clustering near the TSS can be placed into nine groups of related sequences. Each group is defined by a consensus sequence, and seven of these consensus sequences are known binding sites for the transcription factors (TFs) SP1, NF-Y, ETS, CREB, TBP, USF, and NRF-1. One sequence, which we named Clus1, is not a known TF binding site. The ninth sequence group is composed of the strand-specific Kozak sequence that clusters downstream of the TSS. An examination of the co-occurrence of these TF consensus sequences indicates a positive correlation for most of them except for sequences bound by TBP (the TATA box). Human mRNA expression data from 29 tissues indicate that the ETS, NRF-1, and Clus1 sequences that cluster are predominantly found in the promoters of housekeeping genes (e.g., ribosomal genes). In contrast, TATA is more abundant in the promoters of tissue-specific genes. This analysis identified eight DNA sequences in 5082 promoters that we suggest are important for regulating gene expression.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
</list>
<tree><noCountry><name sortKey="Mir, Alain A" sort="Mir, Alain A" uniqKey="Mir A" first="Alain A" last="Mir">Alain A. Mir</name>
<name sortKey="Shlyakhtenko, Andrey" sort="Shlyakhtenko, Andrey" uniqKey="Shlyakhtenko A" first="Andrey" last="Shlyakhtenko">Andrey Shlyakhtenko</name>
<name sortKey="Vinson, Charles" sort="Vinson, Charles" uniqKey="Vinson C" first="Charles" last="Vinson">Charles Vinson</name>
</noCountry>
<country name="États-Unis"><noRegion><name sortKey="Fitzgerald, Peter C" sort="Fitzgerald, Peter C" uniqKey="Fitzgerald P" first="Peter C" last="Fitzgerald">Peter C. Fitzgerald</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003075 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 003075 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:15256515 |texte= Clustering of DNA sequences in human promoters. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:15256515" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.