MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites

Internal identifier: 000F63 (Pmc/Curation); previous: 000F62; next: 000F64

Authors: Brian T. Naughton; Eugene Fratkin; Serafim Batzoglou; Douglas L. Brutlag

Source: Nucleic Acids Research, 2006, 34(20), 5730-5739

RBID : PMC:1635261

Abstract

Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.
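
The contrast drawn in the abstract can be illustrated with a toy sketch. This is not the authors' MotifScan algorithm, whose details are not given on this page. It is a minimal example, assuming a log-odds PSSM with a pseudocount and uniform background, and assuming Hamming distance to the closest known site as a stand-in for "similarity to specific, known k-mers". The function names, the pseudocount, and the example sites are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a column-wise log-odds matrix from equal-length sites."""
    k = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(k):
        counts = Counter(site[i] for site in sites)
        pssm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm

def pssm_score(pssm, kmer):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pssm, kmer))

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def nearest_site_score(sites, kmer):
    """Score by similarity to the closest known site (assumed metric)."""
    return -min(hamming(site, kmer) for site in sites)

# Hypothetical motif with two clusters of sites: the first half of a site
# determines the second half (AA goes with CC, GG goes with TT).
sites = ["AACC", "AACC", "AACC", "GGTT", "GGTT", "GGTT"]
pssm = build_pssm(sites)

for candidate in ("AACC", "AATT"):
    print(candidate,
          round(pssm_score(pssm, candidate), 3),
          nearest_site_score(sites, candidate))
```

In this toy data the chimeric candidate AATT gets the same PSSM score as the known site AACC, because each column is scored independently. The nearest-site score separates them, since AATT is two mismatches away from every known site.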


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635261
DOI: 10.1093/nar/gkl585
PubMed: 17041233
PubMed Central: 1635261

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites</title>
<author>
<name sortKey="Naughton, Brian T" sort="Naughton, Brian T" uniqKey="Naughton B" first="Brian T." last="Naughton">Brian T. Naughton</name>
</author>
<author>
<name sortKey="Fratkin, Eugene" sort="Fratkin, Eugene" uniqKey="Fratkin E" first="Eugene" last="Fratkin">Eugene Fratkin</name>
</author>
<author>
<name sortKey="Batzoglou, Serafim" sort="Batzoglou, Serafim" uniqKey="Batzoglou S" first="Serafim" last="Batzoglou">Serafim Batzoglou</name>
</author>
<author>
<name sortKey="Brutlag, Douglas L" sort="Brutlag, Douglas L" uniqKey="Brutlag D" first="Douglas L." last="Brutlag">Douglas L. Brutlag</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17041233</idno>
<idno type="pmc">1635261</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635261</idno>
<idno type="RBID">PMC:1635261</idno>
<idno type="doi">10.1093/nar/gkl585</idno>
<date when="2006">2006</date>
<idno type="wicri:Area/Pmc/Corpus">000F63</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F63</idno>
<idno type="wicri:Area/Pmc/Curation">000F63</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000F63</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites</title>
<author>
<name sortKey="Naughton, Brian T" sort="Naughton, Brian T" uniqKey="Naughton B" first="Brian T." last="Naughton">Brian T. Naughton</name>
</author>
<author>
<name sortKey="Fratkin, Eugene" sort="Fratkin, Eugene" uniqKey="Fratkin E" first="Eugene" last="Fratkin">Eugene Fratkin</name>
</author>
<author>
<name sortKey="Batzoglou, Serafim" sort="Batzoglou, Serafim" uniqKey="Batzoglou S" first="Serafim" last="Batzoglou">Serafim Batzoglou</name>
</author>
<author>
<name sortKey="Brutlag, Douglas L" sort="Brutlag, Douglas L" uniqKey="Brutlag D" first="Douglas L." last="Brutlag">Douglas L. Brutlag</name>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar
<italic>k</italic>
-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate
<italic>k</italic>
-mer is part of a motif or not, we base our decision not on how well the
<italic>k</italic>
-mer conforms to a model of the motif as a whole, but how similar it is to specific, known
<italic>k</italic>
-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.</p>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">Nucleic Acids Research</journal-id>
<journal-title>Nucleic Acids Research</journal-title>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17041233</article-id>
<article-id pub-id-type="pmc">1635261</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkl585</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computational Biology</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Naughton</surname>
<given-names>Brian T.</given-names>
</name>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fratkin</surname>
<given-names>Eugene</given-names>
</name>
<xref rid="au1" ref-type="aff">1</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Batzoglou</surname>
<given-names>Serafim</given-names>
</name>
<xref rid="au1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Brutlag</surname>
<given-names>Douglas L.</given-names>
</name>
</contrib>
<aff>
<institution>Department of Biochemistry</institution>
<addr-line>CA 94305, USA</addr-line>
</aff>
<aff id="au1">
<sup>1</sup>
<institution>Department of Computer Science, Stanford University</institution>
<addr-line>CA 94305, USA</addr-line>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor1">
<sup>*</sup>
To whom correspondence should be addressed. Tel: 650 723 5976; Fax: 650 723 6783; Email:
<email>briannau@stanford.edu</email>
</corresp>
<fn>
<p>
<sup>*</sup>
Correspondence may also be addressed to Eugene Fratkin. Email:
<email>fratkin@cs.stanford.edu</email>
</p>
</fn>
</author-notes>
<pmc-comment>For NAR: both ppub and collection dates generated for PMC processing 1/27/05 beck</pmc-comment>
<pub-date pub-type="collection">
<month>11</month>
<year>2006</year>
</pub-date>
<pub-date pub-type="ppub">
<month>11</month>
<year>2006</year>
</pub-date>
<pub-date pub-type="epub">
<day>13</day>
<month>11</month>
<year>2006</year>
</pub-date>
<volume>34</volume>
<issue>20</issue>
<fpage>5730</fpage>
<lpage>5739</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>6</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>25</day>
<month>7</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>7</month>
<year>2006</year>
</date>
</history>
<copyright-statement>© 2006 The Author(s)</copyright-statement>
<copyright-year>2006</copyright-year>
<license license-type="openaccess">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
<abstract>
<p>Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar
<italic>k</italic>
-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate
<italic>k</italic>
-mer is part of a motif or not, we base our decision not on how well the
<italic>k</italic>
-mer conforms to a model of the motif as a whole, but how similar it is to specific, known
<italic>k</italic>
-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.</p>
</abstract>
</article-meta>
</front>
</pmc>
</record>
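
For readers who prefer to process the record outside Dilib, the following is a minimal Python sketch of pulling the title, authors, identifiers, and abstract out of such a record with the standard library. It works on a trimmed excerpt of the record above. The record uses the `wicri:` attribute prefix without declaring it, which a namespace-aware parser rejects, so the sketch binds the prefix to a placeholder URI first. That URI and the helper names `parse_record` and `extract_metadata` are assumptions for illustration, not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above (same element and attribute names).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites</title>
<author>
<name first="Brian T." last="Naughton">Brian T. Naughton</name>
</author>
<author>
<name first="Eugene" last="Fratkin">Eugene Fratkin</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="pmid">17041233</idno>
<idno type="doi">10.1093/nar/gkl585</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000F63</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>there is extensive clustering of similar
<italic>k</italic>
-mers in eukaryotic motifs.</p>
</div>
</front>
</TEI>
</record>"""

# Placeholder namespace URI (an assumption; not defined by Wicri or Dilib).
WICRI_NS = "urn:example:wicri"

def parse_record(xml_text):
    """Bind the undeclared wicri: prefix on the root, then parse."""
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)

def extract_metadata(root):
    """Collect a few bibliographic fields from the TEI part of the record."""
    ids = {e.get("type"): e.text for e in root.iter("idno")}
    abstract = root.find(".//div[@type='abstract']/p")
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "doi": ids.get("doi"),
        "pmid": ids.get("pmid"),
        # itertext() flattens the inline <italic> children; split/join
        # collapses whitespace runs. The indenter's line breaks still
        # leave a space, so the result reads "k -mers".
        "abstract": " ".join("".join(abstract.itertext()).split()),
    }

meta = extract_metadata(parse_record(RECORD))
print(meta["doi"], meta["authors"])
```

The same two functions should apply to the full record text, since only the element names shown in the excerpt are queried.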

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F63 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000F63 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:1635261
   |texte=   A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:17041233" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021