MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Learning "graph-mer" motifs that predict gene expression trajectories in development.

Internal identifier: 001F52 (PubMed/Corpus); previous: 001F51; next: 001F53

Authors: Xuejing Li; Casandra Panea; Chris H. Wiggins; Valerie Reinke; Christina Leslie

Source: PLoS computational biology (eISSN 1553-7358); 2010

RBID : pubmed:20454681

English descriptors

Abstract

A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns--represented by graphs of k-mers, or "graph-mers"--that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.
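As a rough illustration of the kind of sequence representation the abstract refers to, the sketch below counts k-mers in a sequence and links k-mers into a graph. It is not the authors' implementation: the abstract does not say how graph edges are defined, so the one-mismatch rule used here is an assumption, and the function names `kmer_counts` and `one_mismatch_edges` and the toy sequences are hypothetical. The graph-regularized PLS regression itself is not shown.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def one_mismatch_edges(kmers):
    """Link k-mers that differ at exactly one position.

    ASSUMPTION: this edge rule is only an illustration; the abstract
    does not state how the k-mer graph is built.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(set(kmers)), 2)
        if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1
    }


# Toy sequence, made up for the example (not data from the study).
counts = kmer_counts("ACGCGCG", 2)
edges = one_mismatch_edges(["ACG", "ACT", "TTT"])
```

In a regression setting, one such count vector per promoter would form a row of the feature matrix, and the edge set could be used to encourage similar weights on neighboring k-mers.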

DOI: 10.1371/journal.pcbi.1000761
PubMed: 20454681

Links to Exploration step

pubmed:20454681

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning "graph-mer" motifs that predict gene expression trajectories in development.</title>
<author>
<name sortKey="Li, Xuejing" sort="Li, Xuejing" uniqKey="Li X" first="Xuejing" last="Li">Xuejing Li</name>
<affiliation>
<nlm:affiliation>Department of Physics, Columbia University, New York, New York, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Panea, Casandra" sort="Panea, Casandra" uniqKey="Panea C" first="Casandra" last="Panea">Casandra Panea</name>
</author>
<author>
<name sortKey="Wiggins, Chris H" sort="Wiggins, Chris H" uniqKey="Wiggins C" first="Chris H" last="Wiggins">Chris H. Wiggins</name>
</author>
<author>
<name sortKey="Reinke, Valerie" sort="Reinke, Valerie" uniqKey="Reinke V" first="Valerie" last="Reinke">Valerie Reinke</name>
</author>
<author>
<name sortKey="Leslie, Christina" sort="Leslie, Christina" uniqKey="Leslie C" first="Christina" last="Leslie">Christina Leslie</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20454681</idno>
<idno type="pmid">20454681</idno>
<idno type="doi">10.1371/journal.pcbi.1000761</idno>
<idno type="wicri:Area/PubMed/Corpus">001F52</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F52</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Learning "graph-mer" motifs that predict gene expression trajectories in development.</title>
<author>
<name sortKey="Li, Xuejing" sort="Li, Xuejing" uniqKey="Li X" first="Xuejing" last="Li">Xuejing Li</name>
<affiliation>
<nlm:affiliation>Department of Physics, Columbia University, New York, New York, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Panea, Casandra" sort="Panea, Casandra" uniqKey="Panea C" first="Casandra" last="Panea">Casandra Panea</name>
</author>
<author>
<name sortKey="Wiggins, Chris H" sort="Wiggins, Chris H" uniqKey="Wiggins C" first="Chris H" last="Wiggins">Chris H. Wiggins</name>
</author>
<author>
<name sortKey="Reinke, Valerie" sort="Reinke, Valerie" uniqKey="Reinke V" first="Valerie" last="Reinke">Valerie Reinke</name>
</author>
<author>
<name sortKey="Leslie, Christina" sort="Leslie, Christina" uniqKey="Leslie C" first="Christina" last="Leslie">Christina Leslie</name>
</author>
</analytic>
<series>
<title level="j">PLoS computational biology</title>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Caenorhabditis elegans (genetics)</term>
<term>Gene Expression Profiling (methods)</term>
<term>Least-Squares Analysis</term>
<term>Male</term>
<term>Multigene Family</term>
<term>Multivariate Analysis</term>
<term>Oocytes</term>
<term>Principal Component Analysis</term>
<term>Promoter Regions, Genetic</term>
<term>Regression Analysis</term>
<term>Reproducibility of Results</term>
<term>Spermatozoa</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Caenorhabditis elegans</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Gene Expression Profiling</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Least-Squares Analysis</term>
<term>Male</term>
<term>Multigene Family</term>
<term>Multivariate Analysis</term>
<term>Oocytes</term>
<term>Principal Component Analysis</term>
<term>Promoter Regions, Genetic</term>
<term>Regression Analysis</term>
<term>Reproducibility of Results</term>
<term>Spermatozoa</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns--represented by graphs of k-mers, or "graph-mers"--that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20454681</PMID>
<DateCompleted>
<Year>2010</Year>
<Month>07</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1553-7358</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>6</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2010</Year>
<Month>Apr</Month>
<Day>29</Day>
</PubDate>
</JournalIssue>
<Title>PLoS computational biology</Title>
<ISOAbbreviation>PLoS Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Learning "graph-mer" motifs that predict gene expression trajectories in development.</ArticleTitle>
<Pagination>
<MedlinePgn>e1000761</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pcbi.1000761</ELocationID>
<Abstract>
<AbstractText>A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns--represented by graphs of k-mers, or "graph-mers"--that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Li</LastName>
<ForeName>Xuejing</ForeName>
<Initials>X</Initials>
<AffiliationInfo>
<Affiliation>Department of Physics, Columbia University, New York, New York, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Panea</LastName>
<ForeName>Casandra</ForeName>
<Initials>C</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Wiggins</LastName>
<ForeName>Chris H</ForeName>
<Initials>CH</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Reinke</LastName>
<ForeName>Valerie</ForeName>
<Initials>V</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Leslie</LastName>
<ForeName>Christina</ForeName>
<Initials>C</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>U54 CA121852</GrantID>
<Acronym>CA</Acronym>
<Agency>NCI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>U54-CA1218523</GrantID>
<Acronym>CA</Acronym>
<Agency>NCI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>04</Month>
<Day>29</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>PLoS Comput Biol</MedlineTA>
<NlmUniqueID>101238922</NlmUniqueID>
<ISSNLinking>1553-734X</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017173" MajorTopicYN="N">Caenorhabditis elegans</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020869" MajorTopicYN="N">Gene Expression Profiling</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016018" MajorTopicYN="N">Least-Squares Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D005810" MajorTopicYN="Y">Multigene Family</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015999" MajorTopicYN="N">Multivariate Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D009865" MajorTopicYN="N">Oocytes</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D025341" MajorTopicYN="N">Principal Component Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011401" MajorTopicYN="Y">Promoter Regions, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012044" MajorTopicYN="N">Regression Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015203" MajorTopicYN="N">Reproducibility of Results</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013094" MajorTopicYN="N">Spermatozoa</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2009</Year>
<Month>06</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2010</Year>
<Month>03</Month>
<Day>24</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>5</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>5</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2010</Year>
<Month>7</Month>
<Day>8</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">20454681</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pcbi.1000761</ArticleId>
<ArticleId IdType="pmc">PMC2861633</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Mol Cells. 1999 Oct 31;9(5):535-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10597043</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Nov;13(11):2498-504</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14597658</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell. 2004 Apr 16;117(2):185-98</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15084257</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Int Conf Intell Syst Mol Biol. 1994;2:28-36</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">7584402</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Endocrinol. 1999 May;13(5):774-86</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10319327</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2005 Jan;23(1):137-44</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15637633</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2006;7(5):R36</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16686963</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2006;34(20):5730-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17041233</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Brief Bioinform. 2007 Jan;8(1):32-44</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16772269</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Syst Biol. 2007;3:74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17224918</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2007;8:35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17270037</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2007 Feb 15;23(4):493-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17138590</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2007 Jun 15;23(12):1486-94</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17463025</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2008 Jan 31;451(7178):535-40</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18172436</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2008 Nov;4(11):e1000224</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19008939</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2009 Jan;5(1):e1000269</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19180174</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Pac Symp Biocomput. 2000;:455-66</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10902193</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2000 Oct 27;290(5492):809-12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11052945</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2001 Feb;27(2):167-71</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11175784</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2002 Dec 5;420(6915):520-62</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12466850</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2003 Jun;34(2):166-76</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12740579</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2003 Jan 13;4:2</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12525261</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Development. 2004 Jan;131(2):311-23</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14668411</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F52 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F52 | SxmlIndent | more
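Without Dilib, the `<pubmed>` part of the record can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged copy of that part (two of the five authors). The `summarize` helper and the `SAMPLE` string are hypothetical names introduced for the example. The TEI part is left out on purpose: as displayed on this page it uses the `nlm:` and `wicri:` prefixes without namespace declarations, which a strict parser is likely to reject.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> part of the record (two of five authors).
SAMPLE = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">20454681</PMID>
<Article PubModel="Electronic">
<ArticleTitle>Learning "graph-mer" motifs that predict gene expression trajectories in development.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pcbi.1000761</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Li</LastName><ForeName>Xuejing</ForeName><Initials>X</Initials></Author>
<Author ValidYN="Y"><LastName>Leslie</LastName><ForeName>Christina</ForeName><Initials>C</Initials></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(pubmed_xml: str) -> dict:
    """Extract PMID, title, DOI and author names from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml)
    citation = root.find("MedlineCitation")
    article = citation.find("Article")
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.iterfind("AuthorList/Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        "title": article.findtext("ArticleTitle"),
        # Select the ELocationID element that carries the DOI.
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


info = summarize(SAMPLE)
```

The same helper should work on the `<pubmed>` element of the full record once it has been extracted from the surrounding `<record>`.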

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:20454681
   |texte=   Learning "graph-mer" motifs that predict gene expression trajectories in development.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:20454681" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021