
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A <italic>Caenorhabditis </italic>
motif compendium for studying transcriptional gene regulation</title>
<author><name sortKey="Dieterich, Christoph" sort="Dieterich, Christoph" uniqKey="Dieterich C" first="Christoph" last="Dieterich">Christoph Dieterich</name>
<affiliation><nlm:aff id="I1">Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstraße 35 - 37, Tübingen, Germany</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Sommer, Ralf J" sort="Sommer, Ralf J" uniqKey="Sommer R" first="Ralf J" last="Sommer">Ralf J. Sommer</name>
<affiliation><nlm:aff id="I1">Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstraße 35 - 37, Tübingen, Germany</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">18215260</idno>
<idno type="pmc">2248174</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2248174</idno>
<idno type="RBID">PMC:2248174</idno>
<idno type="doi">10.1186/1471-2164-9-30</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000545</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000545</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A <italic>Caenorhabditis </italic>
motif compendium for studying transcriptional gene regulation</title>
<author><name sortKey="Dieterich, Christoph" sort="Dieterich, Christoph" uniqKey="Dieterich C" first="Christoph" last="Dieterich">Christoph Dieterich</name>
<affiliation><nlm:aff id="I1">Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstraße 35 - 37, Tübingen, Germany</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Sommer, Ralf J" sort="Sommer, Ralf J" uniqKey="Sommer R" first="Ralf J" last="Sommer">Ralf J. Sommer</name>
<affiliation><nlm:aff id="I1">Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstraße 35 - 37, Tübingen, Germany</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint><date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Controlling gene expression is fundamental to biological complexity. The nematode <italic>Caenorhabditis elegans </italic>
is an important model for studying principles of gene regulation in multi-cellular organisms. A comprehensive parts list of putative regulatory motifs has so far been missing for this model system. In this study, we compile a set of putative regulatory motifs by combining evidence from conservation and expression data.</p>
</sec>
<sec><title>Description</title>
<p>We present an unbiased comparative approach to a regulatory motif compendium for <italic>Caenorhabditis </italic>
species. This involves the assembly of a new nematode genome, whole genome alignments and assessment of conserved <italic>k-</italic>
mer counts. Candidate motifs are selected from a set of 9,500 randomly picked genes by three different motif discovery strategies. Motif candidates have to pass a conservation enrichment filter. Motif degeneracy and length are optimized. Retained motif descriptions are evaluated against expression data using a non-parametric test, which assesses expression changes due to the presence/absence of individual motifs. Finally, we also provide condition-specific motif ensembles by conditional tree analysis.</p>
</sec>
<sec><title>Conclusion</title>
<p>The nematode genomes align surprisingly well despite high neutral substitution rates. Our pipeline delivers motif sets by three alternative strategies. Each set contains fewer than 400 motifs, which are significantly conserved and correlated with 214 out of 270 tested gene expression conditions. This motif compendium is an entry point to comprehensive studies on nematode gene regulation. The website (http://corg.eb.tuebingen.mpg.de/CMC) has extensive query capabilities, supplements this article and supports the experimental list.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-title>BMC Genomics</journal-title>
<issn pub-type="epub">1471-2164</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">18215260</article-id>
<article-id pub-id-type="pmc">2248174</article-id>
<article-id pub-id-type="publisher-id">1471-2164-9-30</article-id>
<article-id pub-id-type="doi">10.1186/1471-2164-9-30</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Database</subject>
</subj-group>
</article-categories>
<title-group><article-title>A <italic>Caenorhabditis </italic>
motif compendium for studying transcriptional gene regulation</article-title>
</title-group>
<contrib-group><contrib id="A1" corresp="yes" contrib-type="author"><name><surname>Dieterich</surname>
<given-names>Christoph</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>christoph.dieterich@tuebingen.mpg.de</email>
</contrib>
<contrib id="A2" contrib-type="author"><name><surname>Sommer</surname>
<given-names>Ralf J</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>ralf.sommer@tuebingen.mpg.de</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstraße 35 - 37, Tübingen, Germany</aff>
<pub-date pub-type="collection"><year>2008</year>
</pub-date>
<pub-date pub-type="epub"><day>23</day>
<month>1</month>
<year>2008</year>
</pub-date>
<volume>9</volume>
<fpage>30</fpage>
<lpage>30</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2164/9/30"></ext-link>
<history><date date-type="received"><day>6</day>
<month>8</month>
<year>2007</year>
</date>
<date date-type="accepted"><day>23</day>
<month>1</month>
<year>2008</year>
</date>
</history>
<permissions><copyright-statement>Copyright © 2008 Dieterich and Sommer; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2008</copyright-year>
<copyright-holder>Dieterich and Sommer; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> 
 Dieterich 
 Christoph 
  
 christoph.dieterich@tuebingen.mpg.de 
  
 A Caenorhabditis motif compendium for studying transcriptional gene regulation 
 2008BMC Genomics 9(1): 30-. (2008)1471-2164(2008)9:1<30>urn:ISSN:1471-2164</pmc-comment>
        </license>
</permissions>
<abstract><sec><title>Background</title>
<p>Controlling gene expression is fundamental to biological complexity. The nematode <italic>Caenorhabditis elegans </italic>
is an important model for studying principles of gene regulation in multi-cellular organisms. A comprehensive parts list of putative regulatory motifs has so far been missing for this model system. In this study, we compile a set of putative regulatory motifs by combining evidence from conservation and expression data.</p>
</sec>
<sec><title>Description</title>
<p>We present an unbiased comparative approach to a regulatory motif compendium for <italic>Caenorhabditis </italic>
species. This involves the assembly of a new nematode genome, whole genome alignments and assessment of conserved <italic>k-</italic>
mer counts. Candidate motifs are selected from a set of 9,500 randomly picked genes by three different motif discovery strategies. Motif candidates have to pass a conservation enrichment filter. Motif degeneracy and length are optimized. Retained motif descriptions are evaluated against expression data using a non-parametric test, which assesses expression changes due to the presence/absence of individual motifs. Finally, we also provide condition-specific motif ensembles by conditional tree analysis.</p>
</sec>
<sec><title>Conclusion</title>
<p>The nematode genomes align surprisingly well despite high neutral substitution rates. Our pipeline delivers motif sets by three alternative strategies. Each set contains fewer than 400 motifs, which are significantly conserved and correlated with 214 out of 270 tested gene expression conditions. This motif compendium is an entry point to comprehensive studies on nematode gene regulation. The website (http://corg.eb.tuebingen.mpg.de/CMC) has extensive query capabilities, supplements this article and supports the experimental list.</p>
</sec>
</abstract>
</article-meta>
</front>
<body><sec><title>Background</title>
<p>The era of whole genome sequencing has boosted functional analysis of eukaryotic genomes. Upon completion of model organism genomes like <italic>Saccharomyces cerevisiae</italic>
, <italic>Caenorhabditis elegans </italic>
and others, comparative sequencing has gradually moved into focus. These sequencing efforts have delivered, and continue to deliver, valuable insights into the evolution of function and species.</p>
<p>We are interested in transcriptional gene regulation exerted by genomic sequence and promoter regions in particular. Promoter regions play a crucial role in initiating transcription of a gene. Protein/DNA interactions regulate transcription initiation and confer specificity to this process. For a long time, yeast has been the primary model organism for research on eukaryotic gene regulation. From a bioinformatics perspective, gene regulation is far better understood in yeast than in any other eukaryote (e.g. [<xref ref-type="bibr" rid="B1">1</xref>
]). Here, we consider the case of a multi-cellular organism, <italic>Caenorhabditis elegans</italic>
. In this work, we compile a compendium of putative regulatory upstream elements by using sequence and functional genomics data (see website [<xref ref-type="bibr" rid="B2">2</xref>
]). We define candidate motifs on conserved upstream regions of <italic>C. elegans </italic>
genes as annotated in Wormbase release 140. These candidate motifs are tested for their enrichment in conserved regions. This approach was pioneered for mammalian genomes [<xref ref-type="bibr" rid="B3">3</xref>
] and yeast genomes ([<xref ref-type="bibr" rid="B4">4</xref>
] and [<xref ref-type="bibr" rid="B5">5</xref>
]). Subsequently, motifs are optimized with respect to length and specificity. Finally, motif candidates are evaluated based on the impact of a motif's presence/absence pattern on gene expression as defined by experimental evidence (microarray data). The discriminative power of motif combinations is assessed with conditional trees.</p>
<sec><title>Species selection</title>
<p><italic>Caenorhabditis elegans </italic>
is a prime candidate for addressing questions of gene regulation in a multi-cellular setting. Most notably, its fixed cell lineage, and thus defined number of cells, makes experiments comparable down to the single-cell level.</p>
<p>Comparative approaches depend heavily on the available sequence data. Our goal is to create a compendium of short regulatory motifs (6- to 12-mers). This requires multiple alignments of nucleotide sequences. Recently, an initiative to sequence additional nematode genomes has gained momentum [<xref ref-type="bibr" rid="B6">6</xref>
]. Genome sequencing of four species of the <italic>Caenorhabditis </italic>
clade [<xref ref-type="bibr" rid="B7">7</xref>
] (see Figure <xref ref-type="fig" rid="F1">1</xref>
) is either completed (<italic>Caenorhabditis elegans </italic>
and <italic>Caenorhabditis briggsae</italic>
) or at an advanced stage (<italic>Caenorhabditis remanei </italic>
and <italic>Caenorhabditis brenneri</italic>
). We built our own assembly of the <italic>Caenorhabditis remanei </italic>
and <italic>Caenorhabditis brenneri </italic>
genomes, given the sufficient coverage (> 8-fold) of the ongoing sequencing projects.</p>
<fig position="float" id="F1"><label>Figure 1</label>
<caption><p><bold>Slanted cladogram of five <italic>Caenorhabditis </italic>
species represented by living strains and corresponding whole genome projects</bold>
. The four top species form the <italic>Elegans </italic>
group, which we consider in our analysis. This figure is adapted from [28].</p>
</caption>
<graphic xlink:href="1471-2164-9-30-1"></graphic>
</fig>
<p>To assess the suitability of the aforementioned species for phylogenetic footprinting, we estimated the neutral background substitution rate (<italic>K</italic>
<sub><italic>s</italic>
</sub>
) from synonymous substitutions in a multiple alignment of the RNAP2 gene (<italic>ama-1</italic>
) [<xref ref-type="bibr" rid="B7">7</xref>
]. Estimated values are 1.5029 for <italic>C. elegans – C. remanei</italic>
, 1.7964 for <italic>C. elegans – C. brenneri </italic>
and 2.2239 for <italic>C. elegans – C. briggsae </italic>
using codeml [<xref ref-type="bibr" rid="B8">8</xref>
]. Stein et al. [<xref ref-type="bibr" rid="B9">9</xref>
] report similar values for the whole proteome comparison of <italic>C. elegans – C. briggsae</italic>
. The molecular phylogeny based on a nucleotide sequence alignment of RNAP2 genes (<italic>ama-1</italic>
) is in agreement with the one published by Kiontke et al. [<xref ref-type="bibr" rid="B7">7</xref>
] (see Figure <xref ref-type="fig" rid="F1">1</xref>
). They additionally used the SSU rRNA, the LSU rRNA as well as parts of the coding regions of <italic>par-6 </italic>
and <italic>pkc-3</italic>
. This phylogeny will guide us in building multiple alignments from pairwise ones. Intriguingly, the four <italic>Caenorhabditis </italic>
genomes align well despite the high estimates of the neutral background substitution rate (see Table <xref ref-type="table" rid="T1">1</xref>
). We first computed pairwise whole genome alignments of <italic>C. elegans </italic>
and the other species. Subsequently, we merged pairwise alignments into a multiple alignment of all four species. Motif candidates are selected from multiple alignments whereas pairwise local alignments are retained for evaluating lineage-specific motif abundance, which we will not discuss here. Future work will address issues like species-specific motifs and phylogenetic profiling of motifs in the satellite species <italic>Pristionchus pacificus </italic>
and distantly related species such as the human parasites <italic>Brugia malayi </italic>
and <italic>Trichinella spiralis</italic>
.</p>
<table-wrap position="float" id="T1"><label>Table 1</label>
<caption><p>Whole Genome Alignment coverage of the C. elegans genome</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><td align="center">Species pair</td>
<td align="center">Length (bp)</td>
<td align="center">Coverage (%)</td>
</tr>
</thead>
<tbody><tr><td align="center">C. elegans – C. brenneri</td>
<td align="center">39,781,786</td>
<td align="center">~ 40%</td>
</tr>
<tr><td align="center">C. elegans – C. remanei</td>
<td align="center">40,670,546</td>
<td align="center">~ 41%</td>
</tr>
<tr><td align="center">C. elegans – C. briggsae</td>
<td align="center">26,918,113</td>
<td align="center">~ 27%</td>
</tr>
<tr><td align="center">H. sapiens – M. musculus</td>
<td align="center">-</td>
<td align="center">~ 39% [14]</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec><title>Construction and content</title>
<sec><title>Genome assembly of <italic>Caenorhabditis remanei </italic>
and <italic>Caenorhabditis brenneri</italic>
</title>
<p>We downloaded a recent snapshot of the ongoing sequencing efforts from the NCBI trace archive [<xref ref-type="bibr" rid="B10">10</xref>
]. We used the PCAP-REP assembler [<xref ref-type="bibr" rid="B11">11</xref>
] to obtain a draft assembly for whole-genome alignment. Key features of the assemblies are median contig sizes of 17,658 bp for <italic>C. remanei </italic>
and 11,912 bp for <italic>C. brenneri </italic>
and median supercontig sizes of 202,125 bp for <italic>C. remanei </italic>
and 63,873 bp for <italic>C. brenneri</italic>
. Additional details are part of the Supplementary Materials. The preliminary assemblies were not manually refined and directly submitted to the following genome alignment step. The genome assemblies of <italic>C. elegans </italic>
and <italic>C. briggsae </italic>
were obtained from [<xref ref-type="bibr" rid="B12">12</xref>
].</p>
</sec>
<sec><title>Whole Genome Alignments</title>
<p>Pairwise comparisons of <italic>C. elegans – C. briggsae </italic>
have been previously used for phylogenetic footprinting [<xref ref-type="bibr" rid="B13">13</xref>
]. The two additional <italic>Caenorhabditis </italic>
species are framed by this species pair in the molecular phylogeny we use (Figure <xref ref-type="fig" rid="F1">1</xref>
). The whole set of four nematode genomes is consequently in an ideal range of sequence divergence for phylogenetic footprinting. This assumption is further supported by analyzing the alignments (see below).</p>
<p>We computed pairwise whole genome alignments of the <italic>C. elegans </italic>
reference genome to the 3 other genomes. Pairwise whole genome alignments were computed using blastz [<xref ref-type="bibr" rid="B14">14</xref>
] with default parameters except Y = 3400 and H = 2000. Multiple whole genome alignments were progressively built from pairwise alignments with multiz [<xref ref-type="bibr" rid="B15">15</xref>
]: Sequences of <italic>C. brenneri</italic>
, <italic>C. remanei </italic>
and <italic>C. briggsae </italic>
were merged to the <italic>C. elegans </italic>
reference sequence in this order. Pairwise alignment coverage relative to <italic>C. elegans </italic>
is given in Table <xref ref-type="table" rid="T1">1</xref>
. Alignment coverage of <italic>C. brenneri </italic>
or <italic>C. remanei </italic>
relative to <italic>C. elegans </italic>
is similar to that of human–mouse comparisons.</p>
<p><italic>C. elegans </italic>
gene annotations from Wormbase release 140 [<xref ref-type="bibr" rid="B16">16</xref>
] were projected onto the whole genome alignment to define upstream regions. Upstream sequences extend over at most 2 kb. If curated exonic sequence falls into that region, sequences are trimmed accordingly.</p>
</sec>
<sec><title>Compilation of a motif compendium</title>
<p>We define motifs as strings composed of nucleotide IUPAC (International Union of Pure and Applied Chemistry) symbols, which comprise the atomic nucleotide symbols and the redundant (degenerate) symbols.</p>
<p>To account for possible biases in motif discovery approaches, candidate motif lists were generated from a set of 9,500 randomly selected upstream regions (almost 50% of all protein coding genes) with three different strategies (see Figure <xref ref-type="fig" rid="F2">2</xref>
):</p>
<fig position="float" id="F2"><label>Figure 2</label>
<caption><p><bold>Motif candidate compilation</bold>
. We employ three different strategies to extract motif candidates from genome sequences. A: Local alignments of 4 species are translated into IUPAC symbols. Only ungapped motifs (in capital letters) are collected with a sliding window approach. B: All subsequences that are covered by local alignments are collected and GEMODA is run on this file. C: FootPrinter is run on upstream regions where the gene start (first exon) is conserved in all four species.</p>
</caption>
<graphic xlink:href="1471-2164-9-30-2"></graphic>
</fig>
<sec><title>Strategy 1 – Kmers from 4-species local alignments</title>
<p>We collected all multiple alignments that contained at least four species and translated them into single IUPAC sequence representations using the alphabet ∑<sub><italic>DNA' </italic>
</sub>
= {<italic>A</italic>
, <italic>C</italic>
, <italic>G</italic>
, <italic>T</italic>
, <italic>N</italic>
} where N is a wildcard character, which represents any of the other characters (see Figure <xref ref-type="fig" rid="F2">2A</xref>
). Alignment columns that contain gaps are translated into lower case letters whereas columns without gaps are translated into upper case letters. We collected all motifs of 6 to 12 base pairs in length from ungapped (upper case) alignment columns. Each motif could contain at most two wildcard characters in total. Motif descriptions that start or end with two consecutive wildcard characters were excluded from the candidate set before the expression filtering step.</p>
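<p>The sliding-window collection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function name collect_kmers and the toy consensus string are hypothetical, and the rule on leading or trailing wildcard pairs is applied at collection time for brevity.</p>

```python
import re

def collect_kmers(consensus, k_min=6, k_max=12, max_wildcards=2):
    """Collect candidate motifs from one consensus string (hypothetical helper).

    Upper-case letters mark ungapped alignment columns, lower-case letters
    mark columns with gaps; 'N' is the wildcard symbol.
    """
    motifs = set()
    # Maximal runs of upper-case symbols are the ungapped stretches.
    for run in re.findall(r"[ACGTN]+", consensus):
        for k in range(k_min, k_max + 1):
            for start in range(len(run) - k + 1):
                kmer = run[start:start + k]
                # At most two wildcard characters per motif.
                if kmer.count("N") > max_wildcards:
                    continue
                # Drop descriptions that start or end with two wildcards.
                if kmer.startswith("NN") or kmer.endswith("NN"):
                    continue
                motifs.add(kmer)
    return motifs

consensus = "ACGTNAcgtTTGACA"  # toy consensus, not real data
print(sorted(collect_kmers(consensus)))
```

<p>Splitting the consensus into upper-case runs first guarantees that no window spans a gapped column, so the window loop itself needs no gap check.</p>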
</sec>
<sec><title>Strategy 2 – Motif discovery in local alignments</title>
<p>Motif candidates were sampled from upstream sequences that are covered by local alignments of at least two species (see Figure <xref ref-type="fig" rid="F2">2B</xref>
). All conserved sequences of an individual sequence region are subjected to a motif discovery step using GEMODA. We used the following program parameters: -m dna_idmat, -l 6, -k 4, -g 5. GEMODA computes short multiple sequence alignments as motif descriptions in three distinct phases: comparison, clustering and convolution. During the comparison phase, short overlapping windows (6-mers) in the dataset are compared. During clustering, these windows are grouped together to form elementary motifs. We used the clique finding option to group motifs. Finally, during convolution, these motifs are stitched together to form maximal motifs. Further details are given in the original publication [<xref ref-type="bibr" rid="B17">17</xref>
]. Motif candidates are retained if they have a P-value of < 0.05, a self-similarity of < 0.5 and a length of ≤ 12.</p>
</sec>
<sec><title>Strategy 3 – FootPrinter</title>
<p>The FootPrinter Motif Discovery software [<xref ref-type="bibr" rid="B18">18</xref>
] does not use alignments as input. Instead, FootPrinter is run on homologous upstream regions. We consider upstream regions as homologous if they have a conserved gene start (first exon) in all four <italic>Caenorhabditis</italic> species. FootPrinter uses a phylogenetic tree to evaluate the parsimony score of each potential motif. We used the tree shown in Figure <xref ref-type="fig" rid="F1">1</xref>
. The program parameters are set to default values except -sequence_type upstream, -subregion_size 100, -triple_filtering. All reported footprints are extracted per nematode sequence and clustered with GEMODA (same parameters as above) to yield a motif description.</p>
<p>Motif discovery parameters were selected in such a way that known motif descriptions from Wormbook [<xref ref-type="bibr" rid="B19">19</xref>
] meet these criteria.</p>
<p>We only consider motifs of 6 to 12 bp from these three discovery pipelines. Strategy 1 uses only multiple alignments across all four species (see Table <xref ref-type="table" rid="T2">2</xref>
 for the sequence space). Strategy 2 uses all available alignment information (pairwise and multiple alignments) whereas strategy 3 does not use any alignment information in the actual motif discovery process. Table <xref ref-type="table" rid="T3">3</xref>
 summarizes the different stages in the motif discovery process for each strategy.</p>
<table-wrap position="float" id="T2"><label>Table 2</label>
<caption><p>Detailed Alignment coverage for the set of 9500 randomly selected genes</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><td align="left">No. Species</td>
<td align="center">No. genes</td>
<td align="center">Length of alignments</td>
</tr>
</thead>
<tbody><tr><td align="left">≥ 2</td>
<td align="center">8,526</td>
<td align="center">5,559,056 bp</td>
</tr>
<tr><td align="left">≥ 3</td>
<td align="center">5,026</td>
<td align="center">1,796,951 bp</td>
</tr>
<tr><td align="left">4</td>
<td align="center">3,361</td>
<td align="center">1,258,422 bp</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3"><label>Table 3</label>
<caption><p>Conserved motif counts and motif processing</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><td align="center" colspan="4">Conserved Motif counts</td>
</tr>
</thead>
<tbody><tr><td align="left">Processing step</td>
<td align="center">Kmer</td>
<td align="center">GEMODA</td>
<td align="center">FootPrinter</td>
</tr>
<tr><td colspan="4"><hr></hr>
</td>
</tr>
<tr><td align="left">Initial candidates</td>
<td align="center">404,546</td>
<td align="center">256,688</td>
<td align="center">41,747</td>
</tr>
<tr><td align="left">Degeneracy optimization</td>
<td align="center">193,491</td>
<td align="center">82,672</td>
<td align="center">24,247</td>
</tr>
<tr><td align="left">Z-score and P-value</td>
<td align="center">4,442</td>
<td align="center">5,477</td>
<td align="center">5,312</td>
</tr>
<tr><td align="left">Expression data filter</td>
<td align="center" colspan="3">Condition dependent (< 1,000)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec><title>Motif conservation enrichment</title>
<p>Each motif library is tested separately for motif-specific enrichment in conservation. Genomic upstream sequences from <italic>C. elegans </italic>
constitute the motif background set. We scanned the respective upstream sequence alignments for conserved occurrences of candidate motifs. Alignment columns that contain gaps are not considered.</p>
<p>We employ a Z-score statistic to rank our motifs according to their enrichment in conserved regions.</p>
<p><disp-formula id="bmcM1"><mml:math id="M1" name="1471-2164-9-30-i1" overflow="scroll"><mml:semantics><mml:mrow><mml:mi>Z</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>n</mml:mi>
<mml:msub><mml:mi>p</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mrow><mml:msqrt><mml:mrow><mml:mi>n</mml:mi>
<mml:msub><mml:mi>p</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub><mml:mi>p</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:semantics>
</mml:math>
</disp-formula>
</p>
<p>where <italic>x </italic>
is the number of conserved instances of a motif. The Z-score is thus the observed number of conserved instances minus the expected number, divided by the standard deviation. The expected number of conserved instances is the product of the number of occurrences in genomic sequence (<italic>n</italic>
) and the probability of a motif being conserved (<italic>p</italic>
<sub>0</sub>
), which is the ratio of all conserved to all genomic occurrences. P-values are computed for an exact test of the simple null hypothesis that <italic>x </italic>
is <italic>B</italic>
(<italic>n</italic>
, <italic>p</italic>
<sub>0</sub>
) distributed. All motif descriptions with a Z-score > 3 are retained at a 5% FDR level.</p>
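<p>As a worked illustration of the statistic above, the following Python sketch computes the Z-score and an exact binomial tail probability. It is not the code used in this work: the function names and the example counts are hypothetical, and the one-sided (upper-tail) form of the exact test is an assumption, since the text only states that an exact test against B(n, p0) is used. The FDR correction across motifs is not shown.</p>

```python
from math import comb, sqrt

def conservation_z_score(x, n, p0):
    """Z = (x - n*p0) / sqrt(n*p0*(1 - p0)) for x conserved out of n occurrences."""
    return (x - n * p0) / sqrt(n * p0 * (1.0 - p0))

def binomial_upper_tail(x, n, p0):
    """Exact P(X >= x) for X ~ B(n, p0); one-sided enrichment test (assumption)."""
    return sum(comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
               for i in range(x, n + 1))

# Hypothetical counts: 100 genomic occurrences, 25 of them conserved,
# background conservation probability 0.1.
z = conservation_z_score(25, 100, 0.1)
p = binomial_upper_tail(25, 100, 0.1)
print(z, p)
```

<p>With these hypothetical counts the expected number of conserved instances is 10 and the standard deviation is 3, so the motif would pass the Z-score > 3 cutoff.</p>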
<p>We prune the list of motif candidates by removing degenerate motifs based on their Z-score and P-values. This step halves the number of motif candidates (see Table <xref ref-type="table" rid="T3">3</xref>
). An overview of the entire processing pipeline is given in Figure <xref ref-type="fig" rid="F3">3</xref>
.</p>
<fig position="float" id="F3"><label>Figure 3</label>
<caption><p><bold>Overview of motif extraction pipeline</bold>
. Schematic overview of motif processing steps. Gene structure annotations are projected across the whole genome alignments. Motif candidates are identified on a subset of 9,500 randomly picked upstream regions. Degenerate motif descriptions are removed if the set of atomic motifs, which they represent, scores better in terms of conservation enrichment. The greatest reduction in the number of candidate motifs is attained by scoring conservation (Z-Score and P-value filter with a 5% FDR level cutoff). Additionally, larger motifs are removed if smaller substrings (≥ 6 bp) of these motifs score better in terms of conservation. Motif candidates are then evaluated by a non-parametric test, which assesses their influence on gene expression. Finally, conditional trees are employed to select motif ensembles, which possibly have a joint regulatory function.</p>
</caption>
<graphic xlink:href="1471-2164-9-30-3"></graphic>
</fig>
</sec>
<sec><title>Motif length selection</title>
<p>We further reduce our list of motif candidates by selecting for optimal motif length. Briefly, longer, possibly degenerate motif descriptions are removed if a substring of the considered motif scores better in terms of Z-score and P-value. This step reduces the number of motif candidates to ~ 5,000 for each pipeline.</p>
</sec>
<sec><title>Motif significance filtering by expression profiles</title>
<p>We used a whole genome set of expression profiles for 270 conditions from Wormbase [<xref ref-type="bibr" rid="B16">16</xref>
] to assess the individual effect of a motif's presence on gene expression. We use the presence (copy number ≥ 1) or absence of a motif as an indicator variable to split gene expression values for a particular condition into two sets.</p>
<p>The two subsets are compared with the non-parametric, two-sample Wilcoxon rank sum test. Here, the null hypothesis states that the two distributions differ by a location shift of zero. We collect all motifs for which we could reject the null hypothesis at a 5% FDR level. The Venn diagram in Figure <xref ref-type="fig" rid="F4">4A</xref>
 summarizes the results for the three different motif discovery pipelines. In total, we could select significant motif candidate sets for 214 expression conditions by combining all three strategies. In essence, all strategies cover a large core set (n = 159) of gene expression conditions. However, a small set of 29 conditions is only covered by one of the three methods.</p>
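<p>The presence/absence split and the rank sum comparison can be sketched in Python as follows. This is an illustration only: the function names and toy values are hypothetical, and the P-value uses a normal approximation without tie or continuity correction, which is an assumption and may differ from the test implementation used in this work. The FDR correction across motifs is not shown.</p>

```python
from itertools import groupby
from math import erfc, sqrt

def split_by_motif(expression, copy_number):
    """Split per-gene expression values by motif presence (copy number >= 1)."""
    present = [expression[g] for g in expression if copy_number.get(g, 0) >= 1]
    absent = [expression[g] for g in expression if copy_number.get(g, 0) == 0]
    return present, absent

def rank_sum_test(present, absent):
    """Two-sided Wilcoxon rank sum test via normal approximation (assumption).

    Returns (W, p), where W is the rank sum of the 'present' group.
    """
    if not present or not absent:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, True) for v in present] + [(v, False) for v in absent],
                    key=lambda t: t[0])
    w = 0.0
    position = 0  # number of values ranked so far
    for _, group in groupby(pooled, key=lambda t: t[0]):
        tied = list(group)
        # Tied values share the average of their 1-based ranks.
        avg_rank = position + (len(tied) + 1) / 2.0
        w += avg_rank * sum(1 for _, is_present in tied if is_present)
        position += len(tied)
    n1, n2 = len(present), len(absent)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2.0))

# Toy expression values and motif copy numbers, not real data.
expression = {"g1": 5.0, "g2": 1.0, "g3": 6.0, "g4": 2.0, "g5": 7.0}
copy_number = {"g1": 1, "g3": 2, "g5": 1}
present, absent = split_by_motif(expression, copy_number)
print(rank_sum_test(present, absent))
```

<p>In practice the resulting P-values for all motifs under one condition would be adjusted jointly before applying the 5% FDR cutoff.</p>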
<fig position="float" id="F4"><label>Figure 4</label>
<caption><p><bold>Motif finder assessment</bold>
. <bold>A: </bold>
We employ three different strategies to extract motif candidates from genome sequences. The statistical significance of a motif's presence has been tested on an expression data set containing 270 conditions. Motif sets have been reported by at least one approach for 214 conditions at a 5% FDR level. The distribution of the significant motif sets from all discovery pipelines is represented by the Venn diagram. <bold>B: </bold>
Pairwise similarity comparison of motif sets from 159 expression conditions that are covered by predictions from all motif discovery pipelines. The scatterplot shows the distribution of 159 condition-specific average similarity values for each pairwise comparison of motif discovery strategies.</p>
</caption>
<graphic xlink:href="1471-2164-9-30-4"></graphic>
</fig>
</sec>
<sec><title>Motif set comparisons</title>
<p>We used an alignment approach to compare the motif descriptions from all three motif discovery pipelines on the large core set of expression conditions. Given two motif sets <italic>A </italic>
= {<italic>a</italic>
<sub>1</sub>
, ..., <italic>a</italic>
<sub><italic>n</italic>
</sub>
} and <italic>B </italic>
= {<italic>b</italic>
<sub>1</sub>
, ..., <italic>b</italic>
<sub><italic>m</italic>
</sub>
}, we select the smaller of the two sets: <italic>A</italic> if <italic>n </italic>
&lt; <italic>m</italic>, and <italic>B</italic> otherwise. We take the larger set as database <italic>D </italic>
and perform all pairwise global alignments of the smaller set to <italic>D</italic>
. Global motif alignments are computed with an implementation of the Needleman-Wunsch algorithm (EMBOSS program needle) and an extended DNA scoring scheme (Matrix NUC4.4 from [<xref ref-type="bibr" rid="B20">20</xref>
]). The gap opening penalty is set to -10 and the gap extension penalty to -0.5. For each motif in the smaller set, the best-matching pair is retained. We normalize the scores according to the following formula:</p>
<p><disp-formula id="bmcM2"><mml:math id="M2" name="1471-2164-9-30-i2" overflow="scroll"><mml:semantics><mml:mrow><mml:mtext>Score</mml:mtext>
<mml:msup><mml:mo>′</mml:mo>
</mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub><mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mn>2</mml:mn>
<mml:mo>×</mml:mo>
<mml:mtext>Score</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub><mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow><mml:mtext>Score</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub><mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mtext>Score</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub><mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:semantics>
</mml:math>
</disp-formula>
</p>
<p>with 1 ≤ <italic>i </italic>
≤ <italic>n </italic>
and 1 ≤ <italic>j </italic>
≤ <italic>m</italic>
. The mean score of the set of best scores is kept for each expression condition. The three-dimensional scatterplot in Figure <xref ref-type="fig" rid="F4">4B</xref>
 shows the distribution of average pairwise similarities of the motif predictions. The pairwise similarity of two condition-specific motif sets is expressed as the average of normalized best alignment scores (see above). Figure <xref ref-type="fig" rid="F4">4B</xref>
 indicates that condition-specific motif sets from different prediction pipelines show high similarities of ≥ 80% on average. In summary, the major share of our motif sets is found by three independent methods.</p>
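<p>The normalization and averaging described above can be illustrated with a short Python sketch. It is a simplified stand-in and not the EMBOSS needle setup used here: symbols are compared literally with a match score of 5 and a mismatch score of -4 (the values that NUC4.4 assigns to unambiguous nucleotides), IUPAC ambiguity codes receive no partial credit, and a linear gap penalty replaces the affine one. All function names are hypothetical.</p>
<preformat><![CDATA[
```python
def global_score(a, b, match=5, mismatch=-4, gap=-10):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def normalized_score(a, b):
    """Score'(a, b) = 2 * Score(a, b) / (Score(a, a) + Score(b, b))."""
    return 2 * global_score(a, b) / (global_score(a, a) + global_score(b, b))


def set_similarity(set_a, set_b):
    """Mean of the best normalized scores of the smaller set against the larger."""
    if len(set_a) < len(set_b):
        small, database = set_a, set_b
    else:
        small, database = set_b, set_a
    best = [max(normalized_score(q, d) for d in database) for q in small]
    return sum(best) / len(best)


# Hypothetical motif sets for one expression condition.
pipeline_1 = ["GYACTT", "TGTTTRC"]
pipeline_2 = ["GCDCTT", "TGTTTAC", "AATCNAT"]
print(set_similarity(pipeline_1, pipeline_2))
```
]]></preformat>
<p>With this normalization, identical motifs score exactly 1, so the per-condition mean can be read as a fraction of the maximum attainable similarity.</p>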
</sec>
</sec>
<sec><title>Expression signature analysis by conditional trees</title>
<p>Conditional trees [<xref ref-type="bibr" rid="B21">21</xref>
] were used to study the discriminatory power of our motif sets. The objective was to discover presence/absence patterns of several motifs that are significantly correlated with the expression level of a gene set. Significant split points support the hypothesis that a set of particular motifs influences expression under the selected condition.</p>
<p>Mining for condition-specific motif patterns is carried out with a recursive partitioning strategy, and only motifs that are conserved across all four species are taken into account. Specifically, conditional trees estimate a regression relationship by binary recursive partitioning in a conditional inference framework [<xref ref-type="bibr" rid="B21">21</xref>
]. In our case, conditional trees perform a regression over the motif counts as predictor variables.</p>
<p>The algorithm works as follows:</p>
<sec><title>Conditional trees</title>
<p>1. Test the global null hypothesis of independence between any of the input variables (presence or absence of a motif) and the response (gene expression level). Stop if this hypothesis cannot be rejected. Otherwise, select the input variable with the strongest association to the response. This association is measured by a P-value corresponding to a test of the partial null hypothesis of independence between a single input variable and the response.</p>
<p>2. Implement a binary split in the selected input variable.</p>
<p>3. Recursively repeat steps 1) and 2).</p>
<p>We use the R implementation as in the party package (see [<xref ref-type="bibr" rid="B22">22</xref>
] for details).</p>
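<p>The three steps can be sketched in Python as follows. This is only a simplified illustration and not the algorithm of the party package: a Monte Carlo permutation test on the difference in mean expression stands in for the conditional inference tests, P-values are Bonferroni corrected, and the names <italic>perm_pvalue</italic> and <italic>grow_tree</italic>, the minimum node size and the toy data are assumptions made for this example.</p>
<preformat><![CDATA[
```python
import random


def perm_pvalue(y, x, n_perm=999, rng=None):
    """Permutation p-value for a difference in mean response between
    genes with (x == 1) and without (x == 0) a motif."""
    rng = rng or random.Random(0)
    n1 = sum(x)
    if n1 == 0 or n1 == len(x):
        return 1.0  # the motif does not split this gene set

    def stat(values):
        s1 = sum(v for v, flag in zip(values, x) if flag)
        return abs(s1 / n1 - (sum(values) - s1) / (len(x) - n1))

    observed = stat(y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def grow_tree(y, motifs, alpha=0.05, min_size=5, rng=None):
    """motifs maps a motif name to 0/1 indicators aligned with y."""
    rng = rng or random.Random(0)
    node = {"n": len(y), "mean": sum(y) / len(y)}
    if len(y) < 2 * min_size or not motifs:
        return node
    # Step 1: test each input variable and apply a Bonferroni correction.
    pvals = {name: min(1.0, perm_pvalue(y, x, rng=rng) * len(motifs))
             for name, x in motifs.items()}
    best = min(pvals, key=pvals.get)
    if pvals[best] > alpha:
        return node  # global null hypothesis not rejected: stop
    # Step 2: binary split on the selected motif.
    node["split"], node["p"] = best, pvals[best]
    for label, flag in (("present", 1), ("absent", 0)):
        keep = [i for i, v in enumerate(motifs[best]) if v == flag]
        sub_y = [y[i] for i in keep]
        sub_motifs = {name: [x[i] for i in keep]
                      for name, x in motifs.items() if name != best}
        # Step 3: recurse on each child node.
        node[label] = grow_tree(sub_y, sub_motifs, alpha, min_size, rng)
    return node


# Toy data (hypothetical): 20 genes, expression driven by motifA only.
y = [2.1, 1.9, 2.4, 2.2, 1.8, 2.0, 2.3, 2.5, 1.7, 2.6,
     0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.4, -0.4, 0.05]
motifs = {"motifA": [1] * 10 + [0] * 10, "motifB": [1, 0] * 10}
tree = grow_tree(y, motifs)
print(tree["split"], tree["present"]["mean"], tree["absent"]["mean"])
```
]]></preformat>
<p>Each inner node records the selected motif and its corrected P-value, and each leaf records the size and mean expression of its gene set, which mirrors the information displayed in the deposited trees.</p>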
<p>A high proportion of tested expression conditions (121 for the GEMODA strategy, 181 for the FootPrinter strategy and 171 for the Kmer strategy) shows significant associations with upstream motif patterns. In total, we could assign 191 GEMODA motif descriptions, 255 Kmer motif descriptions and 340 FootPrinter motif descriptions to gene expression conditions by the conditional tree framework.</p>
<p>All conditional trees are deposited as Supplementary Material on [<xref ref-type="bibr" rid="B2">2</xref>
].</p>
</sec>
</sec>
<sec><title>Utility</title>
<p>In our approach, sequence conservation is an indicator of functional relevance as many known examples of functional DNA motifs are under negative selection. This concept is also known as <bold>phylogenetic footprinting </bold>
[<xref ref-type="bibr" rid="B23">23</xref>
] and was successfully applied in the context of motif finding.</p>
<p>A closer look at the <italic>myo-2 </italic>
enhancer, a well studied example of organ- and cell type-specific regulatory elements, demonstrates the utility of this approach. Figure <xref ref-type="fig" rid="F5">5</xref>
 shows a schematic overview of the region in question and the corresponding display in our web service. The <italic>myo-2 </italic>
enhancer is located ~ 300 bp upstream of the gene start. Transcriptional activity of <italic>myo-2 </italic>
heavily depends on two elements B and C [<xref ref-type="bibr" rid="B24">24</xref>
]. Okkema and Fire could pinpoint cell-specific and organ-specific activity to subelements (B207, C181 and C183) all of which are located in a small region of perfect sequence similarity among all four species. Nucleotide level views of multiple whole genome alignments of all four <italic>Caenorhabditis </italic>
genomes are available via our accompanying web resource [<xref ref-type="bibr" rid="B2">2</xref>
]. The web interface makes these alignments accessible either by scanning for a particular motif (browse by motif) or by studying a particular genomic locus (browse by gene), as shown for the <italic>myo-2 </italic>
enhancer. A more coarse-grained view on motif occurrences is also provided via a GBrowse interface [<xref ref-type="bibr" rid="B25">25</xref>
].</p>
<fig position="float" id="F5"><label>Figure 5</label>
<caption><p><bold>Alignment of the myo-2 enhancer and corresponding web page view</bold>
. <bold>Left: </bold>
Functional subelements of the <italic>myo-2 </italic>
enhancer are highlighted by yellow boxes. The cell-type-specific subelement B207, which is identical in all species, binds and is activated by the pharyngeal muscle specific NK-2 family homeodomain factor CEH-22 [24] [29]. The organ-specific subelements C181 and C183 bind and are activated by the pan-pharyngeal FoxA family transcription factor PHA-4 [30], which is required for formation of pharyngeal muscle and all other pharyngeal cell types during embryonic development. The C elements are a little less conserved than B207, but the PHA-4 binding site matches the high-affinity consensus sequence TGTTTRC [31]. <bold>Right: </bold>
Web page view of the same genomic region. The high-affinity consensus sequence TGTTTRC for PHA-4 binding is highlighted in red.</p>
</caption>
<graphic xlink:href="1471-2164-9-30-5"></graphic>
</fig>
</sec>
<sec><title>Browse by gene</title>
<p>In this view, multiple alignments of gene loci are shown along with gene structure annotation (exons) and highlighted motif matches (see Figure <xref ref-type="fig" rid="F5">5</xref>
 right). The user is free to scan the genomic region with any motif description expressed as an IUPAC nucleotide symbol sequence. Surrounding upstream and downstream regions can be considered if desired. A complementary genome browser can also be accessed via the website.</p>
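<p>Scanning a sequence with an IUPAC motif description can be illustrated by the following Python sketch. It does not reproduce the code behind the web service (which is written in Perl and R); the function names and the example sequence are hypothetical, and the motif is simply translated into a regular expression that is matched on both strands.</p>
<preformat><![CDATA[
```python
import re

# IUPAC nucleotide symbols and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a sequence or IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join("[" + IUPAC[symbol] + "]" for symbol in motif.upper())


def scan(sequence, motif):
    """Return sorted (start, strand, match) tuples; start is 0-based on the
    given sequence, and overlapping matches are reported as well."""
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead group lets finditer report overlapping occurrences.
        pattern = re.compile("(?=(" + to_regex(query) + "))")
        hits.extend((m.start(), strand, m.group(1))
                    for m in pattern.finditer(sequence))
    return sorted(hits)


# Hypothetical sequence scanned with the PHA-4 consensus TGTTTRC.
for hit in scan("AAGTAAACATTTGTTTGCAA", "TGTTTRC"):
    print(hit)
```
]]></preformat>
<p>Matching the reverse complement of the motif against the given strand avoids reverse complementing the whole genomic region. Note that a palindromic motif is reported once per strand at the same position.</p>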
</sec>
<sec><title>Browse by motif</title>
<p>A different access point is provided by scanning the whole data set with a user-provided motif description. The conservation level (conserved/not conserved) and scan region (upstream/intronic) can be selected in advance. Two alternative output options either list each individual motif match or summarize motif matches by gene.</p>
</sec>
</sec>
<sec><title>Discussion</title>
<p>We selected a time-course expression profiling experiment of the transition from the dauer state to the non-dauer state and the expression changes after feeding starved L1 animals [<xref ref-type="bibr" rid="B26">26</xref>
] as an example (see Additional Files <xref ref-type="supplementary-material" rid="S1">1</xref>
, <xref ref-type="supplementary-material" rid="S2">2</xref>
, <xref ref-type="supplementary-material" rid="S3">3</xref>
, <xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<sec><title>Feeding of starved L1 animals</title>
<p>At the initial time point (3 hours after inoculation on OP50, Additional File <xref ref-type="supplementary-material" rid="S1">1</xref>
), all three pipelines report a weakly similar motif as the initial split point:</p>
<p>TANCCN Kmer pipeline (reverse complement)</p>
<p>AATCNAT GEMODA pipeline</p>
<p>ATHAAT FootPrinter pipeline</p>
<p>The motif reported by the GEMODA pipeline is apparently the one that defines the gene set with the most pronounced up-regulation in expression (0.234; set size: n = 88). The conditional tree of the Kmer pipeline reports the motif set that defines the gene set with the most pronounced down-regulation (-0.1; set size: n = 112).</p>
<p>If we consider the gene expression profile at 6 hours after inoculation (Additional File <xref ref-type="supplementary-material" rid="S2">2</xref>
), we first notice the rapid increase of motif candidates that passed the expression significance filter. This increase is conveniently handled by the conditional tree framework, which automatically corrects for multiple testing. All conditional trees pick up motif combinations that are predominantly linked to groups of down-regulated genes.</p>
</sec>
<sec><title>Transition from the dauer state to the non-dauer state</title>
<p>For the initial condition (time point 3 hrs, Additional File <xref ref-type="supplementary-material" rid="S3">3</xref>
), all three motif discovery pipelines again report a similar first split point:</p>
<p>GCNCTN Kmer pipeline (reverse complement)</p>
<p>GYACTT GEMODA pipeline</p>
<p>GCDCTT FootPrinter pipeline</p>
<p>TGCACT DAF-12</p>
<p>This sequence resembles the binding site description of DAF-12 [<xref ref-type="bibr" rid="B27">27</xref>
], a member of the steroid hormone receptor superfamily that affects dauer formation. The set sizes of up-regulated genes carrying these motifs stay the same at a later time point (6 hours, Additional File <xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
<p>The example shows that our motif discovery approach is able to detect known and novel motifs. Hence, we deem it useful for a wide audience of experimentalists.</p>
</sec>
</sec>
<sec><title>Conclusion</title>
<p>We presented an approach to build a motif compendium in <italic>Caenorhabditis </italic>
species. To this end, we have computed pairwise alignments of the <italic>Caenorhabditis elegans </italic>
genome to three closely related nematode genomes (one finished, one in draft assembly and one newly assembled). The degree of conservation is drastically higher than one would expect from the neutral substitution rate.</p>
<p>From these pairwise alignments, we built a multiple alignment and generated alternative motif candidate sets by three different motif discovery strategies. All strategies produce largely overlapping motif candidate lists. We therefore conclude that the choice of motif discovery strategy does not have a major effect as long as motifs are evaluated by conservation and expression data.</p>
<p>Our web resource serves as a starting point for biologists to study regulatory elements on a gene by gene basis. Likewise, genome-wide screens for putative gene targets of a particular transcription factor as defined by a consensus motif are easily performed.</p>
<p>Given our set of conserved putative regulatory sequences for the <italic>Elegans </italic>
group, it will be exciting to mine for species-specific motif inventions. Phylogenetic profiling on the motif level will be feasible with the advent of more genomes from satellite species (e.g. <italic>Pristionchus pacificus</italic>
) and distantly related species (e.g. <italic>Brugia malayi </italic>
and <italic>Trichinella spiralis</italic>
).</p>
</sec>
<sec><title>Availability and requirements</title>
<p><bold>Project name: </bold>
The <italic>Caenorhabditis </italic>
Motif Compendium;</p>
<p><bold>Project home page: </bold>
<ext-link ext-link-type="uri" xlink:href="http://corg.eb.tuebingen.mpg.de/CMC"></ext-link>
;</p>
<p><bold>Operating system: </bold>
Web service running on Linux;</p>
<p><bold>Programming language: </bold>
Perl and R;</p>
<p><bold>License: </bold>
GNU LGPL;</p>
<p><bold>Any restrictions to use by non-academics: </bold>
There are no restrictions on the web site use by non-academics.</p>
</sec>
<sec><title>Authors' contributions</title>
<p>CD designed the project and carried out all programming and data analysis. RJS provided conceptual support. CD wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material"><title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1"><caption><title>Additional File 1</title>
<p><bold>Feeding of starved L1 animals – a time course – time point 3 hr</bold>
. Starved animals were inoculated onto E. coli seeded plates and grown for 3 hours. <bold>Panel A </bold>
shows the conditional tree from the Kmer pipeline. The conditional tree was built from 38 motif candidates. <bold>Panel B </bold>
shows the conditional tree from the GEMODA pipeline. The conditional tree was built from 16 motif candidates. <bold>Panel C </bold>
shows the conditional tree from the FootPrinter pipeline. The conditional tree was built from 24 motif candidates. Vertices show split point numbers, the motif description and the corresponding P-value of the split (Bonferroni corrected). Edges are labeled with the split conditions.</p>
</caption>
<media xlink:href="1471-2164-9-30-S1.PDF" mimetype="text" mime-subtype="plain"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2"><caption><title>Additional File 2</title>
<p><bold>Feeding of starved L1 animals – a time course – time point 6 hr</bold>
. Starved animals were inoculated onto E. coli seeded plates and grown for 6 hours. <bold>Panel A </bold>
shows the conditional tree from the FootPrinter pipeline. <bold>Panel B </bold>
shows the conditional tree from the Kmer pipeline. <bold>Panel C </bold>
shows the conditional tree from the GEMODA pipeline. All conditional trees were built from 1,000 motif candidates. Vertices show split point numbers, the motif description and the corresponding P-value of the split (Bonferroni corrected). Edges are labeled with the split conditions.</p>
</caption>
<media xlink:href="1471-2164-9-30-S2.PDF" mimetype="text" mime-subtype="plain"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3"><caption><title>Additional File 3</title>
<p><bold>Transition from the dauer state to the non-dauer state – a time course – time point 3 hr</bold>
. Dauers were inoculated onto E. coli seeded plates and grown for 3 hours. <bold>Panel A </bold>
shows the conditional tree from the FootPrinter pipeline. <bold>Panel B </bold>
shows the conditional tree from the Kmer pipeline. <bold>Panel C </bold>
shows the conditional tree from the GEMODA pipeline. Vertices show split point numbers, the motif description and the corresponding P-value of the split (Bonferroni corrected). Edges are labeled with the split conditions. Conditional trees were built from motif candidate sets of size 1,000 (A), 856 (B) and 1,000 (C).</p>
</caption>
<media xlink:href="1471-2164-9-30-S3.PDF" mimetype="text" mime-subtype="plain"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4"><caption><title>Additional File 4</title>
<p><bold>Transition from the dauer state to the non-dauer state – a time course – time point 6 hr</bold>
. Dauers were inoculated onto E. coli seeded plates and grown for 6 hours. <bold>Panel A </bold>
shows the conditional tree from the FootPrinter pipeline. <bold>Panel B </bold>
shows the conditional tree from the GEMODA pipeline. <bold>Panel C </bold>
shows the conditional tree from the Kmer pipeline. Vertices show split point numbers, the motif description and the corresponding P-value of the split (Bonferroni corrected). Edges are labeled with the split conditions. Conditional trees were built from motif candidate sets of size 117 (A), 132 (B) and 475 (C). More supplementary data can be retrieved from [<xref ref-type="bibr" rid="B2">2</xref>
].</p>
</caption>
<media xlink:href="1471-2164-9-30-S4.PDF" mimetype="text" mime-subtype="plain"><caption><p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back><ack><sec><title>Acknowledgements</title>
<p>We thank Adrian Streit and Benjamin Schlager for valuable discussions. We also thank Michael Han for sharing a script to display multiple alignments.</p>
</sec>
</ack>
<ref-list><ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beer</surname>
<given-names>MA</given-names>
</name>
<name><surname>Tavazoie</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Predicting gene expression from sequence</article-title>
<source>Cell</source>
<year>2004</year>
<volume>117</volume>
<fpage>185</fpage>
<lpage>198</lpage>
<pub-id pub-id-type="pmid">15084257</pub-id>
<pub-id pub-id-type="doi">10.1016/S0092-8674(04)00304-6</pub-id>
</citation>
</ref>
<ref id="B2"><citation citation-type="other"><article-title>Caenorhabditis motif compendium</article-title>
<ext-link ext-link-type="uri" xlink:href="http://corg.eb.tuebingen.mpg.de/CMC"></ext-link>
</citation>
</ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname>
<given-names>X</given-names>
</name>
<name><surname>Lu</surname>
<given-names>J</given-names>
</name>
<name><surname>Kulbokas</surname>
<given-names>EJ</given-names>
</name>
<name><surname>Golub</surname>
<given-names>TR</given-names>
</name>
<name><surname>Mootha</surname>
<given-names>V</given-names>
</name>
<name><surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
<name><surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name><surname>Kellis</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals</article-title>
<source>Nature</source>
<year>2005</year>
<volume>434</volume>
<fpage>338</fpage>
<lpage>345</lpage>
<pub-id pub-id-type="pmid">15735639</pub-id>
<pub-id pub-id-type="doi">10.1038/nature03441</pub-id>
</citation>
</ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chiang</surname>
<given-names>DY</given-names>
</name>
<name><surname>Moses</surname>
<given-names>AM</given-names>
</name>
<name><surname>Kellis</surname>
<given-names>M</given-names>
</name>
<name><surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name><surname>Eisen</surname>
<given-names>MB</given-names>
</name>
</person-group>
<article-title>Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>4</volume>
<fpage>R43</fpage>
<pub-id pub-id-type="pmid">12844359</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2003-4-7-r43</pub-id>
</citation>
</ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kellis</surname>
<given-names>M</given-names>
</name>
<name><surname>Patterson</surname>
<given-names>N</given-names>
</name>
<name><surname>Endrizzi</surname>
<given-names>M</given-names>
</name>
<name><surname>Birren</surname>
<given-names>B</given-names>
</name>
<name><surname>Lander</surname>
<given-names>ES</given-names>
</name>
</person-group>
<article-title>Sequencing and comparison of yeast species to identify genes and regulatory elements</article-title>
<source>Nature</source>
<year>2003</year>
<volume>423</volume>
<fpage>241</fpage>
<lpage>254</lpage>
<pub-id pub-id-type="pmid">12748633</pub-id>
<pub-id pub-id-type="doi">10.1038/nature01644</pub-id>
</citation>
</ref>
<ref id="B6"><citation citation-type="other"><article-title>Invertebrate Genome Index of the Genome Sequencing Center (Washington University in St. Louis)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://genome.wustl.edu/genome_group.cgi?GROUP=6"></ext-link>
</citation>
</ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiontke</surname>
<given-names>K</given-names>
</name>
<name><surname>Gavin</surname>
<given-names>NP</given-names>
</name>
<name><surname>Raynes</surname>
<given-names>Y</given-names>
</name>
<name><surname>Roehrig</surname>
<given-names>C</given-names>
</name>
<name><surname>Piano</surname>
<given-names>F</given-names>
</name>
<name><surname>Fitch</surname>
<given-names>DHA</given-names>
</name>
</person-group>
<article-title>Caenorhabditis phylogeny predicts convergence of hermaphroditism and extensive intron loss</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2004</year>
<volume>101</volume>
<fpage>9003</fpage>
<lpage>9008</lpage>
<pub-id pub-id-type="pmid">15184656</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0403094101</pub-id>
</citation>
</ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>PAML: a program package for phylogenetic analysis by maximum likelihood</article-title>
<source>Comput Appl Biosci</source>
<year>1997</year>
<volume>13</volume>
<fpage>555</fpage>
<lpage>556</lpage>
<pub-id pub-id-type="pmid">9367129</pub-id>
</citation>
</ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stein</surname>
<given-names>LD</given-names>
</name>
<name><surname>Bao</surname>
<given-names>Z</given-names>
</name>
<name><surname>Blasiar</surname>
<given-names>D</given-names>
</name>
<name><surname>Blumenthal</surname>
<given-names>T</given-names>
</name>
<name><surname>Brent</surname>
<given-names>MR</given-names>
</name>
<name><surname>Chen</surname>
<given-names>N</given-names>
</name>
<name><surname>Chinwalla</surname>
<given-names>A</given-names>
</name>
<name><surname>Clarke</surname>
<given-names>L</given-names>
</name>
<name><surname>Clee</surname>
<given-names>C</given-names>
</name>
<name><surname>Coghlan</surname>
<given-names>A</given-names>
</name>
<name><surname>Coulson</surname>
<given-names>A</given-names>
</name>
<name><surname>D'Eustachio</surname>
<given-names>P</given-names>
</name>
<name><surname>Fitch</surname>
<given-names>DHA</given-names>
</name>
<name><surname>Fulton</surname>
<given-names>LA</given-names>
</name>
<name><surname>Fulton</surname>
<given-names>RE</given-names>
</name>
<name><surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
<name><surname>Harris</surname>
<given-names>TW</given-names>
</name>
<name><surname>Hillier</surname>
<given-names>LW</given-names>
</name>
<name><surname>Kamath</surname>
<given-names>R</given-names>
</name>
<name><surname>Kuwabara</surname>
<given-names>PE</given-names>
</name>
<name><surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<name><surname>Marra</surname>
<given-names>MA</given-names>
</name>
<name><surname>Miner</surname>
<given-names>TL</given-names>
</name>
<name><surname>Minx</surname>
<given-names>P</given-names>
</name>
<name><surname>Mullikin</surname>
<given-names>JC</given-names>
</name>
<name><surname>Plumb</surname>
<given-names>RW</given-names>
</name>
<name><surname>Rogers</surname>
<given-names>J</given-names>
</name>
<name><surname>Schein</surname>
<given-names>JE</given-names>
</name>
<name><surname>Sohrmann</surname>
<given-names>M</given-names>
</name>
<name><surname>Spieth</surname>
<given-names>J</given-names>
</name>
<name><surname>Stajich</surname>
<given-names>JE</given-names>
</name>
<name><surname>Wei</surname>
<given-names>C</given-names>
</name>
<name><surname>Willey</surname>
<given-names>D</given-names>
</name>
<name><surname>Wilson</surname>
<given-names>RK</given-names>
</name>
<name><surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name><surname>Waterston</surname>
<given-names>RH</given-names>
</name>
</person-group>
<article-title>The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics</article-title>
<source>PLoS Biol</source>
<year>2003</year>
<volume>1</volume>
<fpage>E45</fpage>
<pub-id pub-id-type="pmid">14624247</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0000045</pub-id>
</citation>
</ref>
<ref id="B10"><citation citation-type="other"><article-title>NCBI Trace Archive</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/Traces"></ext-link>
</citation>
</ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname>
<given-names>X</given-names>
</name>
<name><surname>Yang</surname>
<given-names>SP</given-names>
</name>
<name><surname>Chinwalla</surname>
<given-names>AT</given-names>
</name>
<name><surname>Hillier</surname>
<given-names>LW</given-names>
</name>
<name><surname>Minx</surname>
<given-names>P</given-names>
</name>
<name><surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<name><surname>Wilson</surname>
<given-names>RK</given-names>
</name>
</person-group>
<article-title>Application of a superword array in genome assembly</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>201</fpage>
<lpage>205</lpage>
<pub-id pub-id-type="pmid">16397298</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkj419</pub-id>
</citation>
</ref>
<ref id="B12"><citation citation-type="other"><article-title>Wormbase FTP Server</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.wormbase.org"></ext-link>
</citation>
</ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bigelow</surname>
<given-names>HR</given-names>
</name>
<name><surname>Wenick</surname>
<given-names>AS</given-names>
</name>
<name><surname>Wong</surname>
<given-names>A</given-names>
</name>
<name><surname>Hobert</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>CisOrtho: a program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>27</fpage>
<pub-id pub-id-type="pmid">15113408</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-5-27</pub-id>
</citation>
</ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name><surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<name><surname>Smit</surname>
<given-names>A</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name><surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<name><surname>Hardison</surname>
<given-names>RC</given-names>
</name>
<name><surname>Haussler</surname>
<given-names>D</given-names>
</name>
<name><surname>Miller</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Human-mouse alignments with BLASTZ</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>103</fpage>
<lpage>107</lpage>
<pub-id pub-id-type="pmid">12529312</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.809403</pub-id>
</citation>
</ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blanchette</surname>
<given-names>M</given-names>
</name>
<name><surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<name><surname>Riemer</surname>
<given-names>C</given-names>
</name>
<name><surname>Elnitski</surname>
<given-names>L</given-names>
</name>
<name><surname>Smit</surname>
<given-names>AFA</given-names>
</name>
<name><surname>Roskin</surname>
<given-names>KM</given-names>
</name>
<name><surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<name><surname>Rosenbloom</surname>
<given-names>K</given-names>
</name>
<name><surname>Clawson</surname>
<given-names>H</given-names>
</name>
<name><surname>Green</surname>
<given-names>ED</given-names>
</name>
<name><surname>Haussler</surname>
<given-names>D</given-names>
</name>
<name><surname>Miller</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Aligning multiple genomic sequences with the threaded blockset aligner</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>708</fpage>
<lpage>715</lpage>
<pub-id pub-id-type="pmid">15060014</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.1933104</pub-id>
</citation>
</ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwarz</surname>
<given-names>EM</given-names>
</name>
<name><surname>Antoshechkin</surname>
<given-names>I</given-names>
</name>
<name><surname>Bastiani</surname>
<given-names>C</given-names>
</name>
<name><surname>Bieri</surname>
<given-names>T</given-names>
</name>
<name><surname>Blasiar</surname>
<given-names>D</given-names>
</name>
<name><surname>Canaran</surname>
<given-names>P</given-names>
</name>
<name><surname>Chan</surname>
<given-names>J</given-names>
</name>
<name><surname>Chen</surname>
<given-names>N</given-names>
</name>
<name><surname>Chen</surname>
<given-names>WJ</given-names>
</name>
<name><surname>Davis</surname>
<given-names>P</given-names>
</name>
<name><surname>Fiedler</surname>
<given-names>TJ</given-names>
</name>
<name><surname>Girard</surname>
<given-names>L</given-names>
</name>
<name><surname>Harris</surname>
<given-names>TW</given-names>
</name>
<name><surname>Kenny</surname>
<given-names>EE</given-names>
</name>
<name><surname>Kishore</surname>
<given-names>R</given-names>
</name>
<name><surname>Lawson</surname>
<given-names>D</given-names>
</name>
<name><surname>Lee</surname>
<given-names>R</given-names>
</name>
<name><surname>Mueller</surname>
<given-names>HM</given-names>
</name>
<name><surname>Nakamura</surname>
<given-names>C</given-names>
</name>
<name><surname>Ozersky</surname>
<given-names>P</given-names>
</name>
<name><surname>Petcherski</surname>
<given-names>A</given-names>
</name>
<name><surname>Rogers</surname>
<given-names>A</given-names>
</name>
<name><surname>Spooner</surname>
<given-names>W</given-names>
</name>
<name><surname>Tuli</surname>
<given-names>MA</given-names>
</name>
<name><surname>Van Auken</surname>
<given-names>K</given-names>
</name>
<name><surname>Wang</surname>
<given-names>D</given-names>
</name>
<name><surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name><surname>Spieth</surname>
<given-names>J</given-names>
</name>
<name><surname>Stein</surname>
<given-names>LD</given-names>
</name>
<name><surname>Sternberg</surname>
<given-names>PW</given-names>
</name>
</person-group>
<article-title>WormBase: better software, richer content</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<fpage>D475</fpage>
<lpage>D478</lpage>
<pub-id pub-id-type="pmid">16381915</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkj061</pub-id>
</citation>
</ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname>
<given-names>KL</given-names>
</name>
<name><surname>Styczynski</surname>
<given-names>MP</given-names>
</name>
<name><surname>Rigoutsos</surname>
<given-names>I</given-names>
</name>
<name><surname>Stephanopoulos</surname>
<given-names>GN</given-names>
</name>
</person-group>
<article-title>A generic motif discovery algorithm for sequential data</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>21</fpage>
<lpage>28</lpage>
<pub-id pub-id-type="pmid">16257985</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti745</pub-id>
</citation>
</ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blanchette</surname>
<given-names>M</given-names>
</name>
<name><surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>FootPrinter: A program designed for phylogenetic footprinting</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>3840</fpage>
<lpage>3842</lpage>
<pub-id pub-id-type="pmid">12824433</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkg606</pub-id>
</citation>
</ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okkema</surname>
<given-names>PG</given-names>
</name>
<name><surname>Krause</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Transcriptional regulation</article-title>
<source>WormBook</source>
<year>2005</year>
<fpage>1</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="pmid">18050428</pub-id>
</citation>
</ref>
<ref id="B20"><citation citation-type="other"><article-title>Blast scoring matrices</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/blast/matrices/"></ext-link>
</citation>
</ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hothorn</surname>
<given-names>T</given-names>
</name>
<name><surname>Hornik</surname>
<given-names>K</given-names>
</name>
<name><surname>Zeileis</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Unbiased Recursive Partitioning: A Conditional Inference Framework</article-title>
<source>Journal of Computational and Graphical Statistics</source>
<year>2006</year>
<volume>15</volume>
<fpage>651</fpage>
<lpage>674</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.ingentaconnect.com/content/asa/jcgs/2006/00000015/00000003/art00009"></ext-link>
<pub-id pub-id-type="doi">10.1198/106186006X133933</pub-id>
</citation>
</ref>
<ref id="B22"><citation citation-type="other"><article-title>The Comprehensive R Archive Network</article-title>
<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org"></ext-link>
</citation>
</ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sauer</surname>
<given-names>T</given-names>
</name>
<name><surname>Shelest</surname>
<given-names>E</given-names>
</name>
<name><surname>Wingender</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Evaluating phylogenetic footprinting for human-rodent comparisons</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>430</fpage>
<lpage>437</lpage>
<pub-id pub-id-type="pmid">16332706</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti819</pub-id>
</citation>
</ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okkema</surname>
<given-names>PG</given-names>
</name>
<name><surname>Fire</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The Caenorhabditis elegans NK-2 class homeoprotein CEH-22 is involved in combinatorial activation of gene expression in pharyngeal muscle</article-title>
<source>Development</source>
<year>1994</year>
<volume>120</volume>
<fpage>2175</fpage>
<lpage>2186</lpage>
<pub-id pub-id-type="pmid">7925019</pub-id>
</citation>
</ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stein</surname>
<given-names>LD</given-names>
</name>
<name><surname>Mungall</surname>
<given-names>C</given-names>
</name>
<name><surname>Shu</surname>
<given-names>S</given-names>
</name>
<name><surname>Caudy</surname>
<given-names>M</given-names>
</name>
<name><surname>Mangone</surname>
<given-names>M</given-names>
</name>
<name><surname>Day</surname>
<given-names>A</given-names>
</name>
<name><surname>Nickerson</surname>
<given-names>E</given-names>
</name>
<name><surname>Stajich</surname>
<given-names>JE</given-names>
</name>
<name><surname>Harris</surname>
<given-names>TW</given-names>
</name>
<name><surname>Arva</surname>
<given-names>A</given-names>
</name>
<name><surname>Lewis</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>The generic genome browser: a building block for a model organism system database</article-title>
<source>Genome Res</source>
<year>2002</year>
<volume>12</volume>
<fpage>1599</fpage>
<lpage>1610</lpage>
<pub-id pub-id-type="pmid">12368253</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.403602</pub-id>
</citation>
</ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname>
<given-names>J</given-names>
</name>
<name><surname>Kim</surname>
<given-names>SK</given-names>
</name>
</person-group>
<article-title>Global analysis of dauer gene expression in Caenorhabditis elegans</article-title>
<source>Development</source>
<year>2003</year>
<volume>130</volume>
<fpage>1621</fpage>
<lpage>1634</lpage>
<pub-id pub-id-type="pmid">12620986</pub-id>
<pub-id pub-id-type="doi">10.1242/dev.00363</pub-id>
</citation>
</ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shostak</surname>
<given-names>Y</given-names>
</name>
<name><surname>Van Gilst</surname>
<given-names>MR</given-names>
</name>
<name><surname>Antebi</surname>
<given-names>A</given-names>
</name>
<name><surname>Yamamoto</surname>
<given-names>KR</given-names>
</name>
</person-group>
<article-title>Identification of C. elegans DAF-12-binding sites, response elements, and target genes</article-title>
<source>Genes Dev</source>
<year>2004</year>
<volume>18</volume>
<fpage>2529</fpage>
<lpage>2544</lpage>
<pub-id pub-id-type="pmid">15489294</pub-id>
<pub-id pub-id-type="doi">10.1101/gad.1218504</pub-id>
</citation>
</ref>
<ref id="B28"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Kiontke</surname>
<given-names>K</given-names>
</name>
<name><surname>Fitch</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The phylogenetic relationships of Caenorhabditis and other rhabditids</article-title>
<source>WormBook, ed. The C. elegans Research Community, WormBook</source>
<year>2005</year>
<ext-link ext-link-type="uri" xlink:href="http://www.wormbook.org/"></ext-link>
</citation>
</ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okkema</surname>
<given-names>PG</given-names>
</name>
<name><surname>Ha</surname>
<given-names>E</given-names>
</name>
<name><surname>Haun</surname>
<given-names>C</given-names>
</name>
<name><surname>Chen</surname>
<given-names>W</given-names>
</name>
<name><surname>Fire</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The Caenorhabditis elegans NK-2 homeobox gene ceh-22 activates pharyngeal muscle gene expression in combination with pha-1 and is required for normal pharyngeal development</article-title>
<source>Development</source>
<year>1997</year>
<volume>124</volume>
<fpage>3965</fpage>
<lpage>3973</lpage>
<pub-id pub-id-type="pmid">9374394</pub-id>
</citation>
</ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalb</surname>
<given-names>JM</given-names>
</name>
<name><surname>Lau</surname>
<given-names>KK</given-names>
</name>
<name><surname>Goszczynski</surname>
<given-names>B</given-names>
</name>
<name><surname>Fukushige</surname>
<given-names>T</given-names>
</name>
<name><surname>Moons</surname>
<given-names>D</given-names>
</name>
<name><surname>Okkema</surname>
<given-names>PG</given-names>
</name>
<name><surname>McGhee</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>pha-4 is Ce-fkh-1, a fork head/HNF-3alpha, beta, gamma homolog that functions in organogenesis of the C. elegans pharynx</article-title>
<source>Development</source>
<year>1998</year>
<volume>125</volume>
<fpage>2171</fpage>
<lpage>2180</lpage>
<pub-id pub-id-type="pmid">9584117</pub-id>
</citation>
</ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gaudet</surname>
<given-names>J</given-names>
</name>
<name><surname>Mango</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4</article-title>
<source>Science</source>
<year>2002</year>
<volume>295</volume>
<fpage>821</fpage>
<lpage>825</lpage>
<pub-id pub-id-type="pmid">11823633</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1065175</pub-id>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0005459 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0005459 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

	MERS exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
