Cyberinfrastructure exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas

Internal identifier: 000275 (Pmc/Corpus); previous: 000274; next: 000276

Authors: Christophe Liseron-Monfils; Tim Lewis; Daniel Ashlock; Paul D. McNicholas; François Fauteux; Martina Strömvik; Manish N. Raizada

Source: BMC Plant Biology, 2013

RBID : PMC:3658923

Abstract

Background

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize.

Results

A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and the remaining motifs from all programs were then combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea then searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes whose promoters each contained all five predicted motifs, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.
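
The abstract reports nucleotide sensitivity and nucleotide specificity but does not define them. As a rough illustration only, the sketch below assumes the definitions commonly used in motif-finder benchmarks: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), counted over individual nucleotide positions. The function names, the half-open interval convention and the example coordinates are hypothetical and are not taken from Promzea.

```python
from typing import Iterable, Set, Tuple

# A binding site is a half-open interval [start, end) on one promoter sequence.
# This representation is an assumption made for the sketch.
Site = Tuple[int, int]


def covered_positions(sites: Iterable[Site]) -> Set[int]:
    """Return the set of nucleotide positions covered by any site."""
    positions: Set[int] = set()
    for start, end in sites:
        positions.update(range(start, end))
    return positions


def nucleotide_metrics(
    known: Iterable[Site], predicted: Iterable[Site], seq_length: int
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) at the nucleotide level.

    Assumed definitions (not stated in the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
    """
    known_pos = covered_positions(known)
    pred_pos = covered_positions(predicted)

    tp = len(known_pos & pred_pos)   # known positions that were predicted
    fn = len(known_pos - pred_pos)   # known positions that were missed
    fp = len(pred_pos - known_pos)   # predicted positions outside known sites
    tn = seq_length - tp - fn - fp   # positions that are neither known nor predicted

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical 100-nt promoter: one known site, one partially overlapping prediction.
    sn, sp = nucleotide_metrics(known=[(10, 20)], predicted=[(15, 25)], seq_length=100)
    print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Under these assumed definitions, a prediction that covers more of the known sites raises sensitivity, while specificity holds steady only if few extra positions are predicted outside them.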

Conclusions

An online tool customized for promoter motif discovery in plants, called Promzea, has been developed. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs, and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658923
DOI: 10.1186/1471-2229-13-42
PubMed: 23497159
PubMed Central: 3658923

Links to Exploration step

PMC:3658923

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas</title>
<author>
<name sortKey="Liseron Monfils, Christophe" sort="Liseron Monfils, Christophe" uniqKey="Liseron Monfils C" first="Christophe" last="Liseron-Monfils">Christophe Liseron-Monfils</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lewis, Tim" sort="Lewis, Tim" uniqKey="Lewis T" first="Tim" last="Lewis">Tim Lewis</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ashlock, Daniel" sort="Ashlock, Daniel" uniqKey="Ashlock D" first="Daniel" last="Ashlock">Daniel Ashlock</name>
<affiliation>
<nlm:aff id="I2">Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mcnicholas, Paul D" sort="Mcnicholas, Paul D" uniqKey="Mcnicholas P" first="Paul D" last="Mcnicholas">Paul D. McNicholas</name>
<affiliation>
<nlm:aff id="I2">Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fauteux, Francois" sort="Fauteux, Francois" uniqKey="Fauteux F" first="François" last="Fauteux">François Fauteux</name>
<affiliation>
<nlm:aff id="I3">Department of Plant Sciences, McGill University, Ste. Anne de Bellevue, QC H9X 3V9, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stromvik, Martina" sort="Stromvik, Martina" uniqKey="Stromvik M" first="Martina" last="Strömvik">Martina Strömvik</name>
<affiliation>
<nlm:aff id="I3">Department of Plant Sciences, McGill University, Ste. Anne de Bellevue, QC H9X 3V9, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Raizada, Manish N" sort="Raizada, Manish N" uniqKey="Raizada M" first="Manish N" last="Raizada">Manish N. Raizada</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23497159</idno>
<idno type="pmc">3658923</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658923</idno>
<idno type="RBID">PMC:3658923</idno>
<idno type="doi">10.1186/1471-2229-13-42</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000275</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas</title>
<author>
<name sortKey="Liseron Monfils, Christophe" sort="Liseron Monfils, Christophe" uniqKey="Liseron Monfils C" first="Christophe" last="Liseron-Monfils">Christophe Liseron-Monfils</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lewis, Tim" sort="Lewis, Tim" uniqKey="Lewis T" first="Tim" last="Lewis">Tim Lewis</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ashlock, Daniel" sort="Ashlock, Daniel" uniqKey="Ashlock D" first="Daniel" last="Ashlock">Daniel Ashlock</name>
<affiliation>
<nlm:aff id="I2">Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mcnicholas, Paul D" sort="Mcnicholas, Paul D" uniqKey="Mcnicholas P" first="Paul D" last="Mcnicholas">Paul D. McNicholas</name>
<affiliation>
<nlm:aff id="I2">Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fauteux, Francois" sort="Fauteux, Francois" uniqKey="Fauteux F" first="François" last="Fauteux">François Fauteux</name>
<affiliation>
<nlm:aff id="I3">Department of Plant Sciences, McGill University, Ste. Anne de Bellevue, QC H9X 3V9, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stromvik, Martina" sort="Stromvik, Martina" uniqKey="Stromvik M" first="Martina" last="Strömvik">Martina Strömvik</name>
<affiliation>
<nlm:aff id="I3">Department of Plant Sciences, McGill University, Ste. Anne de Bellevue, QC H9X 3V9, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Raizada, Manish N" sort="Raizada, Manish N" uniqKey="Raizada M" first="Manish N" last="Raizada">Manish N. Raizada</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Plant Biology</title>
<idno type="eISSN">1471-2229</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The discovery of genetic networks and
<italic>cis</italic>
-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (
<italic>Zea mays</italic>
L.) has facilitated
<italic>in silico</italic>
searches for regulatory motifs. Several algorithms exist to predict
<italic>cis</italic>
-acting elements, but none have been adapted for maize.</p>
</sec>
<sec>
<title>Results</title>
<p>A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at
<ext-link ext-link-type="uri" xlink:href="http://www.promzea.org">http://www.promzea.org</ext-link>
and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated
<italic>in silico</italic>
using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes whose promoters each contained all five predicted motifs, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>An online tool customized for promoter motif discovery in plants, called Promzea, has been developed. Promzea was validated
<italic>in silico</italic>
by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="product-review" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Plant Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Plant Biol</journal-id>
<journal-title-group>
<journal-title>BMC Plant Biology</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2229</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23497159</article-id>
<article-id pub-id-type="pmc">3658923</article-id>
<article-id pub-id-type="publisher-id">1471-2229-13-42</article-id>
<article-id pub-id-type="doi">10.1186/1471-2229-13-42</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Liseron-Monfils</surname>
<given-names>Christophe</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>cliseron@uoguelph.ca</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Lewis</surname>
<given-names>Tim</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>Timothy.D.F.L@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Ashlock</surname>
<given-names>Daniel</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>dashlock@uoguelph.ca</email>
</contrib>
<contrib contrib-type="author" id="A4">
<name>
<surname>McNicholas</surname>
<given-names>Paul D</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>pmcnicho@uoguelph.ca</email>
</contrib>
<contrib contrib-type="author" id="A5">
<name>
<surname>Fauteux</surname>
<given-names>François</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>francois.fauteux2@mail.mcgill.ca</email>
</contrib>
<contrib contrib-type="author" id="A6">
<name>
<surname>Strömvik</surname>
<given-names>Martina</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>martina.stromvik@mcgill.ca</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A7">
<name>
<surname>Raizada</surname>
<given-names>Manish N</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>raizada@uoguelph.ca</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada</aff>
<aff id="I2">
<label>2</label>
Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada</aff>
<aff id="I3">
<label>3</label>
Department of Plant Sciences, McGill University, Ste. Anne de Bellevue, QC H9X 3V9, Canada</aff>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>15</day>
<month>3</month>
<year>2013</year>
</pub-date>
<volume>13</volume>
<fpage>42</fpage>
<lpage>42</lpage>
<history>
<date date-type="received">
<day>1</day>
<month>4</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>8</day>
<month>3</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013 Liseron-Monfils et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Liseron-Monfils et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2229/13/42"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>The discovery of genetic networks and
<italic>cis</italic>
-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (
<italic>Zea mays</italic>
L.) has facilitated
<italic>in silico</italic>
searches for regulatory motifs. Several algorithms exist to predict
<italic>cis</italic>
-acting elements, but none have been adapted for maize.</p>
</sec>
<sec>
<title>Results</title>
<p>A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at
<ext-link ext-link-type="uri" xlink:href="http://www.promzea.org">http://www.promzea.org</ext-link>
and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated
<italic>in silico</italic>
using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>An online tool called Promzea, customized for promoter motif discovery in plants, has been developed. Promzea was validated
<italic>in silico</italic>
by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Promoter</kwd>
<kwd>
<italic>cis</italic>
-acting</kwd>
<kwd>Motif</kwd>
<kwd>Maize</kwd>
<kwd>Anthocyanin</kwd>
<kwd>Phlobaphene</kwd>
<kwd>Bioprospector</kwd>
<kwd>MEME</kwd>
<kwd>Weeder</kwd>
<kwd>C1</kwd>
<kwd>P</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>A key objective of global gene expression studies is the identification of transcription factors and their DNA binding sites responsible for co-expression of genes. DNA binding sites can be predicted
<italic>in silico</italic>
by searching regulatory regions of co-expressed genes for overrepresented motifs [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. Recently, the genome sequence of maize (
<italic>Zea mays</italic>
L.) was released [
<xref ref-type="bibr" rid="B3">3</xref>
], facilitating searches for
<italic>cis</italic>
-acting motifs in one of the world’s most important crops. Useful motif discovery tools already exist for maize including Grassius [
<xref ref-type="bibr" rid="B4">4</xref>
] and PlantPAN [
<xref ref-type="bibr" rid="B5">5</xref>
], but they retrieve only known, experimentally defined motifs from databases such as PLACE [
<xref ref-type="bibr" rid="B6">6</xref>
] or PlantTFDB [
<xref ref-type="bibr" rid="B7">7</xref>
]. There remains a need for software that predicts
<italic>de novo</italic>
motifs from co-expressed genes in maize including from microarray data.</p>
<p>In general, two major types of algorithms exist to search co-regulated genes for
<italic>de novo</italic>
motifs. The first approach, consensus searching, consists of searching sets of genes for similar sequences. This consensus method limits motif searches to 12 bases in length (because of the calculation time necessary to search longer motifs) and allows for a few substitutions [
<xref ref-type="bibr" rid="B8">8</xref>
]. Weeder [
<xref ref-type="bibr" rid="B8">8</xref>
] is a widely used program that applies consensus-based sampling. The second type of search algorithm is probabilistic and uses a position weight matrix (PWM) to define a motif [
<xref ref-type="bibr" rid="B9">9</xref>
]. In the PWM, the probability of occurrence of each of the four possible nucleotides is calculated for every position within a predicted motif. Motif PWMs are first identified by scanning regulatory sequences for similar motifs. Predicted motifs are reported if the probability of the motif occurrence is statistically non-random compared to the background. Widely used software programs that apply a probabilistic algorithm are BioProspector [
<xref ref-type="bibr" rid="B10">10</xref>
] and MEME (Multiple Expectation-maximization for Motif Elicitation) [
<xref ref-type="bibr" rid="B11">11</xref>
]. These programs employ different statistical approaches. BioProspector uses Gibbs sampling [
<xref ref-type="bibr" rid="B12">12</xref>
] which randomly picks subsequences of a defined length and iteratively searches within input promoters until a high probability match is found, defined as having PWM values that are significantly different from the input background sequences. By contrast, MEME divides sequences into sub-segments, and all sub-segments are systematically processed as a possible motif. The probability that each sub-segment occurs non-randomly within input promoters is calculated based on its PWM values (Expectation, E) which is then refined based on the probability of occurrence of each nucleotide at each position within the sub-segment (Maximization, M). The sub-segment with the highest probability after EM is chosen and modified by iterating the EM algorithm until a candidate motif cannot be improved [
<xref ref-type="bibr" rid="B11">11</xref>
].</p>
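<p>The PWM concept described above can be illustrated with a minimal sketch. It is not the implementation used by MEME or BioProspector; the function names, the toy binding sites and the optional pseudocount are assumptions made only for illustration. The sketch estimates the probability of each nucleotide at every motif position from a set of aligned sites of equal length, then scores a candidate sequence under the assumption that positions are independent.</p>
<preformat><![CDATA[
```python
from collections import Counter


def build_pwm(sites, alphabet="ACGT", pseudocount=0.0):
    """Estimate per-position nucleotide probabilities from aligned sites.

    Returns a list with one dict per motif position, mapping each
    nucleotide to its estimated probability at that position.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        pwm.append({nt: (counts[nt] + pseudocount) / total for nt in alphabet})
    return pwm


def site_probability(pwm, candidate):
    """Probability of a candidate under the PWM, treating positions as independent."""
    if len(candidate) != len(pwm):
        raise ValueError("candidate length must equal motif width")
    prob = 1.0
    for column, nt in zip(pwm, candidate):
        prob *= column[nt]
    return prob


# Toy aligned sites (hypothetical, not taken from the benchmark data).
toy_sites = ["ACGT", "ACGA", "ACCT", "TCGT"]
pwm = build_pwm(toy_sites)
print(pwm[0])                         # nucleotide probabilities at position 1
print(site_probability(pwm, "ACGT"))  # probability of the consensus-like site
```
]]></preformat>
<p>In practice, such a probability would be compared with the probability of the same sequence under a background model, which is the step that distinguishes the statistical approaches of the individual programs.</p>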
<p>The various motif discovery programs have significant limitations. For example, one limitation of Gibbs sampling, and hence of BioProspector [
<xref ref-type="bibr" rid="B10">10</xref>
] is that different motifs are often obtained at each run. In contrast, MEME predictions are consistent [
<xref ref-type="bibr" rid="B11">11</xref>
]. The main problem with all the current motif discovery programs is their low accuracy. The best motif discovery program thus far was shown to be only 17.4% accurate, in
<italic>E. coli</italic>
, with many known motifs being missed [
<xref ref-type="bibr" rid="B13">13</xref>
]. In order to overcome the problem of low prediction accuracy, motif discovery programs have been combined to increase their effectiveness, creating what has been termed an ensemble algorithm [
<xref ref-type="bibr" rid="B13">13</xref>
]. One of the first ensemble algorithms was the BEST program [
<xref ref-type="bibr" rid="B14">14</xref>
] which combined the advantages of three motif discovery programs. Other ensemble tools also exist to define
<italic>de novo</italic>
motifs in Arabidopsis and rice, for example MotifVoter [
<xref ref-type="bibr" rid="B15">15</xref>
] that clusters the best motifs from 10 motif discovery tools. However, most ensemble algorithms are conservative because they report only motifs that are retrieved by more than one of the motif discovery programs [
<xref ref-type="bibr" rid="B15">15</xref>
]. To help researchers evaluate motif discovery programs objectively, benchmark data sets have been created, in which known motifs are embedded into diverse sequences [
<xref ref-type="bibr" rid="B16">16</xref>
]. Each motif discovery program can then be compared based on the rate of true and false predictions.</p>
<p>Ideally, a motif discovery program for maize should be validated by its ability to retrieve transcription factor binding sites that have been experimentally validated. Some of the best studied transcription factor targets in maize are those of C1 and P, transcription factors which upregulate the biosynthetic enzymes responsible for production of the red-purple pigments, anthocyanin and phlobaphene, respectively [
<xref ref-type="bibr" rid="B17">17</xref>
-
<xref ref-type="bibr" rid="B20">20</xref>
]. C1 and P are homologous proteins belonging to the R2R3 Myb family of regulators [
<xref ref-type="bibr" rid="B21">21</xref>
], and they have been shown to interact with identical
<italic>cis</italic>
-acting motifs in the
<italic>A1</italic>
promoter [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
].</p>
<p>In this study, first, a benchmark data set was used to compare and evaluate the accuracy of the three most used motif discovery programs, Weeder, BioProspector and MEME. Improvements were then created to reduce the limitations of each program. These improvements were incorporated into a comprehensive motif discovery pipeline customized for maize called Promzea. Promzea was then validated by asking whether it could retrieve known binding sites of maize C1 and P transcription factors [
<xref ref-type="bibr" rid="B18">18</xref>
-
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
].</p>
<p>Promzea accurately identified these binding sites, in particular those for P, using only a small number of input genes from these pathways. Interestingly, in a genome-wide scan, Promzea retrieved these binding sites in additional genes, including upstream genes that may help to regulate these pathways. Promzea was also tested against the Maize Development Atlas, a tissue-specific microarray dataset resource for maize [
<xref ref-type="bibr" rid="B23">23</xref>
].</p>
</sec>
<sec>
<title>Implementation</title>
<sec>
<title>Overview of Promzea</title>
<p>An online pipeline called Promzea was developed to discover
<italic>de novo cis</italic>
-acting elements in maize (Figure 
<xref ref-type="fig" rid="F1">1</xref>
) using a user-friendly interface created in Perl. Promzea is publicly available at
<ext-link ext-link-type="uri" xlink:href="http://www.promzea.org">http://www.promzea.org</ext-link>
. The tool was subsequently expanded to include rice and Arabidopsis. For rationale and complete methodological details, see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
. Here only an overview of Promzea is provided, along with key parameters below. Briefly, using the online interface, the user first submits either a list of co-expressed cDNA FASTA sequence files, a microarray probe-set ID (in the case of maize), gene ID list or a BED file [
<xref ref-type="bibr" rid="B24">24</xref>
], for example with chromosome coordinates corresponding to peaks from ChIP-seq experiments [
<xref ref-type="bibr" rid="B25">25</xref>
]. In the case of a cDNA file, the sequences are BLAST searched against the chosen plant genome. A list of promoters corresponding to the user input is then retrieved from a maize promoter database (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). A command line version of the program is also available in the Discovery Environment of the iPlant Collaborative [
<xref ref-type="bibr" rid="B26">26</xref>
]; in this version, users can use as input a BED file allowing them to search for motifs within peaks discovered by ChIP-seq or ChIP-chip experiments [
<xref ref-type="bibr" rid="B25">25</xref>
]. The promoter data set is then searched for shared motifs using three motif discovery programs: MEME, BioProspector and Weeder (Table 
<xref ref-type="table" rid="T1">1</xref>
). These motif discovery programs were chosen because their algorithms allow fast, accurate and/or complementary searching. The justification for combining multiple motif discovery programs is described in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
. The motif results are filtered, combined from all three programs, ranked and then displayed for the user along with a ranking score (MNCP, see below; Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). Finally, Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the complete gene list and corresponding gene annotations, along with other forms of validation for the user to analyze (see Generating Promzea, below).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Flow chart of the Promzea motif discovery pipeline.</bold>
Abbreviations: HG, hypergeometric distribution; MNCP, Mean Normalized Conditional Probability score.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-1"></graphic>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Software programs used in Promzea</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Tool</bold>
</th>
<th align="left">
<bold>Description and download site</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="top">MEME
<hr></hr>
</td>
<td align="left" valign="bottom">Multiple EM (Expectation Maximization) for Motif Elicitation is a probabilistic
<italic>de novo</italic>
motif finding algorithm. It divides sequences into substrings and calculates the probability of each substring being a motif compared to the background. Each motif probability is recalculated during re-running using an expectation-maximisation algorithm. (
<ext-link ext-link-type="uri" xlink:href="http://meme.nbcr.net/downloads/meme_4.6.0.tar.gz">http://meme.nbcr.net/downloads/meme_4.6.0.tar.gz</ext-link>
)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="top">Bioprospector
<hr></hr>
</td>
<td align="left" valign="bottom">Gibbs sampling algorithm. Motif width is user-defined. The sequences are randomly searched to find similar motifs. Newly discovered PWM motifs are scored relative to the background. The operation is repeated until convergence of the results. Results are different at each run. (
<ext-link ext-link-type="uri" xlink:href="http://motif.stanford.edu/distributions/bioprospector">http://motif.stanford.edu/distributions/bioprospector</ext-link>
)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="top">Weeder
<hr></hr>
</td>
<td align="left" valign="bottom">Consensus enumeration program; finds similar consensus sequences in data allowing 1 to 3 mismatches. The search is extended to the adjacent bases of the word to define the final motif. (
<ext-link ext-link-type="uri" xlink:href="http://159.149.160.51/modtools">http://159.149.160.51/modtools</ext-link>
)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="top">PSCAN
<hr></hr>
</td>
<td align="left" valign="bottom">Determines the probability that a defined PWM motif exists in each database sequence relative to its best score. (
<ext-link ext-link-type="uri" xlink:href="http://159.149.160.51/pscan/">http://159.149.160.51/pscan/</ext-link>
)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="top">FIMO
<hr></hr>
</td>
<td align="left" valign="bottom">Finds occurrence of each defined PWM in a sequence database using a p-value calculation relative to the Markov background. (
<ext-link ext-link-type="uri" xlink:href="http://meme.nbcr.net/downloads/meme_4.6.0.tar.gz">http://meme.nbcr.net/downloads/meme_4.6.0.tar.gz</ext-link>
)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="top">Clover</td>
<td align="left">Finds occurrence of each defined PWM in a sequence database using PWM best scores compared to the background. (
<ext-link ext-link-type="uri" xlink:href="http://zlab.bu.edu/clover">http://zlab.bu.edu/clover</ext-link>
)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Parameters of motif discovery programs used in Promzea</title>
<p>MEME was set to search for ten motifs with a maximum length of 10 nucleotides on both DNA strands. BioProspector was set to search for 10-nucleotide long motifs and retain only the first ten motifs found. Weeder was set to search for motifs ranging in length from 6–10 nucleotides (medium option). In addition, FIMO [
<xref ref-type="bibr" rid="B27">27</xref>
], PSCAN [
<xref ref-type="bibr" rid="B28">28</xref>
] and Clover [
<xref ref-type="bibr" rid="B29">29</xref>
] were used to retrieve motifs from the maize genome.</p>
</sec>
<sec>
<title>Defining filters for each standalone program within Promzea using benchmark data sets</title>
<p>As noted above, within Promzea, a custom filter was designed for each of the three motif discovery programs employed; the purpose was to reduce the nucleotide false discovery ratio (nFDR) while preserving the true positives as measured using the nucleotide Correlation Coefficient (nCC score). Both nFDR and nCC are defined in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
. The filter parameters were optimized using the Sandve et al. (2007) benchmark data set [
<xref ref-type="bibr" rid="B16">16</xref>
] based on limiting the probability (pB or pH, respectively for Binomial or hypergeometric test p-values - see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
) that a motif prediction could occur randomly; the best filters were chosen based on their impact on the nFDR and nCC scores. For BioProspector, pB thresholds at 0.3, 0.5 and 0.7 significantly reduced the average nFDR score (from 0.92 with unfiltered motif discovery data to 0.82, 0.86 and 0.86, respectively, Friedman’s test p-value <0.01; Figure 
<xref ref-type="fig" rid="F2">2</xref>
A). Though the average nCC scores between the filtered data were not significantly different from one another, the filter pB = 0.7 was chosen for BioProspector as it caused the least absolute reduction in the nCC score average compared to the unfiltered data (from 0.097 to 0.084; Figure 
<xref ref-type="fig" rid="F2">2</xref>
A). For MEME, a significance level of 0.05 was chosen as it achieved the best balance between a significant reduction in the nFDR average (from 0.96 to 0.85, Friedman’s test p-value < 0.05) and a significant increase in the nCC average (from 0.065 to 0.073, p-value < 0.01; Figure 
<xref ref-type="fig" rid="F2">2</xref>
B). For Weeder, a significance level of 0.3 was selected as it similarly achieved the best balance between a significant reduction in the average nFDR score (from 0.97 to 0.95, p-value < 0.001) and the largest absolute increase in the average nCC score (from 0.054 to 0.071, p-value < 0.001; Figure 
<xref ref-type="fig" rid="F2">2</xref>
C).</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Optimization of motif filtering for each standalone motif discovery program.</bold>
The performance of each motif discovery program, applied to the Sandve et al. (2007) benchmark data set, was measured using the nucleotide Correlation Coefficient score mean (nCC, grey bar) and the nucleotide False Discovery Ratio mean (nFDR, black line). Shown is the performance of each original program (unfiltered) and after motif filtering at three probability cut-offs (p) for: (
<bold>A</bold>
) BioProspector, using the binomial distribution; (
<bold>B</bold>
) MEME using the hypergeometric distribution; and (
<bold>C</bold>
) Weeder using the binomial distribution. nFDR and nCC error bars indicate the mean confidence intervals.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-2"></graphic>
</fig>
</sec>
<sec>
<title>Defining the ranking of post-filtered motifs</title>
<p>In order to rank the predicted remaining motifs after filtering and then combining the results of all three motif discovery programs, Promzea incorporates a published metric, the Mean Normalized Conditional Probability or MNCP [
<xref ref-type="bibr" rid="B30">30</xref>
] (for details, see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). Briefly, MNCP is based on the biological principle that if a promoter/first intron contains multiple occurrences of a given motif, then the chance that the motif is non-random is higher. Specifically, the MNCP score allows one to determine if the mean occurrence of any given motif in the data set (where the motif has been defined) is higher than its mean occurrence in a random set of promoters/first introns (e.g. whole genome). A motif with a higher MNCP score has a lower probability of being false.</p>
</sec>
<sec>
<title>Generating the Promzea software pipeline</title>
<p>The above filtering and ranking principles were integrated into the Promzea software pipeline (Figure 
<xref ref-type="fig" rid="F1">1</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Supplementary materials and methods). To match the user input cDNA to the maize genome, full-length cDNAs were retrieved from the maize, rice and Arabidopsis genomes using their GFF files and respective genome data [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
]. For each predicted gene, the corresponding promoters were compiled into a list: the flat file containing ≤1 kb of upstream sequences consisted of 39,656 predicted promoters in the case of maize, 27,416 promoters for Arabidopsis and 58,058 promoters for rice (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S1). At least 70% of the maize genome and 35% of the rice genome are composed of transposable elements [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
] which could generate false-positives. In order to overcome this problem, repeat-masked sequences were used to create the promoter flat files. Another problem in motif prediction is the presence of distal
<italic>cis</italic>
-acting elements possibly located up to 50 kb from the transcription start site [
<xref ref-type="bibr" rid="B33">33</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
]. However, a maximum length of 1 kb was chosen because motif discovery algorithms struggle with larger search spaces which dilute the signal strength, and it is difficult to anticipate the exact position of a distal
<italic>cis</italic>
-acting element. Taking these limitations into account, for motif discovery in Promzea, we applied the same parameters for motif discovery and filtering as used in the Sandve et al. (2007) benchmark validation (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Supplementary materials and methods). In Promzea, the final filtered set of motifs is represented for the user as consensus sequence logos using Weblogo Software [
<xref ref-type="bibr" rid="B35">35</xref>
]. The predicted motifs are ranked using their MNCP scores (see above, and Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). As false positives were observed in the predictions using the benchmark data set, Promzea gives the user quality control visualizations to validate each predicted motif. One such validation is whether the motif is located at similar positions within promoters of different genes. The frequency of motif occurrence at each position, as defined by each motif discovery program, is shown as a graphic using the Chart::Clicker Perl module [
<xref ref-type="bibr" rid="B36">36</xref>
]. Another validation is whether Promzea retrieves promoters of genes consistent with a common genetic pathway, by searching the maize genome for promoters containing each candidate motif. For this form of validation using gene annotations, all the genes having a defined Gene Ontology annotation were compiled into flat files using data from the Gene Ontology project of each genome.</p>
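<p>The ≤1 kb upstream window described above can be sketched as a small coordinate calculation. This is an illustrative sketch only, not the code used to build the Promzea promoter flat files: the function name <monospace>upstream_window</monospace>, its parameters, and the simplification that the annotated gene boundary stands in for the transcription start site are all assumptions. Coordinates follow the GFF convention (1-based, inclusive, start ≤ end on both strands).</p>
<preformat><![CDATA[
```python
def upstream_window(gene_start, gene_end, strand, length=1000, chrom_length=None):
    """Return (start, end) of the region upstream of a gene, or None.

    Coordinates are 1-based and inclusive, as in GFF. On the '+' strand
    the window ends just before gene_start; on the '-' strand it begins
    just after gene_end. The window is clipped at chromosome boundaries,
    so it may be shorter than `length`.
    """
    if strand == "+":
        end = gene_start - 1
        start = max(1, end - length + 1)
    elif strand == "-":
        start = gene_end + 1
        end = start + length - 1
        if chrom_length is not None:
            end = min(end, chrom_length)
    else:
        raise ValueError("strand must be '+' or '-'")

    # A gene lying at the very edge of a chromosome has no upstream sequence.
    if end < start:
        return None
    return start, end


# Hypothetical gene coordinates used only to exercise the function.
print(upstream_window(5001, 7000, "+"))                     # full 1 kb window
print(upstream_window(300, 900, "+"))                       # clipped at position 1
print(upstream_window(5001, 8000, "-", chrom_length=8500))  # clipped at chromosome end
```
]]></preformat>
<p>A window extracted from the minus strand would additionally need to be reverse-complemented before motif searching, and repeat masking would be applied to the extracted sequence rather than to the coordinates.</p>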
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>
<italic>In silico</italic>
validation of filtering then combining motif discovery programs using benchmark data sets</title>
<p>To generate a motif discovery tool, the effectiveness of existing motif discovery tools was first analyzed using benchmark data sets containing known motifs from Sandve et al. (2007). When BioProspector (alone, unfiltered) was applied to the three types of benchmark data sets from Sandve et al. (2007), the average number of true positive nucleotides (nTPs) predicted was 1191 while the average number of false positive nucleotides (nFPs) was 10,785 (Figure 
<xref ref-type="fig" rid="F3">3</xref>
A-C, Table 
<xref ref-type="table" rid="T2">2</xref>
). Unfiltered MEME predicted an average of 1145 nTPs correctly, but also 29,982 nFPs. By contrast, unfiltered Weeder predicted nearly two-fold more nTPs (2083 on average) but a very high average number of nFPs (99,561; Table 
<xref ref-type="table" rid="T2">2</xref>
). However, each of the three standalone motif discovery programs appeared to identify different sets of motifs (see Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
). It was thus hypothesized that combining the programs (an ensemble-type algorithm) would increase the total number of true positives. In fact, combining the programs increased the number of nTPs to 3185, a >50% increase compared to the best standalone program, Weeder, under the software parameters chosen (Figure 
<xref ref-type="fig" rid="F3">3</xref>
A-C, Table 
<xref ref-type="table" rid="T2">2</xref>
). However, combining the programs also increased the number of nFPs compared to each standalone program. Filtering each motif discovery program separately (from Figure 
<xref ref-type="fig" rid="F2">2</xref>
, earlier) before combining the results reduced the average nFPs by 25.7% compared to the combined unfiltered data yet only reduced nTPs by 8.7% (Figure 
<xref ref-type="fig" rid="F3">3</xref>
A-C, Table 
<xref ref-type="table" rid="T2">2</xref>
). The nCC score after combining all three filtered programs was not significantly different compared to each standalone program, likely because nTPs and nFPs both increased (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
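<p>The effect of combining programs on nucleotide-level counts can be illustrated with a toy sketch. It assumes, for illustration only, that each program's output is reduced to a set of predicted nucleotide positions in a single sequence and that the combined prediction is the union of these sets; the positions and tool names below are hypothetical and do not come from the benchmark. Because the tools recover different parts of the known motif, the union gains true positives, but it also accumulates the false positives of every tool.</p>
<preformat><![CDATA[
```python
def count_tp_fp(predicted, known):
    """Return (nTP, nFP) for sets of predicted and known motif positions."""
    n_tp = len(predicted & known)   # predicted nucleotides inside a known motif
    n_fp = len(predicted - known)   # predicted nucleotides outside any known motif
    return n_tp, n_fp


def combine_predictions(*prediction_sets):
    """Union of the nucleotide positions predicted by any of the tools."""
    combined = set()
    for positions in prediction_sets:
        combined |= positions
    return combined


# Toy data: one known motif at positions 10..19 of a single sequence.
known = set(range(10, 20))
tool_a = set(range(8, 14))    # overlaps the start of the motif
tool_b = set(range(16, 24))   # overlaps the end of the motif
tool_c = set(range(40, 46))   # misses the motif entirely

for name, predicted in [("tool_a", tool_a), ("tool_b", tool_b), ("tool_c", tool_c)]:
    print(name, count_tp_fp(predicted, known))

combined = combine_predictions(tool_a, tool_b, tool_c)
print("combined", count_tp_fp(combined, known))
```
]]></preformat>
<p>This is why per-program filtering before combination matters: removing predictions such as those of the hypothetical third tool lowers nFP without affecting nTP.</p>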
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Effectiveness of combining different motif discovery programs.</bold>
(
<bold>A</bold>
-
<bold>C</bold>
) The performance of each motif discovery program, applied to the Sandve et al. (2007) benchmark data set, was measured using the total number of true positive nucleotides (nTP, grey bars) and the total number of false positive nucleotides (nFP, black lines). Shown are scores for the three types of data sets that comprise the Sandve dataset: (
<bold>A</bold>
) synthetic (Algorithm Markov), (
<bold>B</bold>
) semi-synthetic (Algorithm Real), and (
<bold>C</bold>
) real promoters (Model Real). Shown are the scores of each standalone unfiltered program, as well as the scores after combining the outputs of the three programs without filtering (combined) or with filtering (combined filt). (
<bold>D</bold>
) The performance of each standalone program or the combined programs was compared using the average nucleotide sensitivity (nSn). Shown are the mean nSn scores for the synthetic data (AM: Algorithm Markov), semi-synthetic data (AR: Algorithm Real) and real data (MR: Model Real). The asterisks (***) indicate that the average nSn score of the combined filtered programs is statistically higher than the average nSn score using Weeder alone at p < 0.01. Each error bar represents the 95% mean confidence interval. (
<bold>E</bold>
) The partition of final true positives found by the three motif discovery tools after filtering is shown. Shared results are motif nucleotides retrieved by at least two of the standalone programs. Filtering and combining the standalone programs are the basis of Promzea.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-3"></graphic>
</fig>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Combination of motif discovery programs based on measures of true positive and false positive nucleotides</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="center" valign="bottom">
<bold>Tools</bold>
<hr></hr>
</th>
<th colspan="2" align="center" valign="bottom">
<bold>Synthetic data (AM)</bold>
<hr></hr>
</th>
<th colspan="2" align="center" valign="bottom">
<bold>Semi-synthetic data (AR)</bold>
<hr></hr>
</th>
<th colspan="2" align="center" valign="bottom">
<bold>Real data (MR)</bold>
<hr></hr>
</th>
<th colspan="2" align="center" valign="bottom">
<bold>Averages</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="center"> </th>
<th align="center">
<bold>nTP</bold>
</th>
<th align="center">
<bold>nFP</bold>
</th>
<th align="center">
<bold>nTP</bold>
</th>
<th align="center">
<bold>nFP</bold>
</th>
<th align="center">
<bold>nTP</bold>
</th>
<th align="center">
<bold>nFP</bold>
</th>
<th align="center">
<bold>Average nTP</bold>
</th>
<th align="center">
<bold>Average nFP</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center" valign="bottom">BioProspector
<hr></hr>
</td>
<td align="center" valign="bottom">995
<hr></hr>
</td>
<td align="center" valign="bottom">10668
<hr></hr>
</td>
<td align="center" valign="bottom">940
<hr></hr>
</td>
<td align="center" valign="bottom">9889
<hr></hr>
</td>
<td align="center" valign="bottom">1638
<hr></hr>
</td>
<td align="center" valign="bottom">11797
<hr></hr>
</td>
<td align="center" valign="bottom">1191
<hr></hr>
</td>
<td align="center" valign="bottom">10785
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">MEME
<hr></hr>
</td>
<td align="center" valign="bottom">1503
<hr></hr>
</td>
<td align="center" valign="bottom">21861
<hr></hr>
</td>
<td align="center" valign="bottom">1134
<hr></hr>
</td>
<td align="center" valign="bottom">25832
<hr></hr>
</td>
<td align="center" valign="bottom">798
<hr></hr>
</td>
<td align="center" valign="bottom">42253
<hr></hr>
</td>
<td align="center" valign="bottom">1145
<hr></hr>
</td>
<td align="center" valign="bottom">29982
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">Weeder
<hr></hr>
</td>
<td align="center" valign="bottom">2104
<hr></hr>
</td>
<td align="center" valign="bottom">86064
<hr></hr>
</td>
<td align="center" valign="bottom">2251
<hr></hr>
</td>
<td align="center" valign="bottom">74945
<hr></hr>
</td>
<td align="center" valign="bottom">1895
<hr></hr>
</td>
<td align="center" valign="bottom">53365
<hr></hr>
</td>
<td align="center" valign="bottom">2083
<hr></hr>
</td>
<td align="center" valign="bottom">71458
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">Combined
<hr></hr>
</td>
<td align="center" valign="bottom">3067
<hr></hr>
</td>
<td align="center" valign="bottom">110825
<hr></hr>
</td>
<td align="center" valign="bottom">2876
<hr></hr>
</td>
<td align="center" valign="bottom">102531
<hr></hr>
</td>
<td align="center" valign="bottom">3462
<hr></hr>
</td>
<td align="center" valign="bottom">110089
<hr></hr>
</td>
<td align="center" valign="bottom">3135
<hr></hr>
</td>
<td align="center" valign="bottom">107815
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Combined filt.</td>
<td align="center">2813</td>
<td align="center">85186</td>
<td align="center">2676</td>
<td align="center">73534</td>
<td align="center">3078</td>
<td align="center">81756</td>
<td align="center">2856</td>
<td align="center">80159</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The table shows the numbers illustrated in Figure 
<xref ref-type="fig" rid="F3">3</xref>
A-C. Each value is the average of three runs, reported for each standalone unfiltered program and for the outputs of the three programs combined without filtering (Combined) or with filtering (Combined filt.).</p>
</table-wrap-foot>
</table-wrap>
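<p>The percentage reductions quoted in the text can be approximately recomputed from the counts in Table 2. The following minimal Python sketch assumes that each reduction is the mean of the per-data-set percentage changes between the Combined and Combined filt. rows; this averaging scheme and the names used in the code are illustrative assumptions, not a description of the authors' own calculation. Because the tabulated values are rounded averages of three runs, the result can differ from the reported figures by about 0.1 percentage point.</p>
<preformat>
```python
# nTP and nFP totals copied from Table 2 for the AM, AR and MR data sets.
combined = {"AM": (3067, 110825), "AR": (2876, 102531), "MR": (3462, 110089)}
combined_filt = {"AM": (2813, 85186), "AR": (2676, 73534), "MR": (3078, 81756)}


def mean_percent_reduction(before, after, index):
    """Mean over data sets of the percent drop in one column (0 = nTP, 1 = nFP)."""
    drops = [
        100.0 * (before[name][index] - after[name][index]) / before[name][index]
        for name in before
    ]
    return sum(drops) / len(drops)


ntp_drop = mean_percent_reduction(combined, combined_filt, 0)
nfp_drop = mean_percent_reduction(combined, combined_filt, 1)
print(f"mean nTP reduction: {ntp_drop:.1f}%")
print(f"mean nFP reduction: {nfp_drop:.1f}%")
```
</preformat>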
<p>Compared to each standalone program, combining all three filtered programs also significantly improved nucleotide sensitivity (nSn), the ratio of software-predicted true positives to the actual number of real motif nucleotides (Dunn’s Multiple Comparisons Test, p < 0.01). The nSn increased by 22% compared to the most sensitive standalone program, Weeder, under the conditions used (Figure 
<xref ref-type="fig" rid="F3">3</xref>
D; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S2).</p>
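<p>As an illustration of the nucleotide-level measures used here, the sketch below counts nTP, nFP and false negatives (nFN) from two sets of nucleotide positions and computes nSn as nTP / (nTP + nFN), which matches the definition given above. The function names, the (sequence, position) representation and the toy sites are hypothetical and are not taken from the Promzea code.</p>
<preformat>
```python
def sites_to_positions(sites):
    """Expand (sequence_id, start, length) sites into single nucleotide positions."""
    return {(seq, start + i) for seq, start, length in sites for i in range(length)}


def nucleotide_scores(predicted, known):
    """Count nucleotide-level hits from two sets of (sequence_id, position) pairs."""
    n_tp = len(predicted.intersection(known))  # predicted and part of a real motif
    n_fp = len(predicted.difference(known))    # predicted but outside any real motif
    n_fn = len(known.difference(predicted))    # real motif nucleotides that were missed
    n_sn = n_tp / (n_tp + n_fn) if known else 0.0
    return {"nTP": n_tp, "nFP": n_fp, "nFN": n_fn, "nSn": n_sn}


# Toy example: one annotated 8-nt site and one prediction shifted by 4 nt.
known = sites_to_positions([("promoter1", 10, 8)])
predicted = sites_to_positions([("promoter1", 14, 8)])
scores = nucleotide_scores(predicted, known)
print(scores)
```
</preformat>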
<p>The effectiveness of our strategy was further demonstrated by examining the origin of the final predicted nTPs after all three filtered results had been combined. Of the final nTPs retrieved from the benchmark data set, 41% were discovered by Weeder alone, 16% by MEME alone and 10% by BioProspector alone (Figure 
<xref ref-type="fig" rid="F3">3</xref>
E). Only 33% of nTPs had been found by two or three of the standalone programs. This result confirms that widely used motif discovery programs retrieve distinct sets of motifs and that combining the predictions increases the chance of discovering new regulatory motifs.</p>
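<p>The partition of true positives described above can be expressed with simple set operations. The sketch below is a minimal illustration that assumes each program's true-positive nucleotides are available as a non-empty Python set; the function name and the toy position sets are hypothetical, and the fractions they produce are unrelated to the benchmark values.</p>
<preformat>
```python
def partition_true_positives(tp_by_program):
    """Split the union of true positives into program-unique and shared fractions."""
    union = set().union(*tp_by_program.values())
    fractions = {}
    for name, positions in tp_by_program.items():
        # Nucleotides found by any program other than the current one.
        others = set().union(*(p for n, p in tp_by_program.items() if n != name))
        fractions[name + " alone"] = len(positions - others) / len(union)
    # Shared nucleotides are those retrieved by at least two programs.
    shared = {pos for pos in union
              if sum(pos in p for p in tp_by_program.values()) >= 2}
    fractions["shared"] = len(shared) / len(union)
    return fractions


toy = {
    "Weeder": {1, 2, 3, 4, 5},
    "MEME": {4, 5, 6, 7},
    "BioProspector": {5, 8},
}
fractions = partition_true_positives(toy)
print(fractions)
```
</preformat>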
<p>Concerning motif ranking using the MNCP score, the analysis using the benchmark Model Real data set showed that as the MNCP score of a predicted motif increased, the chance that it was composed of nucleotide false positives decreased (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S3).</p>
</sec>
<sec>
<title>Validation of Promzea by comparing motif predictions to experimentally defined motifs in the maize anthocyanin and phlobaphene biosynthetic pathways</title>
<p>The effectiveness of Promzea was tested based on its ability to detect experimentally defined binding sites for the maize transcription factors C1 and P, which upregulate genes encoding the enzymes responsible for the biosynthesis of anthocyanin and phlobaphene, respectively (Figure 
<xref ref-type="fig" rid="F4">4</xref>
) [
<xref ref-type="bibr" rid="B17">17</xref>
-
<xref ref-type="bibr" rid="B20">20</xref>
]. Eight gene promoters containing the C1 and P binding sites were selected (Figure 
<xref ref-type="fig" rid="F4">4</xref>
, red labels). The corresponding cDNAs (including all close homologs, 12 in total; see Additional file
<xref ref-type="supplementary-material" rid="S5">5</xref>
for a list of sequences) were used as input to Promzea following the parameters described (Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: supplementary materials and methods). Promzea retrieved 29 genes that matched these cDNAs after BLAST searching (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S4); from the corresponding promoters, five motifs were identified along with their MNCP scores (Figure 
<xref ref-type="fig" rid="F5">5</xref>
).</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>The maize anthocyanin and phlobaphene biosynthesis pathways regulated by transcription factors C1 and P.</bold>
Genes encoding biosynthetic enzymes regulated by C1 are shown in red text; those also regulated by P are underlined. C1 and P are homologous proteins [
<xref ref-type="bibr" rid="B21">21</xref>
], and they have been shown to interact with identical binding sites in the
<italic>A1</italic>
promoter [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
].</p>
</caption>
<graphic xlink:href="1471-2229-13-42-4"></graphic>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>Motifs predicted by Promzea for genes encoding the maize anthocyanin biosynthesis pathway.</bold>
Promzea searched for motifs in sequences upstream (−200 bp to +1) of the genes indicated in Figure 
<xref ref-type="fig" rid="F4">4</xref>
as well as their closest DNA sequence paralogs (see Methods). Shown are the sequence logos, the motif discovery program that identified each motif and the corresponding MNCP score. BioP, BioProspector.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-5"></graphic>
</fig>
<p>Of the five motifs predicted by Promzea with MNCP scores >1, two matched the experimentally defined P binding sites (Motif1 and Motif5, Figure 
<xref ref-type="fig" rid="F6">6</xref>
). The partially related C1 motif was found in Motif4 as described below. Based on STAMP [
<xref ref-type="bibr" rid="B37">37</xref>
], Promzea Motif1 and Motif5 were found to be highly similar to the two versions of the experimentally defined binding site of the P-protein (e-value = 2.00e-10 and 2.91e-10; Figure 
<xref ref-type="fig" rid="F6">6</xref>
) [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B38">38</xref>
]. Interestingly, Motif1 and Motif5 were overrepresented in the −60 to −40 and −80 to −60 promoter regions, respectively (Figure 
<xref ref-type="fig" rid="F6">6</xref>
), consistent with the experimentally defined −65 to −55 binding site of P in the
<italic>A1</italic>
promoter [
<xref ref-type="bibr" rid="B18">18</xref>
]. Motif1 was also overrepresented in the −120 to −100 promoter region (Figure 
<xref ref-type="fig" rid="F6">6</xref>
), which was consistent with the other experimentally defined binding sites of P in the
<italic>A1</italic>
promoter at −123 to −88 [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
]. Promzea-predicted Motif1 or Motif5 was also retrieved in four of the five input promoters shown experimentally to contain a P binding site (Figure 
<xref ref-type="fig" rid="F4">4</xref>
, underlined red labels); copies of the P binding site were also predicted in the first 200 bp of the promoter of
<italic>PAL1</italic>
, encoding phenylalanine ammonia lyase (Figure 
<xref ref-type="fig" rid="F6">6</xref>
).</p>
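<p>The positional overrepresentation reported above can be illustrated by binning motif site positions into 20 bp windows across the −200 bp to +1 region. The sketch below is a minimal example under stated assumptions: positions are integer offsets relative to +1 (for example −57), windows are half-open intervals starting at their upstream edge, and the function name and toy positions are hypothetical rather than taken from Promzea.</p>
<preformat>
```python
from collections import Counter


def bin_positions(site_positions, bin_size=20, upstream=200):
    """Count motif sites per window; positions are integers relative to +1."""
    counts = Counter()
    for pos in site_positions:
        # Keep only sites that fall inside the analysed upstream region.
        if pos in range(-upstream, 0):
            start = (pos // bin_size) * bin_size  # upstream edge of the window
            counts[(start, start + bin_size)] += 1
    return counts


# Toy site positions collected across several hypothetical promoters.
toy_positions = [-57, -52, -45, -110, -65]
window_counts = bin_positions(toy_positions)
for window, count in sorted(window_counts.items()):
    print(window, count)
```
</preformat>
<p>In this toy example the most populated window is −60 to −40; a statistical test against a background distribution would still be needed before calling a window overrepresented.</p>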
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>
<bold>Motifs predicted by Promzea compared to experimentally defined motifs in the literature.</bold>
Shown are the motif binding sites for transcription factor P (and C1, see text) in the phlobaphene and anthocyanin biosynthetic pathways. The preferential position of each motif predicted by Promzea is indicated in the fourth column from the right. The e-value for STAMP is indicated by the False Discovery Ratio (FDR). The superscript number in the extreme right column represents the number of motif copies present in the promoter of the indicated gene (−200 bp to +1).</p>
</caption>
<graphic xlink:href="1471-2229-13-42-6"></graphic>
</fig>
<p>Promzea-predicted Motif2 was statistically close (e-value = 4.50e-07) to the MRE binding site identified in an Arabidopsis chalcone synthase promoter [
<xref ref-type="bibr" rid="B19">19</xref>
,
<xref ref-type="bibr" rid="B39">39</xref>
] (Figure 
<xref ref-type="fig" rid="F6">6</xref>
). In Arabidopsis, the MRE motif mediates light responsiveness [
<xref ref-type="bibr" rid="B39">39</xref>
]. Motif2 was retrieved by Promzea in the maize chalcone synthase (
<italic>C2</italic>
) promoter and also in six of the seven other input gene promoters, validating this Promzea prediction (Figure 
<xref ref-type="fig" rid="F6">6</xref>
).</p>
<p>Promzea-predicted Motif4 was similar to motif ACIIPVPAL2 (e-value = 6.50e-08; Figure 
<xref ref-type="fig" rid="F6">6</xref>
) discovered in beans [
<xref ref-type="bibr" rid="B40">40</xref>
]. The ACIIPVPAL2-like element was found in the promoter of
<italic>PAL2</italic>
(
<italic>Phenylalanine Ammonia Lyase 2</italic>
), an ortholog of the maize PAL genes, which are necessary for the biosynthesis of phenylpropanoid secondary metabolites including anthocyanins. PAL1 catalyzes the rate-limiting step in anthocyanin biosynthesis. Promzea retrieved the ACIIPVPAL2-like motif in the promoters of
<italic>PAL1</italic>
and four additional anthocyanin genes (
<italic>C2</italic>
,
<italic>A1</italic>
,
<italic>A2</italic>
and
<italic>Bz1</italic>
), again validating Promzea predictions. Interestingly, the CA-rich region at the beginning of Motif4 was related to the C1 consensus binding site (CAACCACCAGTCAA GAC) that was previously defined experimentally [
<xref ref-type="bibr" rid="B20">20</xref>
].</p>
<p>The ability of Promzea to retrieve promoter motifs associated with the anthocyanin pathway that were defined experimentally not only in maize but also in other plant species validates Promzea as an accurate tool for motif discovery.</p>
</sec>
<sec>
<title>A novel candidate motif in the anthocyanin pathway and expansion of the regulatory network to the branched amino acid metabolic pathway</title>
<p>Promzea also retrieved Motif3 as a candidate motif in the anthocyanin biosynthetic pathway, a motif not previously defined experimentally (Figure 
<xref ref-type="fig" rid="F6">6</xref>
). Promzea Motif3 was retrieved from the promoter of
<italic>A1</italic>
and additional paralogs of genes in the anthocyanin pathway (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S4). Motif3 was over-represented in the −40 to −20 regions of these promoters (Figures 
<xref ref-type="fig" rid="F6">6</xref>
and
<xref ref-type="fig" rid="F7">7</xref>
. In a subsequent search of the maize genome, Motif3 was retrieved in a total of 762 promoters (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S5); the over-represented GO annotations of the corresponding genes, based on the hypergeometric test, identified these genes as being related to zinc ion binding (p = 2.71e-04) and branched chain family amino acid metabolic processes (p = 4.63e-03) (Figure 
<xref ref-type="fig" rid="F7">7</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). The latter annotation was also enriched in the four other predicted motifs (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). As anthocyanin and phlobaphene are derived from phenylalanine, a branched amino acid, this finding appears to validate novel Motif3 as well as the Promzea pipeline, and predicts that anthocyanin biosynthesis may be transcriptionally coordinated with branched chain amino acid biosynthesis.</p>
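<p>The hypergeometric test used for the GO annotations can be sketched as an upper-tail probability: the chance of drawing at least the observed number of annotated genes when the retrieved genes are sampled without replacement from the genome. The code below is a minimal illustration that requires Python 3.8 or later for math.comb; the function name and all counts are hypothetical and are not the values underlying the reported p-values.</p>
<preformat>
```python
from math import comb


def hypergeom_enrichment_p(genome_size, annotated_in_genome,
                           sample_size, annotated_in_sample):
    """Upper-tail probability of seeing at least annotated_in_sample annotated genes."""
    upper = min(sample_size, annotated_in_genome)
    total = comb(genome_size, sample_size)  # all ways to draw the sample
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, sample_size - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes, 5 carry the annotation, 4 genes retrieved, 2 annotated.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"p = {p_value:.4f}")
```
</preformat>
<p>In practice, p-values from many GO terms would also need correction for multiple testing.</p>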
<fig id="F7" position="float">
<label>Figure 7</label>
<caption>
<p>
<bold>Example of the Promzea output for anthocyanin pathway Motif3.</bold>
For each predicted motif, the following outputs are displayed: (
<bold>A</bold>
) the sequence logo (upper) and the plain consensus sequence (lower); (
<bold>B</bold>
) the frequency of occurrence of the motif at each upstream position range from the user input data set; (
<bold>C</bold>
) summary of annotations of genes containing the motif from the genome-wide retrieval (when applicable). A user can click on the Gene List link and Over-Represented Annotation link to retrieve lists of genes containing the motif and detailed gene annotations, respectively.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-7"></graphic>
</fig>
</sec>
<sec>
<title>Promzea retrieved additional genes that contain the same candidate motifs as the anthocyanin input promoters</title>
<p>As noted above for Motif3, each motif predicted by Promzea from the anthocyanin pathway was used to search the genome to retrieve genes containing that motif (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S5, anthocyanin pathway genes removed). Interestingly, the five motifs were associated with the same GO annotations: branched chain family amino acid metabolic process, heat shock protein binding, myosin complex or motor activity (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). In total, Promzea retrieved between 131 genes (Motif1) and 762 genes (Motif3) with promoters enriched for any one of these motifs (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S5).</p>
<p>Interestingly, Promzea retrieved 127 genes whose promoters contained all five motifs within the −200 bp upstream region (Table 
<xref ref-type="table" rid="T3">3</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S6). This list included genes encoding PAL1, which catalyzes the rate-limiting step in phenylpropanoid biosynthesis (the pathway that produces anthocyanins); branched amino acid enzymes (as already noted, anthocyanin is derived from the branched amino acid phenylalanine); ABC-type transporters (which have been implicated in anthocyanin transport across vacuolar membranes); and regulatory proteins including transcription factors and kinases. Intriguingly, all five anthocyanin promoter motifs were also predicted in the promoters of genes similar to those involved in coordinating sugar-, light-, cold-temperature- and low-phosphate-dependent activation of anthocyanin biosynthesis, namely genes similar to those encoding gibberellin receptor GID1L2 and gibberellin 20 oxidase; genes similar to those encoding the light-regulatory pathway proteins COP1 and PIF3 (Phytochrome Interacting Factor 3); and numerous sugar transfer/modification enzymes (Table 
<xref ref-type="table" rid="T3">3</xref>
; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S6).</p>
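<p>Selecting genes whose promoters contain all five motifs amounts to intersecting the per-motif gene lists and then removing the input pathway genes. The sketch below illustrates this with hypothetical gene identifiers and a hypothetical function name; it assumes the per-motif hit lists have already been produced by a genome-wide motif search.</p>
<preformat>
```python
def genes_with_all_motifs(genes_by_motif, exclude=()):
    """Return genes present in every motif's hit list, minus an exclusion set."""
    gene_sets = [set(genes) for genes in genes_by_motif.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets) - set(exclude)


# Toy hit lists with placeholder gene names (not real maize identifiers).
hits = {
    "Motif1": ["geneA", "geneB", "geneC"],
    "Motif2": ["geneB", "geneC", "geneD"],
    "Motif3": ["geneC", "geneB", "geneE"],
    "Motif4": ["geneB", "geneC"],
    "Motif5": ["geneA", "geneB", "geneC"],
}
pathway_genes = {"geneC"}  # stands in for the anthocyanin input genes
shared = genes_with_all_motifs(hits, exclude=pathway_genes)
print(sorted(shared))
```
</preformat>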
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>Annotated list of non-anthocyanin pathway genes in the maize genome with promoters containing all 5 of the anthocyanin/phlobaphene-related motifs predicted by Promzea (Motifs 1–5)</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Maize ID</bold>
</th>
<th align="left">
<bold>Annotation (PFAM ID, Maize GDB)</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Branched amino acid phenylpropanoid pathway</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G153536
<hr></hr>
</td>
<td align="left" valign="bottom">Aminotransferase class IV -- Branched-chain-amino-acid aminotransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G055899
<hr></hr>
</td>
<td align="left" valign="bottom">Aminotransferase class IV (branched-chain amino acid aminotransferase 5)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G074604
<hr></hr>
</td>
<td align="left" valign="bottom">Phenylalanine ammonia lyase 1 (PAL1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Putative light signaling</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G104920
<hr></hr>
</td>
<td align="left" valign="bottom">COP1, putative; Zinc finger, C3HC4 type (RING finger)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G062541
<hr></hr>
</td>
<td align="left" valign="bottom">HLH DNA-binding domain related to phytochrome interacting factor 3 (PIF3)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Putative gibberellin</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G013016
<hr></hr>
</td>
<td align="left" valign="bottom">Gibberellin response modulator protein (GRAS family transcription factor)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G021051
<hr></hr>
</td>
<td align="left" valign="bottom">2OG-Fe(II) oxygenase superfamily related to gibberellin 20 oxidase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G026095
<hr></hr>
</td>
<td align="left" valign="bottom">Carboxylesterase family related to gibberellin receptor GID1L2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Sugar</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC211474.3_FG006
<hr></hr>
</td>
<td align="left" valign="bottom">GDP-fucose protein O-fucosyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G018022
<hr></hr>
</td>
<td align="left" valign="bottom">UTP-glucose-1-phosphate uridylyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G021243
<hr></hr>
</td>
<td align="left" valign="bottom">GDP-fucose protein O-fucosyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G035749
<hr></hr>
</td>
<td align="left" valign="bottom">Glycosyl hydrolase family 14
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G050273
<hr></hr>
</td>
<td align="left" valign="bottom">Raffinose synthase or seed inhibition protein Sip1
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G074462
<hr></hr>
</td>
<td align="left" valign="bottom">Starch binding domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G082037
<hr></hr>
</td>
<td align="left" valign="bottom">UDP-glucoronosyl and UDP-glucosyl transferase related to Flavonol 3-O-glucosyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G176630
<hr></hr>
</td>
<td align="left" valign="bottom">Galactosyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G178278
<hr></hr>
</td>
<td align="left" valign="bottom">Galactosyltransferase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G368827
<hr></hr>
</td>
<td align="left" valign="bottom">Sugar efflux transporter for intercellular exchange/MTN3 family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Transporter</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC206030.4_FG001
<hr></hr>
</td>
<td align="left" valign="bottom">Drug transmembrane transporter
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G094490
<hr></hr>
</td>
<td align="left" valign="bottom">ABC-2 type transporter domain containing protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G361066
<hr></hr>
</td>
<td align="left" valign="bottom">ABC-2 type transporter
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Regulatory</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G074373
<hr></hr>
</td>
<td align="left" valign="bottom">bZIP transcription factor
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G366434
<hr></hr>
</td>
<td align="left" valign="bottom">AP2-like ethylene-responsive transcription factor PLETHORA 2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G459540
<hr></hr>
</td>
<td align="left" valign="bottom">C2H2-like zinc finger protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G018631
<hr></hr>
</td>
<td align="left" valign="bottom">Zinc finger, C3HC4 type (RING finger)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC196161.3_FG002
<hr></hr>
</td>
<td align="left" valign="bottom">Transcription factor
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G356718
<hr></hr>
</td>
<td align="left" valign="bottom">Myb-like DNA-binding domain and Protein Phosphatase 2C
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G398758
<hr></hr>
</td>
<td align="left" valign="bottom">Myb-like DNA-binding domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G027253
<hr></hr>
</td>
<td align="left" valign="bottom">B3 DNA binding domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G109627
<hr></hr>
</td>
<td align="left" valign="bottom">No apical meristem (NAM) protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC203972.3_FG001
<hr></hr>
</td>
<td align="left" valign="bottom">NB-ARC domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G088140
<hr></hr>
</td>
<td align="left" valign="bottom">G-box binding protein MFMR
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G063961
<hr></hr>
</td>
<td align="left" valign="bottom">Protein kinase domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G142390
<hr></hr>
</td>
<td align="left" valign="bottom">Protein kinase domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G166719
<hr></hr>
</td>
<td align="left" valign="bottom">Protein kinase domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G163297
<hr></hr>
</td>
<td align="left" valign="bottom">RNA recognition motif
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G459746
<hr></hr>
</td>
<td align="left" valign="bottom">RNA recognition motif
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G005622
<hr></hr>
</td>
<td align="left" valign="bottom">F-box family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Ribosomal</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G018403
<hr></hr>
</td>
<td align="left" valign="bottom">Ribosomal prokaryotic L21 protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G135095
<hr></hr>
</td>
<td align="left" valign="bottom">Ribosomal protein S18
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G170420
<hr></hr>
</td>
<td align="left" valign="bottom">Ribosomal family S4e
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM5G861978
<hr></hr>
</td>
<td align="left" valign="bottom">Chloroplast 50S ribosomal protein L22
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Chaperone</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G005753
<hr></hr>
</td>
<td align="left" valign="bottom">DnaJ domain (Chaperone)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G085934
<hr></hr>
</td>
<td align="left" valign="bottom">Hsp20/alpha crystallin family chaperone
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G434839
<hr></hr>
</td>
<td align="left" valign="bottom">DnaJ central domain (Chaperone)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Cell trafficking</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC155377.1_FG001
<hr></hr>
</td>
<td align="left" valign="bottom">Myosin family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G044348
<hr></hr>
</td>
<td align="left" valign="bottom">Signal peptide peptidase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G047214
<hr></hr>
</td>
<td align="left" valign="bottom">Nuclear Pore Localization 4 (NPL4) family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G077696
<hr></hr>
</td>
<td align="left" valign="bottom">Regulator of Vps4 ATPase activity in the MVB sorting pathway
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G095441
<hr></hr>
</td>
<td align="left" valign="bottom">Syntaxin
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G113319
<hr></hr>
</td>
<td align="left" valign="bottom">Myosin family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G115775
<hr></hr>
</td>
<td align="left" valign="bottom">SNARE domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Cytochrome P450 oxidoreductase</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G394783
<hr></hr>
</td>
<td align="left" valign="bottom">Oxidoreductase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC217947.4_FG002
<hr></hr>
</td>
<td align="left" valign="bottom">NADPH cytochrome P450 reductase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G106650
<hr></hr>
</td>
<td align="left" valign="bottom">Cytochrome P450
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G147245
<hr></hr>
</td>
<td align="left" valign="bottom">Cytochrome P450 related to cinnamate-4-hydroxylase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G415579
<hr></hr>
</td>
<td align="left" valign="bottom">NAD(P)H-dependent oxidoreductase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Heme</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G025031
<hr></hr>
</td>
<td align="left" valign="bottom">Uroporphyrinogen decarboxylase (URO-D), 5th step in heme biosynthesis
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G071745
<hr></hr>
</td>
<td align="left" valign="bottom">Cytochrome b5-like Heme/Steroid binding domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G028986
<hr></hr>
</td>
<td align="left" valign="bottom">Cytochrome b5-like Heme/Steroid binding domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Cell wall or modification</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G110145
<hr></hr>
</td>
<td align="left" valign="bottom">Cellulose synthase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G113057
<hr></hr>
</td>
<td align="left" valign="bottom">Hydroxyproline-rich glycoprotein family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G336879
<hr></hr>
</td>
<td align="left" valign="bottom">Pectinacetylesterase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G352381
<hr></hr>
</td>
<td align="left" valign="bottom">Pectinacetylesterase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Other</bold>
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">AC209810.3_FG002
<hr></hr>
</td>
<td align="left" valign="bottom">Cysteine protease
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G312061
<hr></hr>
</td>
<td align="left" valign="bottom">Cystatin domain and phloem filament protein PP1, proteinase inhibitor
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G325008
<hr></hr>
</td>
<td align="left" valign="bottom">Cystatin domain and phloem filament protein PP1, proteinase inhibitor
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G004188
<hr></hr>
</td>
<td align="left" valign="bottom">Nuclear excision repair XPG N-terminal domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G021277
<hr></hr>
</td>
<td align="left" valign="bottom">Pyridoxal-dependent decarboxylase conserved domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G027241
<hr></hr>
</td>
<td align="left" valign="bottom">Abscisic acid responsive TB2/DP1, HVA22 family
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G027851
<hr></hr>
</td>
<td align="left" valign="bottom">Sodium/hydrogen exchanger family
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G043749
<hr></hr>
</td>
<td align="left" valign="bottom">Uncharacterised protein family (UPF0041)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G047412
<hr></hr>
</td>
<td align="left" valign="bottom">Chromosome segregation protein Spc25
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G070279
<hr></hr>
</td>
<td align="left" valign="bottom">Short chain dehydrogenase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G125448
<hr></hr>
</td>
<td align="left" valign="bottom">Transferase family
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G129979
<hr></hr>
</td>
<td align="left" valign="bottom">G10 protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G143703
<hr></hr>
</td>
<td align="left" valign="bottom">Hydrolase, alpha/beta fold family protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G146207
<hr></hr>
</td>
<td align="left" valign="bottom">Tetratricopeptide repeat containing protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G152370
<hr></hr>
</td>
<td align="left" valign="bottom">WD domain, G-beta repeat
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G168675
<hr></hr>
</td>
<td align="left" valign="bottom">Late embryogenesis abundant protein
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G176129
<hr></hr>
</td>
<td align="left" valign="bottom">NADH dehydrogenase transmembrane subunit
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G325575
<hr></hr>
</td>
<td align="left" valign="bottom">Ferritin-1, iron storage, chloroplastic precursor
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G348039
<hr></hr>
</td>
<td align="left" valign="bottom">Mitochondrial fission ELM1
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G465046
<hr></hr>
</td>
<td align="left" valign="bottom">GDSL-like Lipase/Acylhydrolase
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM2G472236
<hr></hr>
</td>
<td align="left" valign="bottom">Seed maturation protein/LEA
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">GRMZM5G838435
<hr></hr>
</td>
<td align="left" valign="bottom">Hydrolase, alpha/beta fold family domain
<hr></hr>
</td>
</tr>
<tr>
<td align="left">GRMZM5G890241</td>
<td align="left">Leucine rich repeat containing protein</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>These data suggest that the genome-wide motif retrieval function of Promzea can help researchers predict new genes that may be part of a broader co-regulated network.</p>
</sec>
<sec>
<title>Testing of Promzea using the maize development atlas</title>
<p>To further test the Promzea pipeline on data similar to those of a typical user, we used microarray data from the Maize Development Atlas, a data set of tissue-specific gene expression [
<xref ref-type="bibr" rid="B23">23</xref>
]. Select motifs associated with each tissue are presented (Figure 
<xref ref-type="fig" rid="F8">8</xref>
) as well as all predicted motifs (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
).</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption>
<p>
<bold>Promzea predictions of promoter motifs associated with tissue-specific gene expression from the maize development atlas </bold>
[
<xref ref-type="bibr" rid="B23">23</xref>
]
<bold>.</bold>
Tissue-specific microarray data was used as input into Promzea, and selected motif predictions are shown and compared to previously identified promoter motifs. Please see Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
for all input sequence data and results.</p>
</caption>
<graphic xlink:href="1471-2229-13-42-8"></graphic>
</fig>
<p>As one case study, a list of 48 embryo-specific transcripts was used as input into Promzea (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
) from which 13 associated promoter motifs were predicted (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
). Using Clover, Promzea then retrieved genes associated with promoters in the genome that contained these motifs along with their associated GO annotation terms: genes enriched with any one of nine of the 13 motifs were annotated as having nutrient reservoir activity (Figure 
<xref ref-type="fig" rid="F8">8</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
), consistent with the embryo being part of the seed. Predicted embryo Motif2 and Motif6 were highly similar to the ABADESI2
<italic>cis</italic>
-acting element (p = 5.06e-08 and p = 1.10e-11 respectively, Figure 
<xref ref-type="fig" rid="F8">8</xref>
), known to be involved in ABA dependent desiccation during seed maturation [
<xref ref-type="bibr" rid="B41">41</xref>
].</p>
<p>As another case study, a total of 134 tassel-specific transcripts were investigated using Promzea, from which 11 motifs were predicted (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
). Genes enriched with any one of nine of the 11 motifs in their promoters were annotated as being involved in sexual reproduction (GO:0019953), consistent with the function of the tassel (Figure 
<xref ref-type="fig" rid="F8">8</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
).</p>
<p>From another reproductive tissue, the silk, 12 tissue-specific transcripts were entered into Promzea (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
). Promzea predicted 10 promoter motifs enriched in the promoters of the associated genes, of which six motifs were enriched in promoters retrieved from genome-wide searches, associated with genes involved in sucrose metabolism; other motifs were enriched in genes associated with defence responses to fungi (Figure 
<xref ref-type="fig" rid="F8">8</xref>
), which is consistent with the exposure of this tissue to fungal pathogens (e.g.
<italic>Fusarium</italic>
which can enter through silks).</p>
<p>Interestingly, motifs similar to the Nonamer motif or NONAMERATH4 motif (AGATCGACG) were most frequently predicted by Promzea in silks (four out of 10 motifs), roots (three out of 10 motifs) and leaves (one out of six motifs) (Figure 
<xref ref-type="fig" rid="F8">8</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
- STAMP outputs). This motif was discovered in the promoter of the Arabidopsis gene encoding Histone 4 [
<xref ref-type="bibr" rid="B42">42</xref>
]. A mutation in Histone 4 was shown to be deleterious to cell specificity of gene expression [
<xref ref-type="bibr" rid="B42">42</xref>
].</p>
<p>These results appear to confirm that Promzea retrieves meaningful motifs associated with co-expressed, tissue-specific genes in data sets that would be typical of users.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>Promzea provides the plant community with a customized interface to detect
<italic>de novo cis</italic>
-acting motifs that are over-represented in the promoters or introns of co-expressed maize genes. By filtering and combining the results of multiple standalone motif discovery programs, Promzea predicts more true motifs than current individual programs without increasing the false discovery ratio (Figure 
<xref ref-type="fig" rid="F3">3</xref>
). For each run output, Promzea provides a ranking of the predicted motifs based on their MNCP scores (Figure 
<xref ref-type="fig" rid="F5">5</xref>
). An MNCP score of ≤1 means that the motif is more frequently present in a random set of maize sequences than in the user data set of co-expressed genes. MNCP scores can help eliminate motifs that have a general function in the plant and that are not necessarily specific to a condition (e.g. tissue specificity). False positives caused by transposons and retro-elements, which are abundant in the maize and rice genomes [
<xref ref-type="bibr" rid="B43">43</xref>
], were reduced by the use of repeat masked promoter data in addition to the use of MNCP scores. False positives are a problem in any motif discovery program; furthermore,
<italic>cis-</italic>
acting motifs regulate genes at different biological levels that may or may not be of interest (e.g. developmental cue versus an environmental stimulus). Given these caveats, Promzea generates additional outputs to help a user decide which motif(s) to pursue, placing the emphasis back on the user. Promzea searches the maize genome for genes that contain each predicted motif; the corresponding gene annotations are summarized so that a user can decide whether the predicted motif is relevant to the input gene cluster (e.g. belongs to the biological pathway of interest; Figure 
<xref ref-type="fig" rid="F7">7</xref>
C; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S5). As gene annotations can be limiting, Promzea also generates the complete list of genes that contain each predicted motif (in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S5); a user can then search the list using relevant keywords to determine whether a predicted motif retrieves expected genes. Promzea thus narrows the number of candidate
<italic>cis-</italic>
acting motifs for subsequent experimental validation. Promzea should be especially useful to molecular biologists for the prediction of specific promoters for transgene research and targeted maize improvement; few such promoters currently exist for the maize community.</p>
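<p>The two user-side steps described above, ranking motifs by MNCP score and searching the retrieved gene list by keyword, can be sketched as follows. This is a minimal illustration and not the Promzea implementation: the function names, the example scores and the cut-off of 1 are assumptions made for the example, while the gene IDs and annotations are taken from Table 3.</p>
<preformat>
```python
def rank_motifs(mncp_scores, cutoff=1.0):
    """Keep motifs whose MNCP score exceeds the cutoff, highest score first.

    mncp_scores is a hypothetical mapping of motif name to MNCP score.
    The default cutoff of 1.0 is an assumption based on the interpretation
    of scores at or below 1 given in the text.
    """
    kept = [(name, score) for name, score in mncp_scores.items() if score > cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def search_annotations(gene_annotations, keyword):
    """Return sorted gene IDs whose annotation contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(
        gene for gene, annotation in gene_annotations.items()
        if needle in annotation.lower()
    )


# Hypothetical scores for three predicted motifs.
example_scores = {"Motif1": 1.8, "Motif2": 0.9, "Motif3": 1.2}

# A few gene annotations as they appear in Table 3.
example_annotations = {
    "GRMZM2G110145": "Cellulose synthase",
    "GRMZM2G336879": "Pectinacetylesterase",
    "GRMZM2G352381": "Pectinacetylesterase",
}

print(rank_motifs(example_scores))
print(search_annotations(example_annotations, "pectin"))
```
</preformat>
<p>Keeping the ranking and the keyword search separate mirrors the workflow in the text: the score narrows the motif list first, and the annotations are then inspected by the user.</p>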
<p>Users can take several steps to maximize the utility of Promzea. First, prior to using Promzea, it is critical for the user to define robust clusters of co-expressed genes since motif discovery can be diluted by the presence of extra genes that are not part of the real gene network of interest [
<xref ref-type="bibr" rid="B44">44</xref>
,
<xref ref-type="bibr" rid="B45">45</xref>
]. Second, it is important for the user to know that Promzea employs algorithms that are stochastic in nature, including BioProspector and the selection of random background sequences required for the filtering process. As a result, each Promzea run can generate slightly different outputs. Users are advised to run Promzea multiple times to verify the consistency of their results. Finally, Promzea does not compare predicted motifs to motifs previously defined by the research community; for this, the user is encouraged to use STAMP to match a motif to online databases [
<xref ref-type="bibr" rid="B37">37</xref>
], or Matalign [
<xref ref-type="bibr" rid="B38">38</xref>
] for comparisons to motifs found in the literature (Figures 
<xref ref-type="fig" rid="F6">6</xref>
and
<xref ref-type="fig" rid="F8">8</xref>
). Matalign may also be used to compare the different motifs predicted by Promzea to determine if there are likely duplicates.</p>
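<p>One simple way to check consistency across repeated runs is to count how often each predicted motif recurs. The sketch below assumes that each run has been reduced to a set of consensus strings and treats two motifs as the same only if their consensus strings are identical. This is a deliberate simplification: similar but non-identical motifs would need a comparison tool such as Matalign or STAMP. The function names and the example runs are hypothetical.</p>
<preformat>
```python
from collections import Counter


def motif_reproducibility(runs):
    """Map each motif consensus to the fraction of runs that reported it.

    runs is a list of collections of consensus strings, one per Promzea run.
    """
    if not runs:
        return {}
    counts = Counter(motif for run in runs for motif in set(run))
    return {motif: count / len(runs) for motif, count in counts.items()}


def stable_motifs(runs, min_fraction=1.0):
    """Return sorted motifs reported in at least min_fraction of the runs."""
    fractions = motif_reproducibility(runs)
    return sorted(m for m, fraction in fractions.items() if fraction >= min_fraction)


# Three hypothetical runs on the same input; only AGATCGACG recurs in all of them.
example_runs = [
    {"AGATCGACG", "CCACGTGG"},
    {"AGATCGACG", "TTGACC"},
    {"AGATCGACG", "CCACGTGG"},
]

print(motif_reproducibility(example_runs))
print(stable_motifs(example_runs))
print(stable_motifs(example_runs, min_fraction=0.6))
```
</preformat>
<p>Motifs that appear in only a minority of runs are the ones most likely to reflect the random background selection rather than the input gene set.</p>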
<p>In this study, the Promzea pipeline was validated, first, by its ability to retrieve experimentally defined binding sites for transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways (Figure 
<xref ref-type="fig" rid="F4">4</xref>
) [
<xref ref-type="bibr" rid="B18">18</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B46">46</xref>
-
<xref ref-type="bibr" rid="B48">48</xref>
]. Our case study revealed that Promzea could potentially identify motifs not only from co-expression data, but also from a virtual data set, which might be expected to have a common
<italic>cis-</italic>
acting motif, such as in promoters of genes belonging to a specific biochemical pathway (Figure 
<xref ref-type="fig" rid="F4">4</xref>
). Our case study also demonstrated that Promzea could not only retrieve valid
<italic>cis-</italic>
acting motifs, but could make novel predictions about the corresponding biological network, as 127 genes in the maize genome contained all five predicted motifs in the first 200 bp of their promoters (Table 
<xref ref-type="table" rid="T3">3</xref>
; in Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S6). Promzea has thus predicted a broader putative co-regulated gene network than has been identified experimentally, a finding that will need further investigation.</p>
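<p>The selection of genes that carry every predicted motif within the first 200 bp of their promoters can be illustrated with a short sketch. It is not the method used by Promzea, which searches the genome with Clover: here, exact matching of IUPAC consensus strings on both strands is swapped in to keep the example self-contained. The function names, the example sequences and the assumption that each promoter string ends at the transcription start site are all hypothetical.</p>
<preformat>
```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_motif(seq, consensus):
    """True if the IUPAC consensus matches the sequence on either strand."""
    pattern = re.compile("".join(IUPAC[base] for base in consensus.upper()))
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def genes_with_all_motifs(promoters, motifs, window=200):
    """Return sorted gene IDs whose proximal promoter contains every motif.

    Assumes each promoter string ends at the transcription start site,
    so the last `window` bases are the proximal promoter.
    """
    hits = []
    for gene, seq in promoters.items():
        proximal = seq[-window:]
        if all(contains_motif(proximal, motif) for motif in motifs):
            hits.append(gene)
    return sorted(hits)


# Hypothetical promoters and motifs, for illustration only.
example_promoters = {
    "geneA": "TTTTACCTACCGGGGCACGTG",
    "geneB": "ACCTACC" + "T" * 300,  # first motif lies outside the 200 bp window
    "geneC": "GGTAGGTAAAACACGTG",    # first motif is on the reverse strand
}
example_motifs = ["ACCWACC", "CACGTG"]

print(genes_with_all_motifs(example_promoters, example_motifs))
```
</preformat>
<p>Requiring all motifs at once is a strict filter, which is consistent with the comparatively small number of genes (127) reported for the five anthocyanin and phlobaphene motifs.</p>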
<p>Promzea was also tested using tissue-specific microarray data from the Maize Development Atlas [
<xref ref-type="bibr" rid="B23">23</xref>
] since this type of data is similar to that of a typical Promzea user (Figure 
<xref ref-type="fig" rid="F8">8</xref>
). GO annotations of genes enriched for promoter motifs predicted by Promzea appeared to be logical for the specific tissue (Figure 
<xref ref-type="fig" rid="F8">8</xref>
; Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
): for instance, the GO term ‘sexual reproduction’ was over-represented among genes enriched for 9 out of the 11 motifs predicted for tassel-specific transcripts, while the GO term ‘nutrient reservoir activity’ was over-represented among genes enriched for 11 out of the 13 predicted embryo motifs. Motifs in some tissues were associated with GO annotations that were not expected, or else there were multiple GO annotations, perhaps suggesting the importance of biological sampling: for example, separating cell types may be critical for software to predict meaningful
<italic>cis</italic>
-acting elements.</p>
<p>As a final lesson, it is noteworthy that mutants in maize transcription factors C1 and P were isolated and characterized 100 years ago [
<xref ref-type="bibr" rid="B49">49</xref>
]. The genes encoding these transcription factors began to be isolated 70–80 years later [
<xref ref-type="bibr" rid="B48">48</xref>
,
<xref ref-type="bibr" rid="B50">50</xref>
]. The binding sites for C1 and P were defined biochemically one decade later [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
]. Our study shows that the bioinformatics prediction of
<italic>cis-</italic>
acting motifs may help to uncover genetic relationships even in well-studied biological pathways, in this case additional genes that are putatively co-regulated with genes encoding anthocyanin and phlobaphene biosynthetic enzymes.</p>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>There was a need for a software program to help maize researchers identify
<italic>de novo cis-</italic>
acting motifs underlying co-expressed suites of genes. Here, we analyzed the accuracy of the most widely used motif discovery programs and showed that they had limited accuracy and retrieved distinct sets of motifs. We applied statistical filters to reduce the false discovery ratios of these programs and then combined the search results to improve motif prediction, and validated this approach using benchmark data. These principles were integrated into an online software program for motif discovery that was customized for maize called Promzea. Promzea was subsequently expanded to include rice and Arabidopsis. Promzea was able to retrieve experimentally defined binding sites of maize transcription factors known to regulate the anthocyanin and phlobaphene biosynthetic pathways. Interestingly, the genome-wide motif discovery function of Promzea predicted a broader network of co-regulated genes. Promzea was also tested using tissue-specific microarray data from maize as input. Promzea should be a useful tool for
<italic>de novo</italic>
predictions of
<italic>cis-</italic>
acting motifs from transcriptome data. Promzea is publicly available at
<ext-link ext-link-type="uri" xlink:href="http://www.Promzea.org">http://www.Promzea.org</ext-link>
and on the Discovery Environment of the iPlant Collaborative website.</p>
</sec>
<sec>
<title>Availability and requirements</title>
<p>Promzea is accessible at
<ext-link ext-link-type="uri" xlink:href="http://www.promzea.org">http://www.promzea.org</ext-link>
and was tested on Firefox web browsers.</p>
<p>
<bold>Project Name:</bold>
Promzea</p>
<p>
<bold>Project Home Page:</bold>
<ext-link ext-link-type="uri" xlink:href="http://www.promzea.org">http://www.promzea.org</ext-link>
</p>
<p>
<bold>Operating system(s):</bold>
Platform independent</p>
<p>
<bold>Other requirements:</bold>
None</p>
<p>
<bold>Programming language:</bold>
Perl</p>
<p>
<bold>License:</bold>
Freely available for use</p>
<p>Any restrictions to use by non-academics: Promzea uses programs that require a licence for non-academic users; refer to the individual program licences.</p>
</sec>
<sec>
<title>Abbreviations</title>
<p>HG: Hypergeometric distribution; MEME: Multiple Expectation-maximization for Motif Elicitation; MNCP: Mean Normalized Conditional Probability; nCC: Nucleotide correlation coefficient score; nFDR: Nucleotide false discovery ratio; nFP: Nucleotide false positive; nTP: Nucleotide true positive; PWM: Position weight matrix</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>CLM developed and implemented the Promzea software. CLM, MNR, DA, PDM, FF and MS participated in the pipeline design. CLM and TL tested and optimized the Promzea software. CLM and MNR wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>Supplemental materials and methods, and supplemental results.</bold>
Supplementary materials and methods describing the details of the Promzea pipeline including the calculations and optimization of the parameters for filtering, ranking and visualizations. Additional File 1 also contains the supplementary results.</p>
</caption>
<media xlink:href="1471-2229-13-42-S1.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2: Table S1</title>
<p>Summary of promoters and GO annotated genes incorporated into Promzea from maize, Arabidopsis and rice. This table shows the compilation of numbers of promoters, GO annotations and GO-annotated genes retrieved for each plant genome.
<bold>Table S2</bold>
. Effectiveness of combining different motif discovery programs based on nucleotide sensitivity scores (nSn).
<bold>Table S3</bold>
. The effect of applying different MNCP score cut-offs.
<bold>Table S4</bold>
. List of input cDNAs and their corresponding genes from the maize anthocyanin and phlobaphene pathways used for Promzea motif searches. Identification of additional paralogs of genes associated with the maize anthocyanin and phlobaphene biosynthetic pathways. Homologous gene sequences were retrieved that also contained similar promoter motifs, following genome-wide searches by Promzea using the motifs as input. The cDNA sequences were retrieved from Genbank. This list shows corresponding genes from MaizeSequence.org (red text, true loci; blue text, closest paralogs) and additional functional paralogs (extreme right column).
<bold>Table S5</bold>
. Gene lists and annotations found in genome-wide searches for Promzea-predicted Motifs 1–5 from promoters of the maize anthocyanin and phlobaphene biosynthetic pathways.
<bold>Table S6</bold>
. List of the 127 genes in the maize genome with promoters containing all five of the anthocyanin/phlobaphene-related motifs predicted by Promzea.</p>
</caption>
<media xlink:href="1471-2229-13-42-S2.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>Comparison of standalone motif discovery programs.</bold>
Different motif discovery programs predicted motifs embedded in 125 sets of sequences belonging to the Sandve et al. (2007) benchmark data set. The benchmark software calculated the nucleotide Correlation Coefficient scores (nCC scores), a measure of the correlation between the known nucleotide positions and the predicted nucleotide positions. The nCC scores are compared for: (A) BioProspector and MEME, (B) Weeder and MEME, and (C) Weeder and BioProspector. The Spearman correlation (r) between the sets of nCC scores is indicated.</p>
</caption>
<media xlink:href="1471-2229-13-42-S3.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4</title>
<p>
<bold>Effectiveness of combining different motif discovery programs.</bold>
The output of each motif discovery program, applied to the Sandve et al. (2007) benchmark data set, was measured using the Nucleotide Correlation Coefficient (nCC) and the nucleotide Sensitivity (nSn). Shown are scores for the three data sets that comprise the Sandve data set: (A) synthetic (Algorithm Markov), (B) semi-synthetic (Algorithm Real), and (C) real promoters (Model Real). Shown are the scores of each standalone, unfiltered program, as well as the scores after combining the outputs of the three programs with filtering (combined). The error bars represent the 95% mean confidence interval.</p>
</caption>
<media xlink:href="1471-2229-13-42-S4.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5</title>
<p>
<bold>Anthocyanin and phlobaphene pathway gene sequences.</bold>
The sequences of the cDNAs encoding the enzymes involved in the maize anthocyanin and phlobaphene biosynthetic pathways. A subset of these cDNAs is known to contain experimentally defined
<italic>cis</italic>
-acting elements in their promoters that permit co-expression.</p>
</caption>
<media xlink:href="1471-2229-13-42-S5.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6</title>
<p>
<bold>Promzea output for searches of the maize genome with the anthocyanin/phlobaphene-related motifs predicted by Promzea.</bold>
Shown is the user output from the Promzea website or command line.</p>
</caption>
<media xlink:href="1471-2229-13-42-S6.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S7">
<caption>
<title>Additional file 7</title>
<p>
<bold>Supplemental files for testing Promzea with data sets from the Maize Development Atlas.</bold>
The zip folder contains 3 folders. The first contains the promoter input for Promzea for each maize tissue; the second folder has all the outputs from Promzea; the third folder contains the STAMP website outputs for comparisons of the predicted motifs with experimentally defined motifs.</p>
</caption>
<media xlink:href="1471-2229-13-42-S7.zip">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>We thank Lewis Lukens and Gregory Downs (University of Guelph, Canada) for technical advice and use of server space; Mike Peppard, Paul Hobbs and Sean Yo (University of Guelph, Canada) for assistance in setting up server access; and Geir Kjetil Sandve and Kjetil Klepper (Norwegian University of Science and Technology, Norway) for assistance with their benchmark data set.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Vandepoele</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Quimbaya</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Casneuf</surname>
<given-names>T</given-names>
</name>
<name>
<surname>De Veylder</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
<article-title>Unraveling transcriptional control in Arabidopsis using
<italic>cis</italic>
-regulatory elements and coexpression networks</article-title>
<source>Plant Physiol</source>
<year>2009</year>
<volume>150</volume>
<issue>2</issue>
<fpage>535</fpage>
<lpage>546</lpage>
<pub-id pub-id-type="doi">10.1104/pp.109.136028</pub-id>
<pub-id pub-id-type="pmid">19357200</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>MacLean</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jerome</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>J</given-names>
</name>
<article-title>Co-regulation of nuclear genes encoding plastid ribosomal proteins by light and plastid signals during seedling development in tobacco and
<italic>Arabidopsis</italic>
</article-title>
<source>Plant Mol Biol</source>
<year>2008</year>
<volume>66</volume>
<issue>5</issue>
<fpage>475</fpage>
<lpage>490</lpage>
<pub-id pub-id-type="doi">10.1007/s11103-007-9279-z</pub-id>
<pub-id pub-id-type="pmid">18193395</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Schnable</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Ware</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pasternak</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Graves</surname>
<given-names>TA</given-names>
</name>
<article-title>The B73 maize genome: complexity, diversity, and dynamics</article-title>
<source>Science</source>
<year>2009</year>
<volume>326</volume>
<issue>5956</issue>
<fpage>1112</fpage>
<lpage>1115</lpage>
<pub-id pub-id-type="doi">10.1126/science.1178534</pub-id>
<pub-id pub-id-type="pmid">19965430</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Yilmaz</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nishiyama</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Fuentes</surname>
<given-names>BG</given-names>
</name>
<name>
<surname>Souza</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Janies</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Grotewold</surname>
<given-names>E</given-names>
</name>
<article-title>GRASSIUS: a platform for comparative regulatory genomics across the grasses</article-title>
<source>Plant Physiol</source>
<year>2009</year>
<volume>149</volume>
<issue>1</issue>
<fpage>171</fpage>
<lpage>180</lpage>
<pub-id pub-id-type="doi">10.1104/pp.108.128579</pub-id>
<pub-id pub-id-type="pmid">18987217</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Chang</surname>
<given-names>W-C</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>T-Y</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>H-D</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>H-Y</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>R-L</given-names>
</name>
<article-title>PlantPAN: Plant promoter analysis navigator, for identifying combinatorial
<italic>cis</italic>
-regulatory elements with distance constraint in plant gene groups</article-title>
<source>BMC Genomics</source>
<year>2008</year>
<volume>9</volume>
<issue>1</issue>
<fpage>561</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-9-561</pub-id>
<pub-id pub-id-type="pmid">19036138</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Higo</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ugawa</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Iwamoto</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Korenaga</surname>
<given-names>T</given-names>
</name>
<article-title>Plant
<italic>cis</italic>
-acting regulatory DNA elements (PLACE) database: 1999</article-title>
<source>Nucleic Acids Res</source>
<year>1999</year>
<volume>27</volume>
<issue>1</issue>
<fpage>297</fpage>
<lpage>300</lpage>
<pub-id pub-id-type="doi">10.1093/nar/27.1.297</pub-id>
<pub-id pub-id-type="pmid">9847208</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>J</given-names>
</name>
<article-title>PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<issue>suppl 1</issue>
<fpage>D1114</fpage>
<lpage>D1117</lpage>
<pub-id pub-id-type="pmid">21097470</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Pavesi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zambelli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pesole</surname>
<given-names>G</given-names>
</name>
<article-title>WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>1</issue>
<fpage>46</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-46</pub-id>
<pub-id pub-id-type="pmid">17286865</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Stormo</surname>
<given-names>GD</given-names>
</name>
<article-title>Consensus patterns in DNA</article-title>
<source>Methods Enzymol</source>
<year>1990</year>
<volume>183</volume>
<fpage>211</fpage>
<lpage>221</lpage>
<pub-id pub-id-type="pmid">2179676</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<person-group person-group-type="editor">Altman RB, Dunker AK, Hunter L, Klein TE</person-group>
<article-title>BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes</article-title>
<source>Pacific Symposium on Biocomputing 2001</source>
<year>2001</year>
<publisher-name>Hackensack, New Jersey, USA: World Scientific Press</publisher-name>
<fpage>127</fpage>
<lpage>138</lpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="book">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Elkan</surname>
<given-names>C</given-names>
</name>
<article-title>Fitting a mixture model by expectation maximization to discover motifs in biopolymers</article-title>
<source>Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology</source>
<year>1994</year>
<publisher-name>Menlo Park, California: AAAI Press</publisher-name>
<fpage>28</fpage>
<lpage>36</lpage>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Lawrence</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Boguski</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Neuwald</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Wootton</surname>
<given-names>JC</given-names>
</name>
<article-title>Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment</article-title>
<source>Science</source>
<year>1993</year>
<volume>262</volume>
<issue>5131</issue>
<fpage>208</fpage>
<lpage>214</lpage>
<pub-id pub-id-type="doi">10.1126/science.8211139</pub-id>
<pub-id pub-id-type="pmid">8211139</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Hu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D</given-names>
</name>
<article-title>EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<issue>1</issue>
<fpage>342</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-7-342</pub-id>
<pub-id pub-id-type="pmid">16839417</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Che</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
<article-title>BEST: Binding-site estimation suite of tools</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<issue>12</issue>
<fpage>2909</fpage>
<lpage>2911</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti425</pub-id>
<pub-id pub-id-type="pmid">15814553</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Wijaya</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>S-M</given-names>
</name>
<name>
<surname>Son</surname>
<given-names>NT</given-names>
</name>
<name>
<surname>Kanagasabai</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sung</surname>
<given-names>W-K</given-names>
</name>
<article-title>MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<issue>20</issue>
<fpage>2288</fpage>
<lpage>2295</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn420</pub-id>
<pub-id pub-id-type="pmid">18697768</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Sandve</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Abul</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Walseng</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Drablos</surname>
<given-names>F</given-names>
</name>
<article-title>Improved benchmarks for computational motif discovery</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>1</issue>
<fpage>193</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-193</pub-id>
<pub-id pub-id-type="pmid">17559676</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Dooner</surname>
<given-names>HK</given-names>
</name>
<name>
<surname>Robbins</surname>
<given-names>TP</given-names>
</name>
<name>
<surname>Jorgensen</surname>
<given-names>RA</given-names>
</name>
<article-title>Genetic and developmental control of anthocyanin biosynthesis</article-title>
<source>Annu Rev Genet</source>
<year>1991</year>
<volume>25</volume>
<issue>1</issue>
<fpage>173</fpage>
<lpage>199</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.ge.25.120191.001133</pub-id>
<pub-id pub-id-type="pmid">1839877</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Grotewold</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Drummond</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Bowen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>T</given-names>
</name>
<article-title>The myb-homologous
<italic>P</italic>
gene controls phlobaphene pigmentation in maize floral organs by directly activating a flavonoid biosynthetic gene subset</article-title>
<source>Cell</source>
<year>1994</year>
<volume>76</volume>
<issue>3</issue>
<fpage>543</fpage>
<lpage>553</lpage>
<pub-id pub-id-type="doi">10.1016/0092-8674(94)90117-1</pub-id>
<pub-id pub-id-type="pmid">8313474</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Lesnick</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Chandler</surname>
<given-names>VL</given-names>
</name>
<article-title>Activation of the maize anthocyanin gene A2 is mediated by an element conserved in many anthocyanin promoters</article-title>
<source>Plant Physiol</source>
<year>1998</year>
<volume>117</volume>
<issue>2</issue>
<fpage>437</fpage>
<lpage>445</lpage>
<pub-id pub-id-type="doi">10.1104/pp.117.2.437</pub-id>
<pub-id pub-id-type="pmid">9625696</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Tuerck</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Fromm</surname>
<given-names>ME</given-names>
</name>
<article-title>Elements of the maize
<italic>A1</italic>
promoter required for transactivation by the anthocyanin
<italic>B/C1</italic>
or phlobaphene
<italic>P</italic>
regulatory genes</article-title>
<source>Plant Cell</source>
<year>1994</year>
<volume>6</volume>
<issue>11</issue>
<fpage>1655</fpage>
<lpage>1663</lpage>
<pub-id pub-id-type="pmid">7827497</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Grotewold</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sainz</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Tagliani</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hernandez</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Bowen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Chandler</surname>
<given-names>VL</given-names>
</name>
<article-title>Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2000</year>
<volume>97</volume>
<issue>25</issue>
<fpage>13579</fpage>
<lpage>13584</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.250379897</pub-id>
<pub-id pub-id-type="pmid">11095727</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Sainz</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Grotewold</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Chandler</surname>
<given-names>VL</given-names>
</name>
<article-title>Evidence for direct activation of an anthocyanin promoter by the maize C1 protein and comparison of DNA binding by related Myb domain proteins</article-title>
<source>Plant Cell</source>
<year>1997</year>
<volume>9</volume>
<issue>4</issue>
<fpage>611</fpage>
<lpage>625</lpage>
<pub-id pub-id-type="pmid">9144964</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Sekhon</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Childs</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Hansey</surname>
<given-names>CN</given-names>
</name>
<name>
<surname>Buell</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>de Leon</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kaeppler</surname>
<given-names>SM</given-names>
</name>
<article-title>Genome-wide atlas of transcription during maize development</article-title>
<source>Plant J</source>
<year>2011</year>
<volume>66</volume>
<issue>4</issue>
<fpage>553</fpage>
<lpage>563</lpage>
<pub-id pub-id-type="doi">10.1111/j.1365-313X.2011.04527.x</pub-id>
<pub-id pub-id-type="pmid">21299659</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Karolchik</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Roskin</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Sugnet</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Haussler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<article-title>The UCSC Table Browser data retrieval tool</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<issue>suppl 1</issue>
<fpage>D493</fpage>
<lpage>D496</lpage>
<pub-id pub-id-type="pmid">14681465</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Schmid</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Bucher</surname>
<given-names>P</given-names>
</name>
<article-title>ChIP-Seq data reveal nucleosome architecture of human promoters</article-title>
<source>Cell</source>
<year>2007</year>
<volume>131</volume>
<issue>5</issue>
<fpage>831</fpage>
<lpage>832</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2007.11.017</pub-id>
<pub-id pub-id-type="pmid">18045524</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Goff</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Vaughn</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stapleton</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Gessler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Matasci</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hanlon</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lenards</surname>
<given-names>A</given-names>
</name>
<article-title>The iPlant Collaborative: cyberinfrastructure for plant biology</article-title>
<source>Front Plant Sci</source>
<year>2011</year>
<volume>2</volume>
<fpage>34</fpage>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Grant</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
<article-title>FIMO: scanning for occurrences of a given motif</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<issue>7</issue>
<fpage>1017</fpage>
<lpage>1018</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr064</pub-id>
<pub-id pub-id-type="pmid">21330290</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Zambelli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pesole</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pavesi</surname>
<given-names>G</given-names>
</name>
<article-title>Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<issue>suppl 2</issue>
<fpage>W247</fpage>
<lpage>W252</lpage>
<pub-id pub-id-type="pmid">19487240</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Hansen</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
<article-title>Detection of functional DNA motifs via statistical over-representation</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<issue>4</issue>
<fpage>1372</fpage>
<lpage>1381</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh299</pub-id>
<pub-id pub-id-type="pmid">14988425</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Clarke</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Granek</surname>
<given-names>JA</given-names>
</name>
<article-title>Rank order metrics for quantifying the association of sequence features with gene regulation</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<issue>2</issue>
<fpage>212</fpage>
<lpage>218</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/19.2.212</pub-id>
<pub-id pub-id-type="pmid">12538241</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<collab>International Rice Genome Sequencing Project</collab>
<article-title>The map-based sequence of the rice genome</article-title>
<source>Nature</source>
<year>2005</year>
<volume>436</volume>
<issue>7052</issue>
<fpage>793</fpage>
<lpage>800</lpage>
<pub-id pub-id-type="doi">10.1038/nature03895</pub-id>
<pub-id pub-id-type="pmid">16100779</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Lamesch</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Berardini</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Swarbreck</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wilks</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sasidharan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Muller</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Dreher</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Garcia-Hernandez</surname>
<given-names>M</given-names>
</name>
<article-title>The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools</article-title>
<source>Nucleic Acids Res</source>
<year>2012</year>
<volume>40</volume>
<issue>D1</issue>
<fpage>D1202</fpage>
<lpage>D1210</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr1090</pub-id>
<pub-id pub-id-type="pmid">22140109</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Levine</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tjian</surname>
<given-names>R</given-names>
</name>
<article-title>Transcription regulation and animal diversity</article-title>
<source>Nature</source>
<year>2003</year>
<volume>424</volume>
<issue>6945</issue>
<fpage>147</fpage>
<lpage>151</lpage>
<pub-id pub-id-type="doi">10.1038/nature01763</pub-id>
<pub-id pub-id-type="pmid">12853946</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Zheng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Kawagoe</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Okita</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hau</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Murai</surname>
<given-names>N</given-names>
</name>
<article-title>5′ distal and proximal
<italic>cis</italic>
-acting regulator elements are required for developmental control of a rice seed storage protein glutelin gene</article-title>
<source>Plant J</source>
<year>1993</year>
<volume>4</volume>
<issue>2</issue>
<fpage>357</fpage>
<lpage>366</lpage>
<pub-id pub-id-type="doi">10.1046/j.1365-313X.1993.04020357.x</pub-id>
<pub-id pub-id-type="pmid">8220486</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Crooks</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
<article-title>WebLogo: a sequence logo generator</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<issue>6</issue>
<fpage>1188</fpage>
<lpage>1190</lpage>
<pub-id pub-id-type="doi">10.1101/gr.849004</pub-id>
<pub-id pub-id-type="pmid">15173120</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="other">
<name>
<surname>Watson</surname>
<given-names>CG</given-names>
</name>
<source>Chart-Clicker</source>
<year>2010</year>
<comment>In:
<ext-link ext-link-type="uri" xlink:href="http://search.cpan.org/~gphat/Chart-Clicker-2.67/lib/Chart/Clicker.pm">http://search.cpan.org/~gphat/Chart-Clicker-2.67/lib/Chart/Clicker.pm</ext-link>
. 2.67 edn: the CPAN</comment>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Benos</surname>
<given-names>PV</given-names>
</name>
<article-title>STAMP: a web tool for exploring DNA-binding motif similarities</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<issue>Web Server issue</issue>
<fpage>W253</fpage>
<lpage>W258</lpage>
<pub-id pub-id-type="pmid">17478497</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Kankainen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Loytynoja</surname>
<given-names>A</given-names>
</name>
<article-title>MATLIGN: a motif clustering, comparison and matching tool</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>1</issue>
<fpage>189</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-189</pub-id>
<pub-id pub-id-type="pmid">17559640</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Hartmann</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Valentine</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Christie</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Hays</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jenkins</surname>
<given-names>GI</given-names>
</name>
<name>
<surname>Weisshaar</surname>
<given-names>B</given-names>
</name>
<article-title>Identification of UV/blue light-response elements in the
<italic>Arabidopsis thaliana</italic>
chalcone synthase promoter using a homologous protoplast transient expression system</article-title>
<source>Plant Mol Biol</source>
<year>1998</year>
<volume>36</volume>
<issue>5</issue>
<fpage>741</fpage>
<lpage>754</lpage>
<pub-id pub-id-type="doi">10.1023/A:1005921914384</pub-id>
<pub-id pub-id-type="pmid">9526507</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Hatton</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sablowski</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Yung</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Schuch</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Bevan</surname>
<given-names>M</given-names>
</name>
<article-title>Two classes of
<italic>cis</italic>
sequences contribute to tissue-specific expression of a PAL2 promoter in transgenic tobacco</article-title>
<source>Plant J</source>
<year>1995</year>
<volume>7</volume>
<issue>6</issue>
<fpage>859</fpage>
<lpage>876</lpage>
<pub-id pub-id-type="doi">10.1046/j.1365-313X.1995.07060859.x</pub-id>
<pub-id pub-id-type="pmid">7599647</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Lam</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Chua</surname>
<given-names>NH</given-names>
</name>
<article-title>Tetramer of a 21-base pair synthetic element confers seed expression and transcriptional enhancement in response to water stress and abscisic acid</article-title>
<source>J Biol Chem</source>
<year>1991</year>
<volume>266</volume>
<issue>26</issue>
<fpage>17131</fpage>
<lpage>17135</lpage>
<pub-id pub-id-type="pmid">1832669</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Chaubet</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Flenet</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Clement</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Brignon</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gigot</surname>
<given-names>C</given-names>
</name>
<article-title>Identification of cis-elements regulating the expression of an Arabidopsis histone H4 gene</article-title>
<source>Plant J</source>
<year>1996</year>
<volume>10</volume>
<issue>3</issue>
<fpage>425</fpage>
<lpage>435</lpage>
<pub-id pub-id-type="doi">10.1046/j.1365-313X.1996.10030425.x</pub-id>
<pub-id pub-id-type="pmid">8811858</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Baucom</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Estill</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Chaparro</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Upshaw</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Jogi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Deragon</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Westerman</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>SanMiguel</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Bennetzen</surname>
<given-names>JL</given-names>
</name>
<article-title>Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome</article-title>
<source>PLoS Genet</source>
<year>2009</year>
<volume>5</volume>
<issue>11</issue>
<fpage>e1000732</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pgen.1000732</pub-id>
<pub-id pub-id-type="pmid">19936065</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>E-Y</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S-Y</given-names>
</name>
<name>
<surname>Ashlock</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Nam</surname>
<given-names>D</given-names>
</name>
<article-title>MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<issue>1</issue>
<fpage>260</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-260</pub-id>
<pub-id pub-id-type="pmid">19698124</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<name>
<surname>McNicholas</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>TB</given-names>
</name>
<article-title>Model-based clustering of microarray expression data via latent Gaussian mixture models</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>21</issue>
<fpage>2705</fpage>
<lpage>2712</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq498</pub-id>
<pub-id pub-id-type="pmid">20802251</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<name>
<surname>Carey</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Strahle</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Selinger</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Chandler</surname>
<given-names>VL</given-names>
</name>
<article-title>Mutations in the
<italic>pale aleurone color1</italic>
regulatory gene of the
<italic>Zea mays</italic>
anthocyanin pathway have distinct phenotypes relative to the functionally similar
<italic>TRANSPARENT TESTA GLABRA1</italic>
gene in
<italic>Arabidopsis thaliana</italic>
</article-title>
<source>Plant Cell</source>
<year>2004</year>
<volume>16</volume>
<issue>2</issue>
<fpage>450</fpage>
<lpage>464</lpage>
<pub-id pub-id-type="doi">10.1105/tpc.018796</pub-id>
<pub-id pub-id-type="pmid">14742877</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<name>
<surname>Bodeau</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Walbot</surname>
<given-names>V</given-names>
</name>
<article-title>Structure and regulation of the maize
<italic>Bronze2</italic>
promoter</article-title>
<source>Plant Mol Biol</source>
<year>1996</year>
<volume>32</volume>
<issue>4</issue>
<fpage>599</fpage>
<lpage>609</lpage>
<pub-id pub-id-type="doi">10.1007/BF00020201</pub-id>
<pub-id pub-id-type="pmid">8980512</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="journal">
<name>
<surname>Cone</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Burr</surname>
<given-names>FA</given-names>
</name>
<name>
<surname>Burr</surname>
<given-names>B</given-names>
</name>
<article-title>Molecular analysis of the maize anthocyanin regulatory locus
<italic>C1</italic>
</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1986</year>
<volume>83</volume>
<issue>24</issue>
<fpage>9631</fpage>
<lpage>9635</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.83.24.9631</pub-id>
<pub-id pub-id-type="pmid">3025847</pub-id>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="journal">
<name>
<surname>East</surname>
<given-names>EM</given-names>
</name>
<article-title>Inheritance of color in the aleurone cells of maize</article-title>
<source>Am Nat</source>
<year>1912</year>
<volume>46</volume>
<issue>546</issue>
<fpage>363</fpage>
<lpage>365</lpage>
<pub-id pub-id-type="doi">10.1086/279285</pub-id>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="journal">
<name>
<surname>Styles</surname>
<given-names>ED</given-names>
</name>
<name>
<surname>Ceska</surname>
<given-names>O</given-names>
</name>
<article-title>The genetic control of flavonoid synthesis in maize</article-title>
<source>Can J Genet Cytol</source>
<year>1977</year>
<volume>19</volume>
<issue>2</issue>
<fpage>289</fpage>
<lpage>302</lpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000275 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000275 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3658923
   |texte=   Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23497159" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024